|Title: ||Heritrix web crawler|
|Authors: ||Internet Archive|
|Keywords: ||web crawler|
|Issue Date: ||6-Aug-2007|
|Abstract: ||Heritrix is the web crawler developed by the Internet Archive (www.archive.org). It is Java-based and requires Java 5.0 (that is, a JRE of version 1.5.0 or later). Although Heritrix is technically platform-independent, the Internet Archive provides documentation only for use on Linux systems.
The crawler depends on a set of libraries, which are included in the install.|
|Description: ||This version of Heritrix was downloaded from http://sourceforge.net/project/showfiles.php?group_id=73833&package_id=73980 on 5 August 2007. It is released under the GNU Lesser General Public License. Both source and binary versions of the application are provided here, each as zip and gzipped tar archives (the zip binary archive is recommended and is designated as the primary bitstream).
Documentation is available at http://crawler.archive.org/ and in a separate DSpace item.|
|Appears in Collections:||Web Harvesting tools|