
DSpace at the University of Texas at Austin School of Information

Please use this identifier to cite or link to this item: http://hdl.handle.net/2081/8984

Title: Heritrix web crawler
Authors: Internet Archive
Keywords: web crawler
Issue Date: 6-Aug-2007
Abstract: Heritrix is the web crawler developed by the Internet Archive (www.archive.org). It is Java-based and requires Java 5.0 (a JRE of version 1.5.0 or later). Although Heritrix is technically platform-independent, the Internet Archive provides documentation only for use on Linux systems. The libraries the crawler requires are included in the install.
Description: This version of Heritrix was downloaded from http://sourceforge.net/project/showfiles.php?group_id=73833&package_id=73980 on 5 August 2007. It is released under the GNU Lesser General Public License (LGPL). Both source and binary versions of the application are provided here, each as a zip archive and as a tar gzip archive (the zip binary archive is recommended and is designated as the primary bitstream). Documentation is available at http://crawler.archive.org/ and in a separate DSpace item.
URI: http://hdl.handle.net/2081/8984
Appears in Collections: Web Harvesting tools

Files in This Item:

File                         Description                          Size      Format
heritrix-1.12.1-src.tar.gz   Heritrix source code                 9.43 MB   tar gzip archive
Heritrixlicense(LGPL).txt    Heritrix license (GNU LGPL)          26.89 kB  Text
heritrix-1.12.1.zip          Heritrix binary application archive  20.27 MB  .zip archive
heritrix-1.12.1.tar.gz       Heritrix binary application archive  16.55 MB  tar gzip archive
heritrix-1.12.1-src.zip      Heritrix source code                 10.03 MB  .zip archive

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

