Archive.org collection downloader

Archive.org is an amazing resource, but downloading a whole collection of items from it can be tedious. I wrote this utility when I wanted to download the 4am Apple II software collection and had tired of clicking each of the thousand-plus links by hand. On top of that, it is a living collection with new content added frequently, so it was impossible to keep track of what I had downloaded and what I had not.

I used a good article on bulk downloading from archive.org as a model for this utility. The only real feature my utility adds beyond bulk downloading of collections is that it downloads only content that you do not yet have or that has been updated since the last time you ran the utility. This helps both when an error stops a download midway and when keeping up with a changing collection.
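
The "only fetch what is new or updated" idea can be sketched roughly as follows. This is a minimal illustration in Python, not the utility's actual code: the function name, the two dictionaries, and the item identifiers and timestamps are all made up for the example, and it assumes each item has a last-updated timestamp that can be compared against one recorded locally.

```python
from typing import Dict, List


def items_to_download(remote: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Return identifiers that are missing locally or newer on the remote side.

    Both arguments map an item identifier to a last-updated timestamp
    (hypothetical layout, chosen only for this sketch).
    """
    needed = []
    for identifier, remote_updated in remote.items():
        local_updated = local.get(identifier)
        # ISO 8601 timestamps in the same format and time zone
        # compare correctly as plain strings.
        if local_updated is None or remote_updated > local_updated:
            needed.append(identifier)
    return sorted(needed)


# Hypothetical data: what the collection lists now vs. what is on disk.
remote_items = {
    "item-a": "2018-01-05T00:00:00Z",
    "item-b": "2018-02-01T00:00:00Z",
    "item-c": "2018-02-10T00:00:00Z",
}
local_items = {
    "item-a": "2018-01-05T00:00:00Z",  # unchanged, so it is skipped
    "item-b": "2018-01-20T00:00:00Z",  # updated remotely, so it is fetched again
}

print(items_to_download(remote_items, local_items))  # ['item-b', 'item-c']
```

Because the decision depends only on what is already on disk, rerunning after a failed download picks up where the previous run stopped.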

Usage:

SyncCollection.exe [collectionName]

Content will be downloaded to a subdirectory named after the collection. The collection name cannot contain spaces, though this should not be an issue as I don’t believe that a collection name can have spaces. The collection name is the last portion of the collection URL.

For instance, the collection name for this link is “oldtimeradio”:

https://archive.org/details/oldtimeradio
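
If you would rather pull the name out of a URL programmatically, a small sketch in Python could look like this. The helper name `collection_name_from_url` is hypothetical and is not part of the utility, which expects you to pass the name itself.

```python
from urllib.parse import urlparse


def collection_name_from_url(url: str) -> str:
    """Return the last non-empty path segment of a collection URL."""
    path = urlparse(url).path
    # Drop empty segments so a trailing slash does not matter.
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no collection name found in {url!r}")
    return segments[-1]


print(collection_name_from_url("https://archive.org/details/oldtimeradio"))  # oldtimeradio
```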

The utility is provided as is with no guarantees of fitness for any use. Enjoy.

Binaries (.NET 4.5.2 executables):
SyncCollection

Source (Visual Studio 2017 project):

SyncCollectionSource

Also on GitHub: https://github.com/malfunct/SyncCollection