Scraping metadata from Internet Archive.

Adapted from: https://internetarchive.readthedocs.io/en/latest/

This represents a basic use of the internetarchive python library, which is the backbone of all the IA scripts I use. This script does nothing except populate some variables. But, it’s a kernel for building lots of other things.

import internetarchive as ia
import json
from sys import argv

collection = argv[1]

search_collection = ia.search_items('collection:' + argv[1])
print str(search_collection.num_found) + " items in collection"
for result in search_collection:
    item_identifier = result['identifier']
    item = ia.get_item(item_identifier)
    print "Downloading " + item_identifier + " ..."
    title = item.item_metadata['metadata']['title']
    subjects = item.item_metadata['metadata']['subject']
# ... and so on...