Uploading large sets of documents (hundreds, thousands, or even millions) to DocumentCloud through the user interface is laborious: the set has to be split into smaller batches, and each upload has to be monitored carefully for processing errors.
DocumentCloud’s Batch Upload Script was originally written to upload the CIA Crest files, a collection of almost 1 million files. It uploads files in batches and keeps track of which files were uploaded successfully, so it can be stopped and restarted, picking up where it left off, and failed uploads can be retried. To stop it gracefully, press CTRL+C once while it is running. A recent rewrite allows the script to run on any directory of documents.
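To make the resume-and-retry behavior concrete, here is a minimal Python sketch of the general pattern. It is not the Batch Upload Script's actual code. The names `upload_all`, `StopFlag`, and the JSON state file are hypothetical, and `upload_batch` stands in for whatever function really sends a batch to DocumentCloud; it is assumed to return the names of the files that uploaded successfully.

```python
import json
import signal
from pathlib import Path


def load_state(state_path):
    """Return the set of file names already uploaded, or an empty set."""
    path = Path(state_path)
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def save_state(state_path, done):
    """Persist the uploaded file names so a later run can skip them."""
    Path(state_path).write_text(json.dumps(sorted(done)))


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class StopFlag:
    """Hypothetical helper: records that CTRL+C was pressed."""

    def __init__(self):
        self.stop = False

    def install(self):
        # Replace the default SIGINT behavior (KeyboardInterrupt) with a flag,
        # so the batch in progress can finish before the loop exits.
        signal.signal(signal.SIGINT, self._handle)

    def _handle(self, signum, frame):
        self.stop = True


def upload_all(files, upload_batch, state_path, batch_size=25, flag=None):
    """Upload `files` in batches, skipping those recorded in the state file.

    `upload_batch` is an assumed callable that takes a list of file names
    and returns the names that uploaded successfully.
    Returns (all uploaded names, names that failed during this run).
    """
    done = load_state(state_path)
    pending = [name for name in sorted(files) if name not in done]
    failed = []
    for batch in chunked(pending, batch_size):
        if flag is not None and flag.stop:
            break  # graceful stop: checked only between batches
        succeeded = set(upload_batch(batch))
        done.update(succeeded)
        failed.extend(name for name in batch if name not in succeeded)
        save_state(state_path, done)  # checkpoint after every batch
    return done, failed
```

Because the state file is rewritten after every batch, a second call to `upload_all` with the same `state_path` only attempts files that are not yet recorded, which covers both resuming after a stop and retrying earlier failures. Calling `StopFlag().install()` before the loop is one way to get the "press CTRL+C once" behavior, since the flag is only checked between batches.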