DocumentCloud has shipped many feature updates over the last couple of months. You can now deep link to Add-Ons, translate documents with Translate Documents, split large documents with Document Splitter, add hash values to documents as metadata with Document Hasher, and change the visibility of notes in bulk with the Change Note Visibility Add-On. This release also includes a bug fix for Transcribe Audio and a few other general improvements.
General Improvements & Bug Fixes
Add-On Deep Linking
DocumentCloud Add-Ons now have deep linking enabled, meaning you can easily share a link to a useful Add-On with others. When you click on an Add-On, it now opens the configuration menu and changes the URL. For example, after clicking on the PII Detector Add-On, the URL in the address bar links directly to that Add-On and can be copied and shared.
You can also share an Add-On with parameters pre-filled by modifying the URL. For example, a link to the PII Detector Add-On can be shared with the Detect SSNs field pre-selected.
- The Scraper Add-On now lets you specify an access level (public, private, or organization) for documents captured by the scraper.
- A bug fix for Transcribe Audio resolves an issue with transcribing long YouTube videos.
- The Bulk Re-Process Add-On now includes Force OCR options, allowing you to re-run OCR on large sets of documents with ease.
As mentioned in our last release notes, DocumentCloud has introduced AI credits for our premium users. Credits can be used to perform powerful OCR using Amazon’s Textract and to run GPT-3 across a set of documents for categorization and analysis tasks. Now, with the Translate Documents Add-On, credits can also be used to translate documents into over 133 languages using Google’s Cloud Translation API. Users can specify an input and output language code, an optional project ID that determines where translations are uploaded, and an access level for the uploaded translations.
- Change Note Visibility Add-On
If you’ve had large projects on DocumentCloud to share with collaborators, and needed to switch the visibility of a large number of annotations in order to share notes, you may have found this tedious in the past. With the Change Note Visibility Add-On, you can change the visibility of annotations across small or large sets of documents in bulk by simply dispatching an Add-On.
- Document Splitter
Sometimes documents are hundreds or even thousands of pages long and clearly contain different document types. This is especially common in responsive documents from public records requests, where agencies scan large stacks of paper and combine them into one PDF. With the Document Splitter Add-On, users can select a document or set of documents to split at a given page, and it will upload the two resulting documents to DocumentCloud for you.
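To make the idea of splitting at a page concrete, here is a minimal Python sketch of how the two resulting page ranges could be computed. It is an illustration only, not the Add-On's implementation: the function name `split_page_ranges` is hypothetical, and it assumes the chosen page becomes the last page of the first document.

```python
def split_page_ranges(total_pages, split_after):
    """Return two 1-indexed, inclusive (start, end) page ranges.

    Assumption for illustration: `split_after` is the last page of the
    first part, so the second part starts on the following page.
    """
    if not 1 <= split_after < total_pages:
        raise ValueError("split_after must leave at least one page in each part")
    first_part = (1, split_after)
    second_part = (split_after + 1, total_pages)
    return first_part, second_part


if __name__ == "__main__":
    # A 10-page document split after page 4.
    print(split_page_ranges(10, 4))
```

Under that assumption, a 10-page document split after page 4 yields one document with pages 1 to 4 and another with pages 5 to 10.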
- Document Hasher
Source validity and transparency are critical to building trust with the public that documents have not been manipulated or edited. DocumentCloud already lets you provide metadata for documents, including the source of a document, relevant articles in which it is mentioned, descriptions, and more. A file hash is an identifier computed from a document's contents. If any changes are made to the document, the hash value changes as well. By comparing the hash value of a document uploaded to DocumentCloud with the hash value of an original PDF, changes can be detected. The Document Hasher Add-On allows you to pull the SHA-1 hash value of any set of documents and add it as a key/value pair to the document(s), giving you one more way to capture metadata about documents and share them with the public.
Note: If you run the Document Hasher Add-On on a document and then redact or modify the document afterward, you will need to run the Add-On on that document again to capture the new hash value.