Release Notes: Add-On Deep Linking, Translate Documents, Change Visibility of Notes in Bulk, Split Documents with Ease, and Tag Documents with Hash Values

DocumentCloud has shipped a number of feature updates over the last couple of months. You can now deep link to Add-Ons, translate documents with Translate Documents, split large documents with Document Splitter, add hash values to documents as metadata with Document Hasher, and change the visibility of notes in bulk with the Change Note Visibility Add-On. This release also includes a bug fix for Transcribe Audio and a few other general improvements.

General Improvements & Bug Fixes

  • Add-On Deep Linking
    DocumentCloud Add-Ons now have deep linking enabled, so you can easily share a link to a useful Add-On with others. Clicking on an Add-On now opens its configuration menu and updates the URL. For example, clicking on the PII Detector Add-On produces a URL that links to the Add-On directly: https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/PII-Detector
    Add-Ons can also be shared with parameters pre-filled by modifying the URL. For example, to share the PII Detector Add-On with the Detect SSNs field pre-selected, use this URL:
    https://www.documentcloud.org/app?q=%2B&ssn=true#add-ons/MuckRock/PII-Detector
    [Image: PII Detector Add-On menu with Detect SSNs selected and the URL modified accordingly.]

  • The Scraper Add-On now lets you specify an access level (public, private, organization) for documents captured by the scraper.

  • A bug fix for Transcribe Audio resolves an issue with transcribing long YouTube videos.
  • The Bulk Re-Process Add-On now includes Force OCR options, allowing you to re-run OCR on large sets of documents with ease.
    [Image: Bulk Re-Process Add-On menu showing the newly available Force OCR options as well as language selection.]
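
The deep-link format described under Add-On Deep Linking can be sketched in a few lines of Python. This is only an illustration built from the two example URLs above: the function name `addon_deep_link` is hypothetical, and writing unselected boolean fields as `false` is an assumption (the examples only show `ssn=true`).

```python
from urllib.parse import urlencode

BASE_URL = "https://www.documentcloud.org/app"


def _to_text(value):
    # The example URL writes a selected checkbox as "true"; using "false"
    # for an unselected one is an assumption.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def addon_deep_link(org, addon, params=None, search="+"):
    """Build a deep link to an Add-On, optionally pre-filling its fields."""
    query = {"q": search}
    for key, value in (params or {}).items():
        query[key] = _to_text(value)
    # urlencode percent-encodes "+" as %2B, matching the example URLs.
    return f"{BASE_URL}?{urlencode(query)}#add-ons/{org}/{addon}"


print(addon_deep_link("MuckRock", "PII-Detector"))
print(addon_deep_link("MuckRock", "PII-Detector", {"ssn": True}))
```

The two printed URLs reproduce the PII Detector examples above, first without parameters and then with the Detect SSNs field pre-selected.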

Premium Features

Translate Documents
As mentioned in our last release notes, DocumentCloud has introduced AI credits for our premium users. Credits can be used to perform powerful OCR using Amazon’s Textract and to run GPT-3 across a set of documents for categorization and analysis tasks. Credits can now also be used to translate documents in over 133 languages with Google’s Cloud Translation API. Users can specify an input and output language code, an optional project ID to specify where to upload translations, and an access level for the uploaded translations.

[Image: Translate Documents Add-On menu showing a boolean dry-run option that reports the cost of running the translation, a two-character input language code, an optional project ID specifying where translations should be uploaded, a two-character output language code, and an access level for translations.]

General Features

  • Change Note Visibility Add-On
    If you’ve had large projects on DocumentCloud that you needed to share with collaborators, switching the visibility of large numbers of annotations so that others could see your notes may have been a tedious task. With the Change Note Visibility Add-On, you can change the visibility of annotations across small or large sets of documents in bulk by simply dispatching an Add-On.
    [Image: Change Note Visibility Add-On menu with one field for the access level you’d like to apply to your notes (public, private, organization).]
  • Document Splitter
    Sometimes documents are hundreds or even thousands of pages long and clearly contain different document types. This is especially common in responsive documents from public records requests, where agencies scan large stacks of paper and combine them into one PDF. With the Document Splitter Add-On, you can select a document or set of documents to split at a given page, and the Add-On will upload the two resulting documents to DocumentCloud for you.
    [Image: Document Splitter Add-On menu showing one field for the page number on which to split the document.]
  • Document Hasher
    Source validity and transparency are critical to building public trust that documents have not been manipulated or edited. DocumentCloud already lets you provide metadata for documents, including the source of documents, relevant articles in which documents are mentioned, descriptions, and more. A file hash is an identifier derived from a document’s contents. If any changes are made to the document, the hash value changes as well. By comparing the hash value of a document uploaded to DocumentCloud with the hash value of an original PDF, changes can be detected. The Document Hasher Add-On allows you to pull the SHA-1 hash value of any set of documents and add it as a key/value pair to the document(s), giving you one more way to capture metadata about documents and share them with the public.
    Note: If you run the Document Hasher Add-On on a document and then redact or modify the document afterward, you will need to run the Document Hasher Add-On on the document again to capture the new hash value.
    [Image: Document Hasher Add-On menu that lets you choose whether to run the Add-On on the currently selected document or on all of the documents in the current search results.]
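
The hash comparison described under Document Hasher can be tried locally with Python’s standard `hashlib` module. The sketch below is not the Add-On’s own code: the function names are hypothetical, and it assumes the recorded value is a SHA-1 hex digest of the file’s raw bytes.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=65536):
    """Return the SHA-1 hex digest of a binary file-like object."""
    digest = hashlib.sha1()
    # Read in chunks so very large PDFs are never loaded into memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha1_of_file(path):
    """Return the SHA-1 hex digest of the file at `path`."""
    with open(path, "rb") as handle:
        return sha1_of_stream(handle)


def matches_recorded_hash(stream, recorded_hash):
    """Compare a stream's SHA-1 digest with a previously recorded value."""
    return sha1_of_stream(stream) == recorded_hash.strip().lower()


# In-memory stand-ins for an original file and a modified copy.
recorded = sha1_of_stream(io.BytesIO(b"original contents"))
print(matches_recorded_hash(io.BytesIO(b"original contents"), recorded))
print(matches_recorded_hash(io.BytesIO(b"modified contents"), recorded))
```

For a real check, pass the path of your original PDF to `sha1_of_file` and compare the result with the hash value stored on the document.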