Release Notes: Introducing DocumentCloud AI Credits, GPT-3 Add-Ons, viewable organization membership lists, bulk processing Add-Ons and more

Over the past three months, the team has been working to give journalists and the public better ways to share, analyze, annotate and, ultimately, publish source documents to the web, particularly through enhancements to the DocumentCloud platform.

For previous site improvements, check out all of MuckRock’s release notes, and if you’d like more frequent peeks at the latest and greatest, join the MuckRock Slack.

General feature updates

  • You can now see the membership list for your MuckRock/DocumentCloud organization in DocumentCloud’s sidebar. Clicking on another user within your organization pulls up the documents they have uploaded publicly or at the organization level, making it easier to browse documents across your team.

Screenshot showing drop-down menu feature that allows you to see membership of organizations on DocumentCloud

  • New UI shout-out for DocumentCloud Add-Ons, making them more visible for new users.

  • Users who have not yet tried an Add-On now have a selection of default Add-Ons enabled, highlighting some of the options available in our expanding Add-Ons library.

Premium features

  • For our premium DocumentCloud users, we have introduced AI credits and five GPT-3 style Add-Ons. Professional accounts have access to 2,000 credits per month. Organizational paid accounts have 5,000 credits per month for the first five users, plus 500 additional credits for every user after that. These credits can be used to brainstorm possible story ideas from large caches of documents, classify documents based on subject, de-jargonize scientific text, summarize legislative bills, or run general GPT-3 style prompts against a set of documents. Credits may also be used to perform OCR using Amazon Textract, mentioned in our last release notes. To find these Add-Ons, click on the Add-Ons drop-down menu, then click “Browse All Add-Ons” and search for an Add-On. Enable an Add-On by marking it as active, and it will then appear in your Add-Ons drop-down menu, ready to run. If you’re interested in looking under the hood, you can check out the source code of an Add-On that integrates DocumentCloud with GPT-3.

Screenshot of Add-Ons drop down menu and Browse All Add-Ons button
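
As a rough illustration of the credit allocation described above, here is a minimal Python sketch of the monthly arithmetic. The constant and function names are hypothetical, chosen only for this example, and are not part of DocumentCloud's code or API. The sketch also assumes an organization with fewer than five users still receives the full 5,000 credits.

```python
# Figures taken from the release notes; names are hypothetical.
PROFESSIONAL_CREDITS = 2_000        # per month, professional accounts
ORG_BASE_CREDITS = 5_000            # per month, covers the first five users
ORG_BASE_USERS = 5
ORG_CREDITS_PER_EXTRA_USER = 500    # per month, each user beyond the fifth


def monthly_org_credits(user_count: int) -> int:
    """Return the monthly AI credits for a paid organization.

    Assumption: organizations with five or fewer users get the base amount.
    """
    if user_count < 1:
        raise ValueError("an organization needs at least one user")
    extra_users = max(0, user_count - ORG_BASE_USERS)
    return ORG_BASE_CREDITS + extra_users * ORG_CREDITS_PER_EXTRA_USER


if __name__ == "__main__":
    for users in (3, 5, 8):
        print(users, "users:", monthly_org_credits(users), "credits per month")
```

Under these assumptions, an eight-person organization has three users beyond the first five, so it would receive 5,000 + 3 × 500 = 6,500 credits per month.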

New Add-Ons

  • Change Visibility Add-On allows you to change the access level (public, private, or organization) of large sets of documents you own, without having to page through 25 documents at a time.

  • Bulk Add To Project Add-On allows you to add large sets of documents to a project without having to add documents 25 at a time.

  • Bulk Reprocess Add-On allows you to re-process large sets of documents without having to page through 25 documents at a time. If your documents still fail to upload, you may consider running Clear Failed Uploads.

  • Clear Failed Uploads allows you to bulk-delete documents that are stuck in processing or have other issues upon upload. If you are experiencing issues uploading large sets of documents, we highly encourage you to use our Batch Upload Script. It intelligently staggers the processing of documents, which leads to fewer errors, and it keeps track of failed uploads in a database so they can be retried.

  • PDF Reflow Add-On allows you to optimize your PDFs for reading on mobile phones or e-readers. As an added benefit, this Add-On can convert two-column documents into single-column documents, making them easier to OCR and analyze.

  • PDF Compression Add-On allows you to provide a public Google Drive or Dropbox link to a PDF that you want to compress and upload to DocumentCloud. By default, DocumentCloud only allows uploading of PDFs smaller than 500MB. This Add-On will try to compress the PDF before upload. If the PDF is still larger than 500MB after compression, the Add-On will warn you to split the PDF so that it can be uploaded.

Add-On improvements

  • Add-Ons now have Soft Time Outs. If you run an Add-On on a large set of documents, a Soft Time Out lets the Add-On run for a specified period of time (default: 5 minutes). If the Add-On made sufficient progress during that period, it will call a new run of the same Add-On with the remainder of the documents. If it has not made progress, it will continue to try until the hard time out (usually ten minutes) and then fail if still no progress is made. This new timeout system has allowed us to balance the desire to run Add-Ons on large sets of documents with the ability to determine whether an Add-On has stalled, while allocating resources on GitHub Actions efficiently.

  • Performance improvements to our Push to IPFS/Filecoin Add-On now allow you to upload large sets of documents to the InterPlanetary File System (IPFS) via Estuary.

  • Stability improvements to our Move Account Add-On now allow you to transfer the ownership of large sets of documents to another DocumentCloud user.
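
For readers curious how a soft time out like the one described in this section might behave, here is a minimal Python sketch of the general idea. It is not the actual DocumentCloud implementation. The names `run_with_soft_timeout`, `process` and `clock` are hypothetical, "sufficient progress" is simplified to "at least one document processed", and the hard time out is assumed to be enforced by whatever runs the Add-On rather than by this function.

```python
import time
from typing import Callable, List, Sequence, Tuple

SOFT_TIMEOUT_SECONDS = 5 * 60  # default from the release notes: 5 minutes


def run_with_soft_timeout(
    documents: Sequence[str],
    process: Callable[[str], None],
    soft_timeout: float = SOFT_TIMEOUT_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], List[str]]:
    """Process documents until the soft time out, then hand back the rest.

    Returns (processed, remaining). A caller could start a new run with
    `remaining`. If nothing has been processed yet, the loop keeps trying
    and leaves it to an external hard time out to stop a stalled run.
    """
    start = clock()
    processed: List[str] = []
    remaining: List[str] = list(documents)
    while remaining:
        # Stop early only if the soft time out has passed AND progress was made.
        if processed and clock() - start >= soft_timeout:
            break
        doc = remaining.pop(0)
        process(doc)
        processed.append(doc)
    return processed, remaining


def make_fake_clock(step: float) -> Callable[[], float]:
    """Test helper: a clock that starts at 0 and advances `step` seconds per call."""
    state = {"now": -step}

    def fake_clock() -> float:
        state["now"] += step
        return state["now"]

    return fake_clock


if __name__ == "__main__":
    done, left = run_with_soft_timeout(
        ["a", "b", "c", "d", "e"],
        process=lambda doc: None,
        clock=make_fake_clock(120),
    )
    print("processed:", done, "handed to next run:", left)
```

In this sketch, the documents left over after the soft time out are simply returned, so the next run picks up where the previous one stopped instead of starting over.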


Image via Wikimedia Commons