Release Notes: Introducing DocumentCloud searchable notes, advanced OCR and additional internationalization options

DocumentCloud has had many feature updates in the last couple of months. They include the ability to de-index documents from DocumentCloud’s public search and from search engines like Google, the ability to upload documents via email, and several new Add-Ons. New pro features include searching publicly accessible notes and using Amazon’s Textract OCR to get better text extraction from hard-to-OCR documents.

Upload large collections of documents to DocumentCloud with ease

Uploading large sets of documents (hundreds, thousands, or even millions) to DocumentCloud through the user interface can be laborious: it requires splitting the document set into smaller batches and carefully monitoring uploads for processing errors.

DocumentCloud’s Batch Upload Script was initially written to upload the CIA CREST files, a collection of almost 1 million files. It uploads files in batches and keeps track of which files were uploaded successfully, so it can be stopped and restarted and will pick up where it left off, and failed uploads can be retried. It can be stopped gracefully by pressing CTRL+C (once) while it is running. A recent rewrite allows the script to run on any directory of documents.
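The resume-and-retry behavior described above can be illustrated with a minimal Python sketch. This is not the actual Batch Upload Script: the class name `ResumableBatchUploader`, the JSON state file, the `*.pdf` pattern, and the `upload_batch` callable (which stands in for whatever really sends files to DocumentCloud) are all assumptions made for illustration.

```python
import json
import signal
from pathlib import Path


class ResumableBatchUploader:
    """Hypothetical sketch of a resumable batch uploader (not the real script)."""

    def __init__(self, directory, state_file, upload_batch, batch_size=25):
        self.directory = Path(directory)
        self.state_file = Path(state_file)
        # Assumed interface: upload_batch(list_of_paths) -> {str(path): bool}
        self.upload_batch = upload_batch
        self.batch_size = batch_size
        self.stop_requested = False
        # Maps str(path) -> "done" or "error"; reloaded on every restart.
        self.state = (
            json.loads(self.state_file.read_text())
            if self.state_file.exists()
            else {}
        )

    def request_stop(self, signum=None, frame=None):
        # Only sets a flag, so the batch in progress can finish cleanly.
        self.stop_requested = True

    def install_sigint_handler(self):
        # Main-thread only: CTRL+C sets the flag instead of raising.
        signal.signal(signal.SIGINT, self.request_stop)

    def pending(self, retry_errors=False):
        # Files already marked "done" are always skipped; files marked
        # "error" are skipped unless a retry is requested.
        skip = {"done"} if retry_errors else {"done", "error"}
        files = sorted(p for p in self.directory.rglob("*.pdf") if p.is_file())
        return [p for p in files if self.state.get(str(p)) not in skip]

    def run(self, retry_errors=False):
        todo = self.pending(retry_errors)
        for start in range(0, len(todo), self.batch_size):
            if self.stop_requested:
                break
            batch = todo[start:start + self.batch_size]
            results = self.upload_batch(batch)
            for path in batch:
                ok = results.get(str(path), False)
                self.state[str(path)] = "done" if ok else "error"
            # Persist progress after every batch so a restart can resume.
            self.state_file.write_text(json.dumps(self.state, indent=2))
        return self.state
```

In this sketch, progress is written to disk after each batch, so a stop costs at most the batch in flight. Calling `install_sigint_handler()` before `run()` gives the "press CTRL+C once" behavior, and `run(retry_errors=True)` revisits only the files previously marked as errors.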

Initial Gateway Grantees launch projects to help preserve, analyze and publish critical document collections

Ongoing support program protects endangered materials through decentralized storage while giving DocumentCloud users a range of new features.

New York City could be doing more to use its wastewater testing data, official says

The comments, from a senior New York City environment official who oversees the city’s wastewater surveillance program, represent a sharp departure from a joint statement the city’s health and environment agencies made to MuckRock and Gothamist last month. That statement called wastewater surveillance a “developing field” and stressed a need for further research before it could be used to inform policy action.

The 'Uncounted:' People of color are dying at much higher rates than what COVID data suggests

Unspecific, unknown deaths rose 10 times more among Black, Hispanic and Indigenous people than among white Americans during the COVID-19 pandemic, according to a new analysis by MuckRock. The true toll of the COVID-19 pandemic on many communities of color is worse than previously known.

Projects

  • FOIA 101: Tips and Tricks to Make You a Transparency Master

    ★ Featured
    Whether it's your first request or your first request *today,* it never hurts to go over the basics. MuckRock's compiled a lot of FOIA advice over the years, and with this project, it's all in one place.

  • U.S. Officials Response to COVID-19 in the Navajo Nation

    ★ Featured
    In partnership with the Indigenous Investigative Collective and the Native American Journalists Association, we're building investigative journalism infrastructure in Indian Country by supporting networked reporting on COVID-19.

  • DockIns: Machine Learning on Deadline for Journalists

    ★ Featured
    As journalists dealing with data and document sets, we find that the most interesting information is usually hidden in large, unstructured, and incomplete sets of documents. This is especially true of information in public contracts: what the government is buying, how much money is being spent, and who the suppliers are. To answer these questions, four media organizations (La Nacion, CLIP, Ojo Público, and MuckRock) joined forces under the JournalismAI Collab and experimented with different machine learning tools and techniques to build a platform that helps investigative reporters process unstructured documents and extract useful insights.
