documentcloud

1 Project

DockIns: Machine Learning on Deadline for Journalists

As journalists dealing with data and document sets, we find that the most interesting information is usually hidden in large, unstructured, and incomplete sets of documents. This is especially true of information in public contracts: what the government is buying, how much money is being spent, and who the suppliers are. To answer these questions, four media organizations (La Nacion, CLIP, Ojo Público, and MuckRock) joined forces under the JournalismAI Collab and experimented with different machine learning tools and techniques to build a platform that helps investigative reporters process and understand unstructured documents and draw useful insights from them.

79 Articles

Release Notes: Import and search emails, extract metadata from PDFs and more

Over the past few weeks, the MuckRock team has been busy with several updates and improvements. The biggest update: Our new Email Archiver Add-On allows you to preserve email files (EML/MBOX) with their corresponding metadata for long-term storage by converting emails to EA-PDFs, a new archive-friendly standard that stores emails as PDFs while preserving their metadata in a consistent way.

An illustration showing documents piled up upon a blue background, with the words Releases Notes as a title.

Release Notes: Knight Election Hub, more OCR tools and expanded raw email access

Over the past few weeks, the MuckRock tech team has focused on several key updates and additions. These include the development of the Knight Election Hub, which offers vital resources to U.S. newsrooms for comprehensive coverage of the 2024 elections.

A stock photo of a laptop on a neat desk browsing MuckRock's FOIA Log Explorer

Search across almost 170,000 requests via MuckRock’s expanded FOIA Log Explorer

Over the last few weeks, we have been hard at work on a range of improvements to MuckRock and DocumentCloud, but FOIA fans have something special to celebrate: We’ve imported many of the requests from FOIAonline into a searchable database that allows you to filter, browse and even re-request almost 170,000 requests with just a few clicks.

Black bars with the words For the Record underlined

For the Record: How Fiquem Sabendo used records requests and DocumentCloud to reveal corporate card expenditures

In Brazil, presidents and other top government officials have the right to use “corporate cards” to cover occasional travel expenses and other small-value purchases on behalf of the federal government. In January 2023, the journalism non-profit Fiquem Sabendo secured access to the invoices for these cards, and last week, the Brazilian newsroom detailed how it meticulously scanned and organized these documents, making them accessible for the first time.

An upside down stock photo of documents in Russian and manilla envelopes.

Release Notes: Making it easier to sort, filter and reprocess document OCR

Since our last release notes, we released a new Add-On, OCR Tagger, that allows you to tag your documents based on the OCR engine used, and we added better logging for when scheduled Add-Ons like Klaxon or Scraper get disabled. The logging makes it easier to diagnose and correct outages that affect Add-Ons.
