Initial Gateway Grantees launch projects to help preserve, analyze and publish critical document collections

Initial Gateway Grantees launch projects to help preserve, analyze and publish critical document collections

Ongoing support program protects endangered materials through decentralized storage while giving DocumentCloud users a range of new features

Written by
Edited by Amanda Hickman

Today, MuckRock is excited to announce the first round of DocumentCloud Gateway Grantees: four projects that bring together cutting-edge technology and at-risk document collections to model how access can be preserved despite a range of global challenges. From revealing who profits from rainforest destruction to exposing the secretive inner workings of Puerto Rico’s Fiscal Board, these projects use MuckRock’s DocumentCloud platform and the Filecoin storage network to ensure that the public can access this important information, now and in the future.

Each grantee has been awarded $10,000 and technical assistance for their project. The underlying technology that powers each effort will be open sourced and made available to all DocumentCloud users as new Add-Ons, integrated directly into DocumentCloud. Through this program, every DocumentCloud user gains access to a growing library of new functionality, from AI analysis tools to site monitors, and any user can write and share new features of their own.

We’ll include details about additional grant opportunities and new transparency tools in upcoming newsletters. The current grantees will be wrapping up their projects between now and April.

The projects include:

Centro de Periodismo Investigativo, Puerto Rico’s Center for Investigative Journalism

In 2016, Congress passed the Puerto Rico Oversight, Management, and Economic Stability Act (PROMESA), a law that created the Financial Oversight and Management Board for Puerto Rico. Since its creation, the Board has promoted a culture of secrecy, arguing that it is not subject to Puerto Rico’s constitutional right of access to information. Puerto Rico’s Centro de Periodismo Investigativo (CPI) has successfully litigated to open up the Board, winning the release of more than 20,000 previously secret documents. But the scale and volume of these records are already challenging to manage, and the challenge grows as CPI works to identify and preserve information from other open document portals that could be shut down or restricted in the future. With this project, CPI will consolidate and permanently archive these critical collections, ensuring that they remain accessible despite legal challenges and recurring natural disasters.

As part of this project, the DocumentCloud team is working with CPI to develop new functionality including resilient access through IPFS integration and a custom search and presentation interface; enhanced site scraping and monitoring tools to archive the collections; and tools to help CPI and partner researchers automatically tag and organize the massive collection. These enhanced functionalities will be available for all DocumentCloud users to build on for their own projects.

Prior support from DocumentCloud helped CPI publish the incendiary Telegram messages of Governor Ricardo Rosselló, which sparked a popular outcry against his leadership and led to his shocking resignation.

Chicago CivicLab’s TIF Illumination Project

Tax Increment Financing Districts (TIFs) are special districts created to subsidize development that could not have been financed under regular market conditions. There is no exact count of these districts across the USA, but the CivicLab estimates that between 8,000 and 10,000 localities operate more than 40,000 TIFs, and that these entities capture the equivalent of more than $40 billion in property taxes, removing those tax dollars from local revenue. This project brings together civic groups in 14 cities across Illinois to archive, analyze and make accessible documents detailing TIF districts. The groups will build a permanent online archive containing every annual report of every TIF currently in operation, and will also track TIF districts that have expired.

DocumentCloud will help CivicLab more effectively scrape agency websites and permanently archive all the collected material through both the DocumentCloud interface and the Filecoin network. CivicLab and partners will also be able to extract a wide range of machine-readable data currently trapped in unwieldy PDFs and use MuckRock to request records that agencies have not posted online to ensure the collection is as complete as possible.

Fiquem Sabendo’s Opening Environmental Sanctions Reports in Brazil

The Brazilian Amazon is facing record deforestation, directly connected to environmental crimes that often go underreported or are swept under the rug. Ibama, Brazil’s environmental agency, is responsible for issuing fines against companies and people who commit environmental crimes, but the agency’s data is limited and incomplete. Agencies deny access to critical details, claiming the documents contain sensitive or personal data, and some have even gone so far as to demand the removal of previously released information.

FOIA nonprofit Fiquem Sabendo will continue its successful litigation for this data while developing enhanced site monitoring and data extraction tools to make environmental sanction data more complete and useful. This project will also help ensure that this hard-won information remains permanently accessible to all, despite political and corporate pressure, by preserving it on the Filecoin network and making it available via IPFS.

Data Liberation Project’s FEMA Housing Reports Collection

After major disasters in the United States, FEMA provides government housing (typically trailers) for many displaced families. As of November, more than 4,000 families were living in such housing, the majority displaced by nine natural disasters. The agency tracks this data but does not publish it anywhere online. It does, however, include this data once a week in its Daily Operations Briefings, which the agency sends as a PowerPoint-style PDF to an email distribution list. This project will a) create a real-time, searchable archive of all FEMA Daily Operations Briefings through integration with DocumentCloud and Filecoin; b) parse the direct housing counts from those PDFs and convert that information into a public, structured dataset; and c) build tooling (including at least one DocumentCloud Add-On) that makes similar projects easier for others. With these efforts, inspired by a recent New York Times report and analysis, the Data Liberation Project hopes to enable further reporting and accountability by newsrooms and communities across the country.