Call for Proposals: Better ways to help collect, understand and preserve the public’s documents

MuckRock’s first two rounds of Gateway Grantees are using DocumentCloud to reveal those secretly profiting from the destruction of Brazil’s rainforests, probe police misconduct in Chicago and much more. Now’s your chance to pitch a project that uses primary source documents to help inform and strengthen the public while leveraging AI, distributed storage and other leading technologies, baked right into DocumentCloud.

Selected projects will receive funding of up to $50,000 as well as technical and editorial support. They will also have a chance to build connections with other current and past Gateway Grant recipients to help maximize their collective impact.

Over the past two years, DocumentCloud’s capabilities have vastly expanded, from tools that monitor websites for newly added documents to AI that can summarize or sort vast troves of materials. We’ve also worked to ensure that your materials are more permanently accessible, even when faced with emerging technological and legal challenges.

All of that is built on top of DocumentCloud Add-Ons, small code snippets hosted on GitHub that can extend how DocumentCloud works in flexible and powerful ways. And the best part is that anyone who knows a little Python can write and start using their own Add-On, and then share what they build with other DocumentCloud users if they choose.

Critical to this effort has been our partnership with Filecoin Foundation for the Decentralized Web, which helped us build a more resilient, scalable way to ensure the millions of documents on DocumentCloud are preserved permanently through decentralized storage and access, so even if one server or host goes down, people can still access information that matters.

We’ve seen this technology scale up to help us ensure that important collections remain available in a consistent way, such as the CIA’s massive CREST database, which is now available via IPFS and the Filecoin network after the CIA broke many of the long-standing ways that the database was previously provided (read about our fight to make it public in the first place here).

Similar to the last round, this round will invite applicants to apply in a range of categories, each of which will offer different funding amounts as detailed below. We also encourage collaborative applications in a few specific areas where there was strong interest and potential for impact:

  • Overcoming memory holes: Websites, particularly government-run ones, are a lot more fragile than they might appear, with documents and data there one day and gone the next. Other important records might be subject to frivolous but costly legal threats, and often quietly disappear. How can we make it easier and more consistent to preserve these important materials, particularly in ways that are easy to implement and that leverage decentralized storage to spread the information widely?

  • Global transparency tracking: Around the world, public records laws are critical to keeping government responsive to the people. How can DocumentCloud help these efforts, while also helping spur new collaborations between different countries to have a standardized tool kit for their work?

  • Legislative- and election-related documents: Transparency is key to both fair and true elections, and DocumentCloud users have already used it to help analyze everything from ambiguous ballots to the latest versions of legislative proposals.

  • AI and automation for helping understand documents: New advances in automation and machine learning offer incredible promise for helping scale up civic efforts to understand and explain our world, particularly if we can center them around human needs and under transparent human oversight.

  • Supporting libraries and archives: DocumentCloud has historically focused on newsrooms, but we know that reporters alone are not enough to keep the public informed. We would love to help libraries and archives leverage our rich suite of offerings and help us broaden who DocumentCloud can help.

If you indicate you’re interested in collaborating, we’ll potentially connect you with others who are focusing on the same area to help you find novel solutions to problems many others are facing. Applying sooner and indicating that it’s an early draft gives us more time to provide feedback and connect you with others.

The final deadline for applications is September 7.

What grantees will receive

Depending on the category, strength of the proposal and applicant pool, grantees will receive funding between $10,000 and $50,000, with a total pool of $150,000 in support available. They will also have access to support from MuckRock’s technology team, including our open source fellow Sanjin, as well as example code from dozens of existing Add-Ons that they can use, extend and build on.

You’ll also have access to a Slack channel where current and past recipients will share what they’re working on, challenges and opportunities to collaborate, and where you can get additional support from us and others. Recipients are expected to regularly check in and update us here on their progress, as well as work with us on at least two write-ups highlighting their project and its use of DocumentCloud and decentralized storage.

We’ll also help you get the word out about your work, and help you connect with others in the MuckRock network that may potentially help you expand it in exciting ways.

What we’re looking for in applications

We encourage a wide range of applicants, from newsrooms and educational institutions to multi-organization collaborations that span disciplines, as well as independent coders or journalists. Take a look at some of the recipients from the first round and the second round for inspiration and to get a sense of what a winning project can look like. The applications will be judged based on the following criteria:

  • Impact: When successfully executed, will it substantially enhance access, preservation or understanding of an important document collection? This could range in focus from one important, clearly defined collection you are already working with (for example, a cache of whistleblower documents, or a key archive of presidential records or of a historical era) to something that offers more widespread utility, such as helping monitor and analyze federal or state contracts in a more friendly and permanent way. We’re particularly interested in projects that are focused on helping preserve materials at risk due to clearly identified challenges or bringing in new tools and approaches to DocumentCloud that all of our users could benefit from.

  • Creativity: How novel and unique is the approach to the challenge you’ve identified? Ideally, projects go beyond simply storing documents in an archive to thinking about the best way to address key challenges, whether those are new approaches to gathering and verifying information or ways to involve a broader range of participants.

  • Ability: How likely is the team to succeed at the proposed project? We welcome both prototypes that are looking for a way to get across the finish line and ideas starting from scratch, but you should have a good sense of both the nature of the documents being collected and the technical, social and legal considerations involved in the proposed effort.

  • Benefit: How much better will the world be thanks to this effort? Will it help strengthen a community or raise awareness about a key issue? Is it something that others will be able to reuse in their own projects, whether through a new Add-On or by making a previously secret collection widely available?

Application categories

Collection preservation

This category is for existing large document collections that the public would benefit from having access to, but which need assistance or funding to digitize the records, or some basic support to get them uploaded to DocumentCloud and the Filecoin network.

This might mean that they’re currently paper records that must be scanned and digitized; an existing digital collection that needs a more permanent, flexible home; or an archival institution that wishes to migrate its collections to take advantage of features DocumentCloud and the Filecoin network provide.

Our first round of Gateway Grantees included excellent examples of the impact these projects can have, including Luiz Fernando Toledo’s work on preserving documents detailing the beneficiaries of the destruction of the Brazilian rainforest and Centro de Periodismo Investigativo’s collection hosting the secretive records behind Puerto Rico’s Financial Oversight and Management Board.

We’re particularly interested in supporting projects where the archives are of major public importance, but face threats or pressures to hosting, access or preservation, and that are able to tap into the power of decentralized storage to circumvent censorship and retain public access.

Projects that are selected in this category will receive $10,000 in funding to help cover hardware, labor and other costs, as well as support to ensure that these collections are searchable and safely archived on DocumentCloud and the Filecoin storage network.

Feature integration

Over the past year, DocumentCloud’s capabilities have expanded dramatically through open source contributions and the flexibility of the Add-On platform, including free audio transcription via Whisper, extracting spreadsheets with Tabula and integration with the Internet Archive. Continuing to grow this ecosystem helps newsrooms, civil society groups and others around the world do more with their documents, and also makes it easier to quickly organize and respond when existing information is endangered.

For this category, we’re looking for developers with existing tools that they would like to update, improve and integrate into DocumentCloud’s Add-On ecosystem, including tools for document analysis, site archiving, collaborative work and more. Projects that are selected in this category will receive $10,000 in funding as well as technical support and guidance.

Document collection and analysis

This category allows for a combination of the two previously mentioned approaches, giving applicants a chance to try new tools against specific challenges in document collection, analysis, publication or preservation. Applicants in this category should have a clear idea for a document collection and an understanding of its benefit, as well as a project plan for developing tools that will make that collection a reality.

After completion, these projects should serve as prototypes for what is possible. That might be by assisting with AI-powered analysis of the collection of documents, finding ways to open up projects to crowdsourced contributions, or by ensuring access despite censorship attempts. While we will provide some technical assistance to these projects, particularly with integration of tools into DocumentCloud and the Filecoin network, proposals are expected to outline how they’ll execute that vision and detail where additional support is required.

We are particularly excited about projects that creatively tap into the decentralized web to circumvent and even repel censorship and web blocks, for example by drawing more attention to censoring organizations and putting more pressure on them.

Projects selected in this category will receive between $20,000 and $30,000 in funding.

Cross-organization document-driven collaborations

DocumentCloud has always opened up new collaborative opportunities for our newsroom partners, and this final category seeks proposals that involve multiple organizations, whether from the same or different fields, working together on a shared repository of documents that are critical for the public.

We hope to help spur or expand projects that can demonstrate ongoing impact and opportunities to build on in a collaborative, iterative fashion. Project partners do not have to all be journalistic in nature; ideally, we’d like to see multi-disciplinary projects proposed that show the power of leveraging expertise and experience – everything from nonprofit organizations focused on social justice to academic think tanks to NGOs focused on ensuring ongoing access to and analysis of vital information.

Projects selected in this category will receive up to $50,000 in funding.