We’re training AI to keep an eye on government. Come join us.

The Ethics and Governance of AI Initiative funds new MuckRock tools exploring how machine learning can automate document analysis

Written by
Edited by JPat Brown

Today we’re excited to announce a new initiative to help journalists, non-profits, and others interested in open government tap the power of machine learning to better analyze large sets of documents, ranging from email dumps and meeting minutes to housing inspection reports and archival documentation.

This project, dubbed Sidekick, is funded by the Ethics and Governance of AI Initiative, a joint project of the MIT Media Lab and the Harvard Berkman Klein Center, as part of the AI and the News Open Challenge.

We have already begun reporting on the new challenges AI presents to open government and civil society, but we also think that machine learning provides an incredible opportunity to amplify the efforts of journalists, researchers, and groups of citizens working to better understand their world.

Already, these groups are winning access to larger and larger document sets, but getting the information is just the start. Understanding what is in those PDFs can be just as challenging, requiring hours of sifting and data entry. Sidekick will offer accessible and intuitive crowdsourcing and machine learning tools to help newsrooms and other groups automate turning documents into data, helping quickly analyze tens of thousands of pages while highlighting sections that might go overlooked.

This work builds on what we’ve learned over the past year working on a variety of crowdsourcing projects that used the Assignments tool. Already, over 800 people have submitted over 13,000 entries through that platform, and we’re excited to see how we can use similar contributions to train our system to recognize various types of documents and help sort through them.

As part of this project, we’re looking for collaborators with large document sets that need sorting and classification, particularly if those documents are generated on a regular basis and can be made public so that others can help out with the initial work.

If you think you have a problem that might be a good fit, please get in touch. We’d love to talk.

We’re grateful to the Ethics and Governance of AI Initiative for this opportunity. The Initiative is supported by the John S. and James L. Knight Foundation, Omidyar Network, LinkedIn co-founder Reid Hoffman, and the William and Flora Hewlett Foundation. The Initiative is a fiscal sponsorship fund of The Miami Foundation.

We’re also excited to work with the six other grantees that are part of this program; you can read more about the other winners here.


Image via NIH Flickr