Release Notes: A look at the DocumentCloud beta's overhauled search

DocumentCloud’s search is getting an upgrade for more powerful, flexible queries

Written by
Edited by Beryl Lipton

We’re still hard at work putting more polish and finishing touches on the revamped DocumentCloud, but we wanted to give you a look at recent search enhancements. Better yet, you can go ahead and try them out even without access to the beta.

For previous site improvements, check out all of MuckRock’s release notes, and if you’d like updates emailed to you - along with ways to help contribute to the site’s development yourself - subscribe to our developer newsletter here.

Building on what already works

As Dylan Freedman worked to revamp DocumentCloud’s search capabilities, we wanted to build on what already worked well, balancing powerful search with an intuitive design that lets you quickly start homing in on what interests you without having to learn complex systems.

As a result, there are a lot of similarities with DocumentCloud’s current search interface …

DocumentCloud's current search interface with a sample query

… and the new search:

DocumentCloud's upcoming search interface with a sample query

We’ve added subtle stylings so that you can more quickly see which terms are linked with which search filters, and you’ll note that the unique ID for each element — the long string of numbers — has been moved from the front of the element to the back. This has been done to improve readability, but those numbers are still important in cases where numerous people have uploaded documents with the same name.

We also worked to preserve the ability to edit any part of the query just from the keyboard. Advanced users can hand-type any element directly without touching the mouse. For users just getting used to the platform, clicking on various DocumentCloud options (such as “Your Documents” or a specific tag) will update the query box as well, so that you can learn how search works over time.

More powerful query options

By re-architecting the site, we’ve also been able to enable more powerful search queries. Under the hood, DocumentCloud’s search is powered by Solr, so most of the tricks that work with Solr work here.

That includes basic logical constructions (for example, “hello AND goodbye” requires both words, while “hello OR goodbye” pulls results with either) and the ability to exclude results that meet certain criteria (‘-title:"my doc"’ excludes all documents that include “my doc” in the title).
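If you assemble queries in a script rather than typing them into the search box, these constructions are just strings. Here is a minimal Python sketch; the helper names (all_of, any_of, exclude) are made up for illustration and are not part of DocumentCloud or Solr.

```python
# Hypothetical helpers that only build query strings; they do not
# contact DocumentCloud or Solr.

def all_of(*terms):
    """Require every term: hello AND goodbye."""
    return " AND ".join(terms)


def any_of(*terms):
    """Accept any one of the terms: hello OR goodbye."""
    return " OR ".join(terms)


def exclude(field, phrase):
    """Exclude documents whose field contains the phrase.

    Assumes the phrase itself contains no double quotes.
    """
    return f'-{field}:"{phrase}"'


print(all_of("hello", "goodbye"))
print(any_of("hello", "goodbye"))
print(exclude("title", "my doc"))
```

The resulting strings can be pasted into the search box as-is.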

It also allows fuzzy matching with the ‘~’ operator as well as wildcard characters (“MuckR*ck”), so that even if there’s a typo or OCR error, you still get the results you’re looking for.
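To get a feel for what these two operators do, here is a rough Python sketch. The fuzzy helper is a hypothetical name that only builds the query string, and the wildcard check uses Python's fnmatch module as a stand-in: Solr matches wildcards against its own indexed terms, so treat this as an approximation of the idea rather than a description of how DocumentCloud behaves.

```python
from fnmatch import fnmatchcase


def fuzzy(term, max_edits=None):
    """Build a fuzzy term such as MuckRock~ or MuckRock~1 (hypothetical helper)."""
    return f"{term}~" if max_edits is None else f"{term}~{max_edits}"


# Sample words, including an OCR-style error ("0" read instead of "o").
candidates = ["MuckRock", "MuckRack", "MuckR0ck", "Muckraker"]

# "*" stands for any run of characters, so the pattern tolerates the error.
wildcard_matches = [word for word in candidates if fnmatchcase(word, "MuckR*ck")]

print(fuzzy("MuckRock"))
print(fuzzy("MuckRock", 1))
print(wildcard_matches)
```

In this sample, “Muckraker” is the only word the wildcard pattern leaves out.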

One of my favorite new tricks is the ability to search for instances where two words appear within a certain distance of each other.

An example search query showing a search for when Trump and Mueller show up within five words of each other, "Trump Mueller"~5
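For the curious, here is a small Python sketch of the idea behind that query. The near helper (a made-up name) builds the query string shown above, and appears_within is a simplified local check of whether two words fall within a given number of words of each other. It is only an illustration; Solr's own proximity matching works on its index and may count distance somewhat differently.

```python
import re


def near(first, second, distance):
    """Build a proximity query such as "Trump Mueller"~5 (hypothetical helper)."""
    return f'"{first} {second}"~{distance}'


def appears_within(text, first, second, distance):
    """Return True if the two words occur at most `distance` word positions apart.

    Simplified illustration: case-insensitive, whole words only.
    """
    words = [w.lower() for w in re.findall(r"\w+", text)]
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= distance
        for i in first_positions
        for j in second_positions
    )


sample = "Trump said on Tuesday that Mueller would testify"
print(near("Trump", "Mueller", 5))
print(appears_within(sample, "Trump", "Mueller", 5))
print(appears_within(sample, "Trump", "Mueller", 2))
```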

You can also filter search results by date uploaded, tags, and page length.

Try the new search functionality today

There’s a lot more, and we’ll keep rolling out tweaks and improvements in the coming weeks, but early beta users are already getting to play with these updates. To try them out yourself, just visit the public repository hosted here and click “Learn more” under the search bar to see everything the upgraded search can do.

If you’re a newsroom not on DocumentCloud yet, you can register for access here. All new newsrooms being onboarded are launching with the beta. If your newsroom already has a DocumentCloud account, or if you’re not a journalist, register for the DocumentCloud newsletter to get updates on when we’re migrating existing newsrooms to the platform, as well as details on how we’re expanding access to DocumentCloud to other types of usage.

Reporting bugs and feature requests

As we continue to revamp DocumentCloud, we’re also tweaking and improving all our digital tools, including MuckRock. If you spot a bug or have a feature request, you can help by opening an issue on GitHub.

If you do, please search open issues first to make sure it hasn’t already been reported. If it has been reported previously, please leave an additional comment letting us know it’s an issue for you, particularly if you can provide more details about when it crops up or what you think is causing the problem.


Image via Wikimedia Commons