The DocumentCloud Beta has a new document viewer. Internally nicknamed “scrolly zoomy,” this new viewer mimics modern PDF viewers. With a simple pinch on laptop trackpads and mobile devices, a document can now be zoomed smoothly. Providing modern gestures may not seem like a challenging feature, but the technical implementation required considerable finesse to provide a natural and quick feel. More importantly, this new feature closes a painful gap between DocumentCloud and the contemporary PDF viewing experience.
For previous site improvements, check out all of MuckRock’s release notes, and if you’d like updates emailed to you — along with ways to help contribute to the site’s development yourself — subscribe to our developer newsletter here.
Rebuilding DocumentCloud’s viewer for a smoother web
For more than a decade, DocumentCloud has provided a platform for uploading, analyzing, annotating, and publishing documents. Once a document is published, it can be viewed by anyone with a link and embedded in a news article. When DocumentCloud was getting started, an embeddable document viewing experience was pretty revolutionary, since most web browsers did not provide built-in PDF viewers. DocumentCloud’s viewer was deceptively simple: just render each page as an image file, only showing the images corresponding to the reader’s position at a given time.
DocumentCloud’s viewer is so simple and robust, it will work on a computer from the 90s. And that’s a big reason why we’ve accumulated nearly two billion views on published documents with this viewing platform. But we’ve experienced a persistent trend over the years that’s recently amplified: users sharing the raw PDF link to a DocumentCloud-hosted document instead of using our carefully crafted viewer. How could they? we thought, reflecting on the intelligent search features and user annotations that are lost in the raw PDF file. But the answer is simple — users will rightfully gravitate to the platform that provides the best feel, i.e. modern PDF viewers.
Our response is the new “scrolly zoomy” document viewing platform, which is an uncompromising combination of DocumentCloud’s robust power and modern document viewing elegance.
How does it work?
As a quick breakdown, PDF files are exceedingly complex and can be a tangled mess of fonts, images, vector graphics, and even embedded forms and JavaScript code (yikes!). This complexity gives you a universal way to represent documents and, for text and vector graphics at least, lets you zoom almost indefinitely into a page without ever seeing pixels. But this elaborate scheme brings with it a bird’s nest of problems.
A brief list of problems with PDF files:
- 100+ page PDF files are really slow to display on most computers
- 1,000+ page PDF files are often impossible to display on some PDF viewers or will crash your browser
- Even short PDF files with really complicated vector graphics, overly large images, or embedded code can slow your computer to a crawl
- Displaying some pages of PDF files requires downloading the entire file, which can mean you have to download 100s of megabytes before being able to see anything
- Government agencies will often return very large PDF files in response to Freedom of Information Act (FOIA) requests. We’ve routinely seen documents with tens of thousands of pages. DocumentCloud hosts thousands of FOIA documents.
At DocumentCloud, we value showing you any document near-instantaneously more than taking advantage of every obscure feature the PDF file format has to offer. To provide a speedy and universal viewing experience, documents are pre-processed into a series of images for each page at different resolutions. Notably, we’ve increased the resolution of these images with this release, so new documents processed on the DocumentCloud beta will have enhanced definition and look great at deep zoom levels. The bottom line of this approach is that any document, even one 20,000 pages long, will load instantly in DocumentCloud’s viewer and be performant while scrolling or jumping through the document.
So far, this is mostly in line with how the old viewer works. To provide fluid zooming with pinch gestures, we completely rearchitected the DocumentCloud viewer frontend. Documents are internally represented as a series of pages with dimensions. All the dimensions for every page are preloaded when you open a document, which means the document viewer can calculate the overall height of the document at any given time. It can also calculate where every page of the document will be placed in exact pixel coordinates.
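To make the preloaded-dimensions idea concrete, here is a minimal sketch of how page positions could be derived from page heights alone. It is not DocumentCloud’s actual code: the function name `layoutPages`, the `gap` parameter, and the shape of the page objects are all assumptions made for illustration.

```javascript
// Hypothetical sketch: stack pages vertically and record where each one starts.
// `pages` is an array of { width, height } objects in document pixels (assumed shape).
// `gap` is an assumed fixed spacing between consecutive pages.
function layoutPages(pages, gap = 10) {
  const tops = [];
  let y = 0;
  for (const page of pages) {
    tops.push(y);            // top edge of this page in document coordinates
    y += page.height + gap;  // advance past the page and the gap below it
  }
  // Total height excludes the trailing gap after the last page.
  const totalHeight = pages.length > 0 ? y - gap : 0;
  return { tops, totalHeight };
}
```

Because only the dimensions are needed, a layout like this can be computed for a very long document before any page image has been downloaded.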
The document viewer then needs a way to transform these page coordinates into viewport coordinates, which correspond to what is currently displayed on the user’s screen. As the user scrolls and zooms, the formula that transforms these coordinates changes accordingly to show the current page. (For the linear algebra-inclined, a 2D transformation matrix is employed.) When you pinch the document to zoom at a particular viewport position, the transformation formula is run in reverse to determine the appropriate page coordinate. To keep the document viewer sprightly, the system calculates which pages are visible at any given time, hiding the rest. This means the user’s browser only has to download page images corresponding to what they’re currently looking at, not the entire document (kilobytes rather than mega- or gigabytes).
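The coordinate math described above can be sketched in a few small functions. This is an illustration under stated assumptions, not the viewer’s real implementation: it simplifies the 2D transformation matrix to a uniform scale plus a translation, and every name here (`docToViewport`, `viewportToDoc`, `zoomAt`, `visiblePages`, and the `view` object with `scale`, `tx`, `ty`) is hypothetical.

```javascript
// Assumed view state: { scale, tx, ty }, a uniform scale plus a translation.
// This is a special case of a 2D affine transformation matrix.

// Forward transform: document coordinates -> viewport (screen) coordinates.
function docToViewport(view, p) {
  return { x: p.x * view.scale + view.tx, y: p.y * view.scale + view.ty };
}

// Inverse transform: viewport coordinates -> document coordinates.
function viewportToDoc(view, p) {
  return { x: (p.x - view.tx) / view.scale, y: (p.y - view.ty) / view.scale };
}

// Zoom by `factor` while keeping the document point under `anchor`
// (given in viewport coordinates) fixed on screen.
function zoomAt(view, anchor, factor) {
  const docPoint = viewportToDoc(view, anchor); // run the transform in reverse
  const scale = view.scale * factor;
  return {
    scale,
    tx: anchor.x - docPoint.x * scale,
    ty: anchor.y - docPoint.y * scale,
  };
}

// Indices of pages whose vertical extent overlaps the viewport.
// `tops` and `heights` are in document coordinates.
function visiblePages(view, tops, heights, viewportHeight) {
  const topDoc = viewportToDoc(view, { x: 0, y: 0 }).y;
  const bottomDoc = viewportToDoc(view, { x: 0, y: viewportHeight }).y;
  const visible = [];
  for (let i = 0; i < tops.length; i++) {
    if (tops[i] + heights[i] > topDoc && tops[i] < bottomDoc) {
      visible.push(i);
    }
  }
  return visible;
}
```

The key property of `zoomAt` is that the document point under the pinch stays under the pinch after zooming, which is what makes the gesture feel anchored. The linear scan in `visiblePages` is kept for clarity; since page tops are sorted, a binary search would be a natural choice for very long documents.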
This new approach represents the best of both worlds: fluid and intuitive zoom gestures with the robust and efficient nature of DocumentCloud’s viewer. This release also brings page notes to the DocumentCloud beta, which are annotations that visibly show before a given page. The document viewer keeps track of the size of page notes to properly calculate pixel positions for every page. The dynamic size of page note content means that zooming requires careful consideration to properly keep track of every page position.
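One way to see why page notes complicate zooming is the sketch below. It rests on a loudly stated assumption that is not confirmed by anything above: that a note’s height is measured in screen pixels and does not scale with the page image. The function name `layoutWithNotes`, the `noteHeight` field, and the unscaled `gap` are likewise hypothetical.

```javascript
// Hypothetical sketch: compute each page's top edge in screen pixels at a given zoom.
// ASSUMPTION: note heights are fixed in screen pixels and do not scale with zoom,
// while page heights scale linearly. The real viewer may behave differently.
function layoutWithNotes(pages, scale, gap = 10) {
  const tops = [];
  let y = 0;
  for (const page of pages) {
    y += page.noteHeight || 0;        // a page note sits before its page
    tops.push(y);                     // top edge of the page image itself
    y += page.height * scale + gap;   // only the page image scales
  }
  return tops;
}

// Example: the second page has a 40px note in front of it.
const examplePages = [
  { height: 100, noteHeight: 0 },
  { height: 200, noteHeight: 40 },
];
const topsAt1x = layoutWithNotes(examplePages, 1); // [0, 150]
const topsAt2x = layoutWithNotes(examplePages, 2); // [0, 250]
```

Under this assumption, doubling the zoom does not simply double every page position (150 becomes 250, not 300), so positions would need to be recomputed from the measured note sizes rather than multiplied by the zoom factor.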
Under the hood, we are excited to announce we are using the Svelte framework for the new frontend. The “scrolly zoomy” aspect of the viewer will be released as a separate vanilla JavaScript library with the goal of empowering other platforms to use these gestures in a universal manner without having to rederive all the math.
Upgrade your newsroom to the new DocumentCloud
All newsrooms currently using the DocumentCloud Beta already have access to the above improvements without needing to make any changes, and all new newsrooms and other organizations joining DocumentCloud are now starting out with the Beta. If your newsroom is still using DocumentCloud Legacy, you can request access to get some early testers on the Beta, opt to get your entire newsroom upgraded sooner, or delay access until later in the migration cycle if you need features not yet present in the DocumentCloud Beta.
- If your newsroom already uses DocumentCloud and you’d like some users to have access to the Beta for testing and evaluation purposes, email us with the subject line “Beta Access.”
- If your newsroom already uses DocumentCloud and you’re ready to migrate your entire newsroom to the new platform, including old documents, email us with the subject line “Beta Migration.”
- If you’d like to delay your migration until at least a certain date, fill out this form as soon as possible and our team will note your delay request as well as offer any support we can to help you prepare.
We also recommend subscribing to the DocumentCloud newsletter to get information on updates, new feature releases, and chances to share what you’d like to see in upcoming releases.
Image via Wikimedia Commons