Web Archiving for Compliance 101: The Pros and Cons of PDFs

February 21, 2019

Today’s business happens on the web. Social media platforms, collaborative online technology, and interactive sites are the hub of our digital economy, and just like all of your other business records, you need to retain archives of your web and online communications to maintain regulatory compliance.

But there’s a right way to do that and a decidedly wrong way, and the wrong way happens to be outdated and often accompanied by SEC and FINRA action against your organization. To make matters worse, the wrong, outdated way to archive the web is also the most common way: with a PDF.

PDF web archives make about as much sense as taking an ice-fishing trip to Miami in August, or really, anytime; we don't think ice-fishing is very popular in Florida. That's not to say PDFs weren't a good approach to web archiving at one point, but like 56K dial-up internet, flip phones, and VHS tapes, they've simply grown outdated, laying the groundwork for better, modern, and more practical solutions.


We’ll give it to PDF archives of web and social media content on a couple of fronts—they’re quick, cheap, easy to share, and viewable on most devices. They’re also likely to hold up for a while, as the PDF format is widely supported, which is great, since compliance records are the kind of thing you might not need until years later. So far, PDFs are looking pretty good, right?

Unfortunately for all of the PDF web archive fans out there, that's the extent of the benefits we've come up with. You can opt not to take our word on this, but we're what you might call "subject matter experts" when it comes to capturing and preserving online content: our co-founders helped establish the WARC file format and were approached by the British Library in 2009 to help them archive the internet.

The problem with PDFs is that they are no match for the dynamic, interactive content on today’s websites. In many cases, PDF archiving simply fails to meet the requirements of the SEC, FINRA, and other regulatory bodies that require compliance teams to keep an archive of web and social media activity in the first place.

So let’s take a closer look at why PDF archives can’t stand up to modern, real-world web and social media content.




Why PDF Archiving Fails To Meet Regulatory Requirements

The first and biggest problem with PDFs is simple: they’re snapshots, still images of content that is anything but static.

Suppose you went to a theater on opening night to see the latest Marvel blockbuster, but—no spoilers!—instead of a movie, the theater projected one single photo, an isolated frame from the movie, for two hours. How loudly would you demand a refund? You would probably react the way Hulk did when he met Loki in the first Avengers.


Because they’re still images, PDFs don’t capture context, and they do not replicate the experience of browsing the actual website or social media page you are attempting to archive.

With the sheer volume of information available online, in all of its different, unique, quirky shapes and sizes, context is essential. For a more relevant example than our friends Hulk and Loki, here are two out-of-context snippets of a website:

First, an animated GIF that represents a portion of a page on the website of a prominent, well-known, highly regulated, billion-dollar, Fortune 100 organization.


Out of context, you must admit, that seems like a pretty weird thing to have on your website. Now let's take a look at a static image of that entire page, which should hopefully give us more context, if not a sigh of relief.

[Screenshot of the full page]

The static image above and the GIF are both fragments of the State Farm Life Insurance Calculator: a 5-category, multi-question, interactive, animated journey unique to each user, with hundreds, if not thousands, of possible outcomes and results.

Now, here is one instance (of thousands) of a complete, start-to-finish State Farm Life Insurance Calculator experience, with all of its interactivity, animation, and data preserved.

[Embedded video]


As we've illustrated, it would be impossible to capture the context of the information on this page with a PDF, let alone the data in all the possible outcomes. Quite simply, things like this did not exist when PDF web archives were the standard. The web has evolved beyond what a PDF is capable of capturing, yet plenty of organizations are still willing to sell you a PDF web archiving solution instead of building technology (like ours!) that can capture and preserve today's web in its native format.

PDFs are particularly ineffective for social media and other interactive communications since they provide no way to investigate who liked a post or who commented on it, both of which can be critical for FINRA compliance. And they don’t capture dynamic or interactive elements like rotating image carousels, dropdown menus, mouse-over content, expandable text, videos, or GIFs.

Even if you use a PDF to fully capture an entirely static page, you’ll still fall short on authenticity and admissibility as evidence in court, which in turn undermines the reliability and trustworthiness of anything you provide in response to a regulatory inquiry. In the spirit of preserving context, there are few things more out of context than printing social media content and online activity on a piece of paper to present as evidence. PDFs are also incredibly easy to manipulate using Photoshop or other tools, which undermines their utility as evidence. And they’re not easily searchable without text extraction, which makes them even harder to use and more time-consuming to analyze.

If you've made it this far, we've probably convinced you of what a bad idea PDF archiving is. Fortunately, PDF archiving isn’t the only option. There’s a better way to archive online content, and it's technology we've been refining and improving upon for a decade.

The Benefits of Native-Format Web Archiving

Instead of crawling a website and collecting still images (like snapping a photo of every second of a movie and trying to pass a pile of pictures off as the feature-length film), there’s a way to collect the entire code of a website so you can replay the whole story. Say hello to native-format WARC (Web ARChive) files, which our co-founders helped establish.


WARC files capture everything that was present on the live site—and then they let you play it back, interacting with the site as if it were live. That means you get every bit of dynamic content: videos play, GIFs animate, and carousels merrily switch through their pictures. You can interact with that content, clicking on links, expanding social media comments and reactions, and exploring dropdown menus and hidden content. And because you can investigate links, you can see the full context for whatever conversation unfolded. You’re not getting a single overheard snippet that might be taken the wrong way. You can follow the thread all the way back to its origin.
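To make "native format" a little more concrete, here is a minimal Python sketch of how a single WARC response record is laid out under the WARC 1.0 specification: a short block of named headers, a blank line, and then the raw HTTP response exactly as the server sent it. The function name `build_warc_response_record`, the example URL, and the captured bytes are illustrative assumptions, not Hanzo's implementation, and a real capture tool writes many more record types and headers than this.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_bytes: bytes) -> bytes:
    """Wrap raw HTTP response bytes in one WARC/1.0 'response' record.

    Hypothetical helper for illustration only.
    """
    # WARC-Date is a UTC timestamp in ISO 8601 form.
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_bytes)}",
    ]
    # Headers end with a blank line; the record ends with two CRLFs.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + http_bytes + b"\r\n\r\n"


# Hypothetical captured response: status line, HTTP headers, then the page itself.
captured = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body><button id='next'>Next question</button></body></html>"
)

record = build_warc_response_record("https://example.com/calculator", captured)
print(record.decode("utf-8"))
```

Because the record stores the original markup rather than a picture of it, a replay tool can serve that markup back to a browser, which is what lets buttons, menus, and scripts behave as they did on the live page.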

You can also quickly and easily filter, sort, and search your archives without waiting for error-prone file conversion or text extraction. And these native-format web captures are backed by full metadata, authenticated, preserved in WORM (write once, read many) format, and readily admissible as solid evidence in court or in a regulatory investigation.
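One common way to support a claim that a capture has not been altered is to record a cryptographic digest at capture time and recompute it later. The Python sketch below shows that general idea using SHA-256. The helper names and the sample page content are hypothetical, and this article does not describe how Hanzo itself authenticates captures, so treat it as an illustration of the technique rather than the product's method.

```python
import hashlib
import hmac


def payload_digest(payload: bytes) -> str:
    """Return a labeled SHA-256 digest of the captured bytes."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def is_unaltered(payload: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one stored at capture time."""
    return hmac.compare_digest(payload_digest(payload), recorded_digest)


# Hypothetical page content, digested when it was captured.
original = b"<p>Rates from 3.5% APR</p>"
recorded = payload_digest(original)

# The same content with one character changed after the fact.
tampered = b"<p>Rates from 2.5% APR</p>"

print(is_unaltered(original, recorded))  # True
print(is_unaltered(tampered, recorded))  # False
```

A digest only proves that the bytes match what was recorded; it is storage that cannot be rewritten, such as WORM, that keeps the recorded digest and the capture itself from being replaced together.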

With Hanzo Dynamic Capture's native-format website preservation technology, you can go to sleep at night with the peace of mind that complying with these regulatory requirements is one less thing you'll need to worry about.




Web Archiving for Compliance 101: The Pros and Cons of PDFs is the second installment of a new series, Hanzo Knows, in which our team dives deep into essential regulatory and technology topics around web archiving, eDiscovery, and investigations. The first installment is The Complete Guide to SEC Rule 17a-4 for Compliance Professionals.

