You’re a terrific data custodian. When you send a letter to a client, you keep a copy for your records. You neatly stow your paper files away in filing cabinets. You archive your emails and keep organized copies of your electronic files. After all, you might need information in those files and records for discovery in a court case or to establish your compliance with regulatory requirements.
Meanwhile, your website is humming along in cyberspace, busily communicating with clients and prospective clients. People are visiting your page, perhaps interacting with it, and definitely interpreting the claims you make about your products and services. Are you doing enough to preserve that content?
WHY YOU NEED WEB PRESERVATION FOR EDISCOVERY AND COMPLIANCE
Everything that can be viewed in a browser—such as Chrome, Firefox, or Microsoft Edge—qualifies as web content or web-based information. But most lawyers and legal professionals aren’t thinking enough about how to preserve that potential web-based evidence for litigation.
Are you keeping a copy of everything you say on your website? And can it be authenticated and admitted as evidence in court?
From your litigation requirements around ediscovery—the preservation of potentially relevant information—to compliance requirements such as banking communication standards, can you prove what your website claimed? You want to have confidence that if someone were to make a claim against you for, say, false advertising, you could rebut that charge or defend yourself in court. Even if someone went so far as to falsify information about what your website said, wouldn’t you want to be able to prove that this trumped-up version wasn’t the truth?
The consequences of failing to preserve information—including web content—can be dire. At their worst, court sanctions for spoliation of evidence can mean dismissal of your case or claim. But long before then, the impact of uncertainty can be felt. After all, it’s hard to reliably conduct any early case assessment if you don’t know what your website said or how it looked. To settle valid claims early or to have the courage of your convictions in going forward against a frivolous case, you have to know, and be able to prove, what the customer saw online.
Unfortunately, when it comes to collecting unstructured, dynamic content, whether it’s an interactive website guiding customers through their journey or a complex social media conversation, most of the obvious methods aren’t sufficient.
WHY PUBLIC ARCHIVES, SCREENSHOTS, AND API COLLECTION METHODS DON’T WORK
PUBLIC INTERNET ARCHIVES
Why do you even need to preserve and collect your own website? Doesn’t the Internet Archive’s Wayback Machine capture everything that happens online and store it forever—somewhere—for the good of all mankind?
We’re not knocking the Internet Archive; it stores a tremendous amount of useful information and provides a terrific service for web surfers who want to see what a website used to say. But it’s not a forensic evidence collection method. First, its captures are infrequent and irregular. How often is your website updated? Chances are you can’t guarantee that every one of your updates and site modifications would be archived by an unaffiliated third party that you aren’t paying to do so.
Second, the Wayback Machine’s archives can’t be readily authenticated for use in court—and for good reason. Those archives may not capture all content from a website. Even if they do, they’re subject to revisionist history: users can request that information be changed. That’s why the court in Leidig v. BuzzFeed rejected the plaintiff’s contention that it didn’t need to preserve its websites because they were accessible through the Internet Archive. Rather, the court found that the plaintiff failed to show “that the data generated from the website is reliable, complete, and admissible in court.”
SCREENSHOTS OR PDFS
Can’t you just screenshot your website or create a PDF file displaying its text? Maybe, if your website is straight out of 1995. But these methods fall far short for today’s interactive and personalized websites. A screenshot or PDF is a static image of dynamic, often hidden content. So much is missing: information in drop-down menus, customized pages based on user locations or customer personas, and calculations or individualized responses based on menu selections and answers to questions.
Imagine if you took a screenshot of a complex Excel spreadsheet and tried to pass that off in discovery. Your opponent wouldn’t be able to see any formula data or even tell which cells contained formulas. (Even blank cells could have “if-then” formulas that return a blank for specific values!) This worthless data wouldn’t pass muster for a spreadsheet, and it doesn’t work any better for website capture.
But what about screenshot videos? Couldn’t you just take video of a user selecting different dropdown menu options and responding to questions? Besides the fact that it could take an inordinate amount of time to walk through all of the available options, that user would only access one variant of the website—missing out on the view that another, differently situated customer might have seen. Even if there aren’t customized variants of your website, you would have no way to prove that from a static capture method.
SELF-SERVICE API ARCHIVING
There are plenty of free or budget-friendly web capture tools that promise rapid, no-technical-skill-needed site preservation. These tools rely on APIs—application programming interfaces. When they work, they work—but API captures often miss large chunks of data, and many APIs are simply no longer available. Companies have withdrawn their APIs from public access in response to data breach incidents and data privacy concerns.
Even when APIs can still be used to archive websites, do you really want to be stuck at the mercy of a company that could withdraw its API at any time? These methods leave the door open for your data access to be impeded.
Enough of what you can’t do or what isn’t good enough. How can you archive and preserve website data accurately, reliably, and defensibly?
NATIVE FORMAT WEB PRESERVATION IS THE BEST ANSWER
With native format web archiving, an entire website is saved in its original format. That website—although it’s disconnected from the live internet to ensure that it doesn’t download or access new content—is fully active, just as if it were online. It displays a dynamic view, allowing the user to click on links, select options from dropdown menus, and access different versions of personalized web experiences. That means it doesn’t just save the primary page it’s collecting: it saves every page that can be reached from that one.
The saved website records every possible server request and the answer to that request, along with all of the supporting metadata to establish the authenticity of its information. Because each event is time-stamped and hashed, it’s easy to establish that the preserved website is authentic and admissible and to prove the chain of custody underlying its creation.
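To make the hashing and time-stamping idea concrete, here is a minimal sketch of what a per-capture integrity record might look like. The function name, record layout, and example URL are illustrative assumptions for this post—not Hanzo’s actual archive format—but the principle is the same: a cryptographic digest lets anyone later prove the stored bytes are unchanged, and a UTC timestamp documents when the capture occurred.

```python
import hashlib
import json
from datetime import datetime, timezone

def record_capture(url, content_bytes):
    """Build a tamper-evident record for one captured server response.

    The SHA-256 digest fingerprints the exact bytes received; if even
    one byte of the archived content were altered later, recomputing
    the digest would expose the change.
    """
    return {
        "url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content_bytes).hexdigest(),
        "size": len(content_bytes),
    }

# Hypothetical example: record a capture of one page body.
record = record_capture("https://example.com/", b"<html>Hello</html>")
print(json.dumps(record, indent=2))
```

Verifying a capture later is just the reverse: recompute the SHA-256 of the stored bytes and confirm it matches the digest recorded at capture time.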
At Hanzo Archives, we back up our native format preservation by also saving a PDF version and a PNG version (a static image file), both hashed, for comparison. And with human, rather than automated, QA, we test every capture to ensure that we’re preserving the information you need.
Ready to learn more about how you can graduate from static preservation and capture with confidence? Don’t leave some of your most important customer communications unprotected.