In our Pros and Cons series, we explore the different approaches to common web archiving challenges so you have the knowledge you need to make the right technology investment. You can also read our other articles in this series on APIs, WARC & WORM, PDFs, and Social Media Best Practices.
Archiving your organization’s website is like doing a home improvement project; both are prone to the DIY effect and sometimes the results are disastrous.
We’ve seen it happen countless times: a financial services company knows it needs to archive its website, but it also knows that it has an IT department. Why not just have IT archive the website in-house? How hard could it really be? The answer—just like with projects around the house—is usually “harder than you think.”
With home improvement, experts say it comes down to a balance between skills, time, and money, and “When one of those three things is in short supply, you have to compensate with one or both of the others.” It really is the same with web archiving.
Sure, some jobs are tiny and pose low risks. If your kitchen faucet is leaking, odds are you can watch a few YouTube videos and, even if you have no plumbing skills or prior experience, figure out how to fix it. While it’ll take you longer than it would take a plumber, and it might cost you a few extra parts as you fumble around, you’re unlikely to ruin your home even if you botch the job. But putting an addition on the back of your house? That’s a job for a pro.
With website archiving for regulatory compliance, the risks are huge. And unless your website is very simple (no interactive elements, no fillable forms, no investment calculators or chatbots or even image carousels) and rarely changes, you’re looking at a much bigger job than you may realize. If you’re also archiving social media feeds, the complexity goes through the roof!
Here are some of the DIY web archiving methods that we’ve seen companies try, and fail with.
Custom-Built Web Archiving Solutions
We’ve encountered a surprising number of in-house IT groups that have been tasked with archiving their organizations’ websites. Here’s your first clue about how well this usually works: we’ve encountered them because they’ve come to us to fix the mess they’ve accidentally made.
These organizations think they’re managing their resources and responsibilities wisely, but, due to their lack of experience, they underestimate how difficult web archiving can be. They throw some time and money into designing a solution that works for their website, and they think they’re good to go.
On the plus side of the ledger, it’s true that they might save some money up front by using their existing resources to archive their website. Sometimes, though, even that benefit doesn’t bear out: most IT departments are already stretched thin, and this can be the final straw that breaks the IT department’s back.
Worse than that, we’ve seen those savings evaporate over time, as businesses have to spend far more time than they expected to design, troubleshoot, and implement their custom-built solution. And then there’s the risk:
Can you handle it if your solution doesn’t work well?
Will you be able to explain your lack of archives if you discover—perhaps six months or a year down the road—that some part of your archive doesn’t work correctly?
Will you be able to devote the manpower to checking your archives (all of them) on a regular basis so you can quickly detect and correct those mistakes?
When you work with an expert, they’ll not only set up your system faster than you could, but—if you’ve chosen wisely—they’ll also monitor your results over time. That frequent QA process—checking your archives both quantitatively and qualitatively to ensure that they’re capturing everything that’s needed at the frequency that’s needed—is one of the first things to fall through the cracks with an in-house archiving solution.
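To make the quantitative side of that QA concrete, here is a minimal sketch of one such check: scanning a list of capture timestamps for stretches where no archive was taken. The function name, the capture log, and the one-day interval are all hypothetical and chosen only for illustration; a real QA process would also need the qualitative step of a person actually looking at the archived pages.

```python
from datetime import datetime, timedelta


def find_capture_gaps(capture_times, max_interval):
    """Return (earlier, later) pairs of consecutive captures that are
    further apart than max_interval."""
    ordered = sorted(capture_times)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_interval:
            gaps.append((earlier, later))
    return gaps


# Hypothetical capture log: daily captures with a missed stretch in the middle.
captures = [
    datetime(2024, 1, 1),
    datetime(2024, 1, 2),
    datetime(2024, 1, 3),
    datetime(2024, 1, 7),
    datetime(2024, 1, 8),
]

# Assumed requirement for this example: at least one capture per day.
gaps = find_capture_gaps(captures, timedelta(days=1))
for earlier, later in gaps:
    print(f"No capture between {earlier:%Y-%m-%d} and {later:%Y-%m-%d}")
```

A check like this only tells you that a capture happened on schedule, not that it captured the right things, which is why it complements rather than replaces a human review.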
Content Management System (CMS) Magic
Well, hold on, you say. You’re already paying for a content management system (CMS) to run your website, and this was one of their selling points. They promised a complete website-restoration function. From any point in time you can restore the website you had at any other point in time.
In the pro column, you’re already paying for a CMS, and this function is available at no added cost. You don’t even have to do anything! That’s why we call it magic—it just happens at the wave of a wand. You need a prior version of your website, you get a prior version of your website. Easy peasy lemon squeezy, right?
Wrong. There are three main problems we’ve seen with CMS website restorations.
First, there’s a longevity problem. What happens when you want to switch CMS providers? If you need to maintain your business communications for, say, seven years, you’re looking at paying your first CMS provider for another seven years after you switch, just to maintain access to your prior websites. This solution suddenly went from being “free” to being ridiculously expensive.
Second, there’s a usability problem. Even if you have complete faith that you’ve found the best CMS provider there is, that they’re going to be around forever and you’re going to use them forever, you’re still not going to be happy with the day-to-day experience of relying on CMS website restoration for your archives. Remember that your archives aren’t there just for regulatory supervision; they’re there for your own supervision of online communications too. What you need is an accessible and functional archive that you can get into frequently. What you’re getting with a CMS-restored website, on the other hand, is a backup method, a failsafe that you hope you never have to use.
Finally, you’re going to miss out on a lot if all you can restore is your website information. Yes, your restored website will include all of your dynamic and interactive content—but it won’t have archived information from third parties. If you have a stock ticker, or information from a sales tracker, or links to third-party content, none of that is being captured or preserved by your CMS. Which means you won’t have a complete picture for either your own supervision or any regulatory inquiry that you have to answer to.
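If you want a rough sense of how much of a page comes from somewhere other than your own site, the sketch below lists the outside hosts that a page’s HTML points to through `src` and `href` attributes. It uses only Python’s standard library; the sample page, the host names, and the `third_party_hosts` helper are made up for illustration, and the approach is an assumption on our part: it will not see content that scripts load after the page renders.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResourceCollector(HTMLParser):
    """Collect every src and href attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)


def third_party_hosts(html, own_host):
    """Return the sorted hosts referenced by the page that are neither
    own_host nor one of its subdomains."""
    collector = ResourceCollector()
    collector.feed(html)
    hosts = set()
    for url in collector.urls:
        host = urlparse(url).hostname  # None for relative links like /about
        if host and host != own_host and not host.endswith("." + own_host):
            hosts.add(host)
    return sorted(hosts)


# Hypothetical page with a ticker widget, an embedded chart, and an outbound link.
sample_page = """
<html><body>
<a href="/about">About us</a>
<img src="https://www.example.com/logo.png">
<script src="https://ticker.example.net/widget.js"></script>
<iframe src="https://charts.example.org/embed/123"></iframe>
<a href="https://news.example.org/story">A linked article</a>
</body></html>
"""

hosts = third_party_hosts(sample_page, "example.com")
for host in hosts:
    print(host)
```

Every host on that list is content your CMS does not control and, on its own, will not be able to restore.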
External Website Backups
Fine, then, what about your cold-storage website backups? You’re not relying on your CMS here; this is a backup that your own IT department has made. So it doesn’t matter if you change CMS platforms. Why shouldn’t this work as an acceptable archive?
Again, the plus side is that you may already be creating these backups, so you’re not adding an expense to your bottom line. And yes, you’re not tied to a particular CMS, so you retain the freedom to change providers without losing access to your prior website backup versions.
Unfortunately, as with CMS magic, relying on your website backups means you can miss content that regulators expect you to preserve, such as integrated third-party content and linked pages. How can you show that a page was reputable and truthful at the time you linked to it if you don’t have a copy of that page preserved?
Beyond that, though, there’s another issue with restoring backed-up websites after a long period of time—they’re just not that functional or easy to use. Your old site probably used a different version of Java or a different database, so what you’re going to see when you view the backup won’t necessarily be what your customer saw years before. That’s not going to provide the kind of clean, compelling evidence that a regulatory agency will want to see.
Screenshots and PDFs
You can probably see where this one is going already, but we have encountered companies that understood their obligation to archive their website and sort of tried to do so, without ever really committing to an effective approach. They might’ve captured some screenshots or saved their website a few times in PDF files, but they didn’t put much money or time into building a comprehensive archive. As with the other failed approaches, this has the benefit of being relatively cheap, though that low price comes at a high cost in compliance risk.
The real problem with screen captures and PDFs, though, is that they miss out on all of the dynamic, interactive content on a modern website. Don’t take our word for it: screen-cap this blog and then tinker around with the image. Is that a satisfying reproduction of your experience on this page? We didn’t think so.
Hanzo Knows What You Really Need From Your Web Archives
Hanzo has been meeting the archiving needs of regulatory compliance and eDiscovery professionals for nearly a decade now, which means we’ve already seen and learned from all the mistakes you might be about to make. Every one of our web archives is generated as a fully navigable, dynamic, interactive replica website, based on a customized web crawl that captures every aspect of modern websites, including third-party and linked content. Plus, our archives are backed by an intensive and ongoing human QA process.
We are the experts in web archiving for litigation and compliance. Ready to learn more? Start a conversation with us today.