3 Things Most Website Archives Are Missing

December 14, 2016

Keeping up with the complex and changing requirements that FINRA, the SEC, the FTC, the FDA, and other governing bodies set for regulated industries makes website archiving both a necessity and a chore. If there are gaps in your Web and social media archiving, you may find you aren’t sufficiently prepared for compliance, litigation, and eDiscovery requests. Unfortunately, by the time such a request arrives, it is too late to fill those gaps: content that was never captured cannot be recovered after the fact.

There are three things that we most often find are missing from Web archives:

1. Missing and/or incomplete records

2. Incomplete metadata capture

3. Data alteration before expiration of the retention period

Your business needs and available resources are the main factors to weigh when choosing an archiving strategy, tools, and services. Whatever you choose, the archive must hold up on three counts: integrity, authenticity, and quality.

What are ISO 28500 and WARC?

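ISO 28500 is the international standard that defines the WARC file format for storing collected Web content. It grew out of work by the International Internet Preservation Consortium (IIPC) and was first published by ISO in 2009. Strictly speaking, the standard specifies the storage format rather than a collection methodology; how content is crawled is left to the capture tool.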

The WARC (Web ARChive) file format is the international standard for storing collected Web content and its associated data. A WARC file is a container made up of records: each capture is stored as the HTTP response exactly as the target site delivered it, and it can be accompanied by records for the request and for capture metadata such as the target URI, the capture date, and content digests. That structure preserves the original content, supports integrity checks, and makes the data available for processing, indexing, and access.
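To make the container idea concrete, here is a minimal sketch in Python of how a single WARC `response` record is laid out: a version line, named header fields, a blank line, the captured HTTP response bytes, and a two-line terminator. The function name `build_warc_response_record` and the example URL are illustrative assumptions, not part of any tool discussed here; a production archive would rely on a dedicated WARC library and would usually write `request` and `metadata` records as well.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes,
                               captured_at: datetime) -> bytes:
    """Assemble one WARC/1.0 'response' record (illustrative helper, not a library API)."""
    # WARC digests are conventionally written as "<algorithm>:<Base32 value>".
    digest = base64.b32encode(hashlib.sha1(http_response).digest()).decode("ascii")
    # captured_at should be timezone-aware; it is normalized to UTC here.
    warc_date = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"WARC-Block-Digest: sha1:{digest}",
        f"Content-Length: {len(http_response)}",
    ]
    # Header block, blank line, the bytes exactly as delivered, then two CRLFs.
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical capture of a tiny page, used only to show the layout.
    response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
    when = datetime(2016, 12, 14, 12, 0, tzinfo=timezone.utc)
    print(build_warc_response_record("http://example.com/", response, when).decode("utf-8"))
```

Note that the record stores the full HTTP response, status line and headers included, not just the page body. That is what allows the archive to show what the server actually delivered.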

Data Capture and Preservation

Today you have to be concerned not only with archiving websites but also with archiving blogs, social media channels, collaboration tools, internal and external communications, and mobile communications. All of it is considered business communication, and you must keep accurate and defensible records of it all. If the need arises, you must be able to produce the data exactly as they appeared at any given time on any given date, including any now-defunct data and any interactive elements (drop-down menus, pop-ups, mouse-overs, comments, likes, etc.).

Archiving requires modern crawler technology that can execute JavaScript, take snapshots of your website, and collect every type of Web content: graphics, video, podcasts, webcasts, and so on. Captured content must be kept in its native format, digitally signed, and time- and date-stamped to satisfy regulatory and legal requirements and to establish a defensible chain of custody. Social media and collaborative environments must also be archived in their native format so that content can be viewed in its proper context.
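As a rough illustration of what "signed and time-stamped" means in practice, the Python sketch below binds a URL, a UTC capture time, and a SHA-256 digest of the captured bytes into one manifest entry and seals it. Everything here is an assumption made for illustration: the names `seal_capture` and `verify_capture`, the manifest fields, and above all the use of an HMAC with a shared secret, which stands in for a true public-key digital signature and trusted timestamp that a defensible archive would obtain from dedicated infrastructure.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def seal_capture(url: str, content: bytes, signing_key: bytes,
                 captured_at: datetime) -> dict:
    """Return a manifest entry binding URL, capture time, and content digest."""
    entry = {
        "url": url,
        "captured_at": captured_at.astimezone(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
    }
    # Serialize deterministically so the same entry always yields the same seal.
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    # HMAC is a stand-in here; a real archive would use a public-key signature.
    entry["seal"] = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return entry


def verify_capture(entry: dict, content: bytes, signing_key: bytes) -> bool:
    """Check that neither the content nor the manifest entry has changed."""
    unsigned = {k: v for k, v in entry.items() if k != "seal"}
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    seal_ok = hmac.compare_digest(expected, entry.get("seal", ""))
    content_ok = hashlib.sha256(content).hexdigest() == unsigned.get("sha256")
    return seal_ok and content_ok


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # hypothetical secret
    page = b"<html>pricing page, version 1</html>"
    entry = seal_capture("http://example.com/pricing", page, key,
                         datetime(2016, 12, 14, tzinfo=timezone.utc))
    print(verify_capture(entry, page, key))            # original bytes
    print(verify_capture(entry, page + b"edit", key))  # altered bytes
```

Because the seal covers the timestamp and URL as well as the digest, changing either the content or the claimed capture date causes verification to fail.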

Choose technology that captures content in its native format, so that files, forms, links, drop-down menus, pop-ups, and mouse-overs perform as they did at the time in question. Failing to capture and preserve all the metadata puts your company at risk of falling short of regulatory and legal requirements, as well as eDiscovery demands.

Proper preservation methods are critical to avoiding spoliation of Web evidence. Preserved Web content must be sealed off from live Web content so that the risk of alteration to the original is eliminated. We have seen too many improperly preserved archives in which preserved Web content was replaced by live content. Further, storing the archive on write-once, read-many (WORM) media is critical to demonstrating that it has not been altered.
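WORM storage is a property of the storage layer, but the underlying goal of showing that nothing changed or went missing during the retention period can also be checked in software. The sketch below assumes that a SHA-256 digest was recorded for each URL at capture time; the helper name `audit_archive` and the in-memory dictionaries are hypothetical simplifications of what would normally be a manifest file and an archive store.

```python
import hashlib


def audit_archive(recorded_digests: dict, archived_content: dict) -> dict:
    """Map each recorded URL to 'ok', 'altered', or 'missing' (hypothetical helper)."""
    report = {}
    for url, expected in recorded_digests.items():
        content = archived_content.get(url)
        if content is None:
            # A record existed at capture time but is no longer in the archive.
            report[url] = "missing"
        elif hashlib.sha256(content).hexdigest() == expected:
            report[url] = "ok"
        else:
            # The stored bytes no longer match what was captured.
            report[url] = "altered"
    return report


if __name__ == "__main__":
    original = b"<html>terms, as captured</html>"
    recorded = {
        "http://example.com/terms": hashlib.sha256(original).hexdigest(),
        "http://example.com/rates": hashlib.sha256(b"<html>rates</html>").hexdigest(),
    }
    # Simulate an archive where one page was overwritten and one was lost.
    archive = {"http://example.com/terms": b"<html>terms, live version</html>"}
    print(audit_archive(recorded, archive))
```

An audit like this only detects alteration; preventing it still depends on the storage controls described above.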

Does all this seem overwhelming? It doesn’t have to be. Avoid the risks of unauthenticated Web and social media records and of missing or incomplete data. The key is finding a solution that accurately captures all of this browser-based Web content, not just bits and pieces.
