Keeping up with the complex and changing requirements of regulated industries, as set forth by FINRA, the SEC, the FTC, the FDA, and other governing bodies, makes website archiving both a necessity and a chore. If there are gaps in your Web and social media archiving, you may find you aren’t sufficiently prepared for compliance, litigation, and eDiscovery requests. Unfortunately, once such a request arrives, it is too late to fill those gaps.
There are three problems that we most often find in Web archives:
1. Missing or incomplete records
2. Metadata that was not fully captured
3. Data altered before the retention period expires
Your business needs and the resources available are the principal points to consider when choosing your archiving strategy, tools, and services. Integrity, authenticity, and quality of the archive are among the major concerns.
What are ISO 28500 and WARC?
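ISO 28500 is the international standard for preserving collected Web content. It grew out of work by the International Internet Preservation Consortium (IIPC) and specifies a storage format called “WARC.” The standard defines the file format itself rather than a collection methodology, so how content is crawled or captured is left to the tools that produce the files.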
The WARC (Web ARChive) file format is the international industry standard for storing collected Web content and associated data. It stores each response as it was delivered from the target site, together with capture metadata such as the target URI, the capture date, and optional digests that help verify the integrity of the captured content. A WARC file is a container that provides structure to the data for processing, indexing, and access.
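To make the container idea concrete, here is a minimal sketch of how one captured HTTP response could be wrapped in a WARC "response" record using only the Python standard library. The function name `build_warc_response_record` is hypothetical, and the use of a hex-encoded SHA-256 value for the block digest is an assumption made for illustration. Production archives are normally written by a dedicated crawler or WARC library rather than by hand.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes,
                               captured_at: datetime) -> bytes:
    """Wrap raw HTTP response bytes in a single WARC 'response' record.

    Hypothetical helper for illustration; captured_at should be
    timezone-aware so it can be converted to UTC.
    """
    # The block digest covers the whole record block (HTTP headers + body).
    # Algorithm and encoding are assumptions chosen for this sketch.
    digest = hashlib.sha256(http_response).hexdigest()
    warc_date = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"WARC-Block-Digest: sha256:{digest}",
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines end with CRLF; a blank line separates headers from the block.
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLF pairs after the block.
    return header_block + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    response = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
    record = build_warc_response_record(
        "http://example.com/", response, datetime.now(timezone.utc)
    )
    print(record.decode("utf-8"))
```

The point of the sketch is that the original bytes are stored untouched inside the record, while the metadata travels alongside them in the record headers.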
Data Capture and Preservation
Today you have to be concerned not only with archiving websites, but also with archiving blogs, social media channels, collaboration tools, internal and external communications, and mobile communications. All of it counts as business communication, and you must keep accurate and defensible records of it. If the need arises, you must be able to produce the data exactly the way they appeared at any given time on any given date, inclusive of any now-defunct data and any interactive elements (drop-down menus, pop-ups, mouse-overs, comments, likes, etc.).
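Producing a page as it appeared on a given date implies some kind of point-in-time lookup over your captures. The following is a minimal sketch of that idea, assuming a hypothetical in-memory `CaptureIndex` that maps each URL to its capture timestamps and record identifiers. Real archiving products maintain their own index formats, so treat the names and structure here as illustrative only.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


class CaptureIndex:
    """Hypothetical index: URL -> sorted list of (capture time, record id)."""

    def __init__(self) -> None:
        self._captures: Dict[str, List[Tuple[datetime, str]]] = {}

    def add(self, url: str, captured_at: datetime, record_id: str) -> None:
        entries = self._captures.setdefault(url, [])
        entries.append((captured_at, record_id))
        entries.sort()  # keep captures in chronological order

    def as_of(self, url: str, when: datetime) -> Optional[str]:
        """Return the id of the latest capture taken at or before `when`."""
        entries = self._captures.get(url, [])
        timestamps = [ts for ts, _ in entries]
        pos = bisect_right(timestamps, when)
        return entries[pos - 1][1] if pos else None


if __name__ == "__main__":
    index = CaptureIndex()
    index.add("http://example.com/", datetime(2023, 1, 1, tzinfo=timezone.utc), "rec-1")
    index.add("http://example.com/", datetime(2023, 6, 1, tzinfo=timezone.utc), "rec-2")
    # Which capture shows the page as it stood on 15 March 2023?
    print(index.as_of("http://example.com/", datetime(2023, 3, 15, tzinfo=timezone.utc)))
```

A lookup like this can only be as good as the capture schedule behind it: if a page changed twice between captures, the intermediate version cannot be produced.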
You need to choose a technology that captures the content in its native format, where files, forms, links, drop-down menus, pop-ups, and mouse-overs perform as they did at the time in question. Failure to capture and preserve all the metadata puts your company at risk of failing to meet regulatory and legal requirements, as well as eDiscovery demands.
To avoid spoliation of Web evidence, proper preservation methods are critical. Preserved Web content must be sealed off from any live Web content so that the original content cannot be altered or overwritten. We have seen too many instances of improperly preserved Web content in which the preserved content was replaced by ‘live’ content. Further, write-once, read-many (WORM) storage is an important part of demonstrating the inalterability of your archive.
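One common way to make alterations detectable is to record a cryptographic hash of each preserved item and chain the hashes together, so that changing any item changes every later entry. The sketch below illustrates that idea with hypothetical helper names (`build_manifest`, `first_altered_index`). It is an illustration of tamper detection, not a substitute for WORM storage, and it assumes the manifest itself is kept somewhere the archive's writers cannot modify.

```python
import hashlib
from typing import List, Optional, Sequence

GENESIS = "0" * 64  # starting value for the chain (an arbitrary assumption)


def chain_entry(prev_hash: str, content: bytes) -> str:
    """Hash the previous chain value together with the content's own hash."""
    content_hash = hashlib.sha256(content).digest()
    return hashlib.sha256(prev_hash.encode("ascii") + content_hash).hexdigest()


def build_manifest(records: Sequence[bytes]) -> List[str]:
    """Compute one chained hash per record, in archive order."""
    manifest, prev = [], GENESIS
    for content in records:
        prev = chain_entry(prev, content)
        manifest.append(prev)
    return manifest


def first_altered_index(records: Sequence[bytes],
                        manifest: Sequence[str]) -> Optional[int]:
    """Return the position of the first mismatch, or None if all match."""
    prev = GENESIS
    for i, content in enumerate(records):
        prev = chain_entry(prev, content)
        if i >= len(manifest) or prev != manifest[i]:
            return i
    # Records missing from the end of the archive also count as an alteration.
    return len(records) if len(records) < len(manifest) else None


if __name__ == "__main__":
    archived = [b"<html>page v1</html>", b"<html>page v2</html>"]
    manifest = build_manifest(archived)
    archived[1] = b"<html>live content</html>"  # simulate an overwrite
    print(first_altered_index(archived, manifest))
```

A check of this kind tells you that something changed and where the first change occurred; preventing the change in the first place is the job of the storage layer.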
Does all this seem overwhelming? It doesn’t have to be. Avoid the risks of unauthenticated Web and social media records and missing or incomplete data. The key is finding a solution that accurately captures all of this browser-based Web content—not just bits and pieces.