Where Does a Web Archive Fit Into the eDiscovery Reference Model?


In a recent discussion with a financial services client in New York, I was asked to explain how Hanzo's web archive fits into the <a href="http://edrm.net/">Electronic Discovery Reference Model</a> (EDRM). As anyone who's ever taken part in the EDRM meetings knows, the answer to such a question can vary depending on where you are in the discovery process, your role in it, and of course whether you are a vendor, a lawyer, or a client.


To make this a reasonably short post, I'll try to keep my answer as specific as a vendor can be. If anyone wants to weigh in with alternative views, help yourself to a comment.

To start, take a look at the now famous EDRM diagram.

<a href="http://www.hanzoarchives.com/wp-content/uploads/2013/05/edrm-2-573.jpg"><img class="alignnone size-full wp-image-164" alt="Electronic Discovery Reference Model Diagram" src="http://www.hanzoarchives.com/wp-content/uploads/2013/05/edrm-2-573.jpg" width="573" height="313" /></a>

If you are unfamiliar with the diagram, read these short guides:
<ul>
<li><a href="http://www.edrm.net/resources/edrm-stages-explained">EDRM Stages Explanation</a></li>
<li><a href="http://edrm.net/resources/diagram-elements">EDRM Diagram Elements</a></li>
</ul>
If you want more information, <a href="http://edrm.net/">EDRM.net</a> have a set of white papers and twice-yearly meetings in St. Paul, and I believe there is an <a href="http://edrm.net/archives/9066">EDRM meet and greet at LegalTech NY on 31 Jan 2011</a>.

<strong>Web Archives</strong>

<a title="eDiscovery" href="http://www.hanzoarchives.com/solutions/ediscovery/">Hanzo provides a native format web archiving service for eDiscovery</a>. Each client can archive multiple instances of their websites, intranet sites and social media posts. These are collected on a schedule set out in the client's archive policy. Some content, especially blogs, wikis, and social media, is also collected as it's posted, again according to the archive policy.

It is possible to directly browse and search the web archive.

It is also possible to export all or a selected subset of the archive, either as a standalone native format web archive, or as pages in image form, PDFs, or some other format. Again, as per the archive policy.

Finally, it is possible to ingest exported archived content and its metadata into external systems, such as an enterprise digital repository (for legal hold, for example) or a discovery platform (for legal processing or review).

<strong>How does this fit in EDRM?</strong>

As mentioned earlier, where we fit into EDRM varies enormously, depending on who you are, where you are in the process, and whether you are a vendor, lawyer or client. It's a heated debate, only recently beginning to settle around the EDRM model.

Basically though, Hanzo's archive (being in native format) fits into several stages.

Our archive services are driven by an archive policy, which is agreed with each client. The policy specifies (a) the schedule for capturing entire websites or domains and social media accounts, follows, etc., as specific snapshots at specific time intervals (daily, weekly, monthly, for example); and (b) trigger mechanisms to capture more dynamic, high frequency content such as posts, tweets, comments, etc. The archive policy, and the actions arising from it, fit in the <a href="http://edrm.net/projects/imrm">EDRM Information Management</a> stage.
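For readers who think in code, the two halves of an archive policy described above (scheduled snapshots and event triggers) can be sketched roughly as follows. This is only an illustration under stated assumptions: every class, field and function name here is hypothetical, and it is not Hanzo's actual policy format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical interval names; a real policy could use any schedule.
INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),  # simplification: a fixed 30-day month
}

@dataclass
class ScheduledCapture:
    target: str    # website, domain or social media account
    interval: str  # key into INTERVALS

@dataclass
class Trigger:
    source: str    # e.g. a blog or social media feed
    event: str     # e.g. "post", "tweet", "comment"

@dataclass
class ArchivePolicy:
    scheduled: List[ScheduledCapture] = field(default_factory=list)
    triggers: List[Trigger] = field(default_factory=list)

def due_captures(policy: ArchivePolicy,
                 last_captured: Dict[str, datetime],
                 now: datetime) -> List[str]:
    """Return targets whose interval has elapsed since their last snapshot."""
    due = []
    for capture in policy.scheduled:
        last = last_captured.get(capture.target)
        if last is None or now - last >= INTERVALS[capture.interval]:
            due.append(capture.target)
    return due

def should_capture_on_event(policy: ArchivePolicy, source: str, event: str) -> bool:
    """Return True if the policy has a trigger for this source and event type."""
    return any(t.source == source and t.event == event for t in policy.triggers)

# Example policy: two scheduled snapshots plus one trigger for new blog posts.
policy = ArchivePolicy(
    scheduled=[
        ScheduledCapture("www.example.com", "weekly"),
        ScheduledCapture("intranet.example.com", "monthly"),
    ],
    triggers=[Trigger("blog.example.com", "post")],
)
```

The point of the sketch is simply that the policy is data agreed up front, and the capture actions follow from it.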

On capturing the content, we archive it together with metadata and forensic information, storing the archived material in <a href="http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717">ISO 28500 WARC files</a>. It is possible to search and browse the archive in native form, view the structure of the archive, and the time dimension of the archive. This enables you to identify instances of websites to place in legal hold, for example, which fits into the <a href="http://edrm.net/resources/guides/edrm-framework-guides/identification">EDRM Identification</a> stage.
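To give a feel for what a WARC record looks like, here is a minimal sketch that builds a single <code>resource</code> record using only the Python standard library. The header names come from the WARC format itself; the function name, the example URL and the choice of a base32-encoded SHA-1 block digest are assumptions made for illustration, and this is not a description of how Hanzo's crawler writes its files.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone

def build_warc_resource_record(target_uri: str, payload: bytes,
                               content_type: str, captured_at: datetime) -> bytes:
    """Build one WARC/1.0 'resource' record (hypothetical helper).

    captured_at is assumed to already be in UTC.
    """
    # Digest of the record block, written as "algorithm:value".
    digest = base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {captured_at.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Block-Digest: sha1:{digest}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF, a blank line separates header and block,
    # and the record is terminated by two CRLFs.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"

payload = b"<html><body>Hello</body></html>"
record = build_warc_resource_record(
    "http://www.example.com/",
    payload,
    "text/html",
    datetime(2011, 1, 31, 12, 0, 0, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```

The capture date, target URI and digest travel with the content in the same record, which is the kind of metadata that supports identifying a specific instance of a page later on.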

Following identification, we provide the ability to preserve and/or export the identified archived content: (a) as a standalone native format web archive, which can be packaged up and sent to third parties, for example; or (b) as a transformation of the web archive, for example into images of pages, PDFs of pages, etc., together with metadata and forensic information. The exports can be stored independently or ingested into a third-party preservation or discovery tool via integration or an adaptor. This fits into the <a href="http://edrm.net/resources/guides/edrm-framework-guides/preservation">Preservation</a> and <a href="http://edrm.net/resources/guides/edrm-framework-guides/collection">Collection</a> stages.

At the right-hand side of the EDRM diagram, Hanzo's native format web archive can be used to produce web-based evidence in its native format, directly in a web browser, together with metadata and forensic information from the original WARC files. This is our part of the <a href="http://edrm.net/resources/guides/edrm-framework-guides/production">Production</a> stage.

Finally, through partnership and integration efforts, it is also possible to view selected web pages using tools specialised for use in the central stages of the EDRM diagram: Processing, Review, Analysis, Production. More on this exciting development in a month or so!

<strong>Comments, Questions?</strong>

If you are familiar with the EDRM model, especially as a client, I'd love to hear about your experience and views on how web content and web archives fit within EDRM. Please speak out in the comments.

For more information on Hanzo's eDiscovery solutions:

[button]<a title="Web Archiving For eDiscovery White Paper" href="http://www.hanzoarchives.com/resources/web-archiving-for-ediscovery-white-paper/">Download Hanzo's eDiscovery White Paper</a>[/button]

About The Author