Archiving Internal Web Content

When archiving web content, we use different methods depending on the type of content. Each client may need a different service, so here at Hanzo we’ve developed software broad enough to handle many types of web archiving. In this blog post we’re going to look at internal web content and how we go about archiving it.

What is ‘internal’ web content?

Internal web content can be defined by its type, or by its location or use case. In our experience it includes password-protected websites, dynamic websites and form results, and internal company resources such as wikis, SharePoint and business social networks. More simply: your company intranet.

A great example of how we archive internal content is our work for LOCOG (the London Organising Committee of the Olympic and Paralympic Games). The committee had a secure SharePoint site that it wanted archived. The site was used to organize the Games and included web pages, media, wikis, shared collaborative workspaces, projects and documents.

How do we do it?

Hanzo’s crawlers were configured with the security credentials needed to access the site securely and capture it. The advantage Hanzo has over other archiving products is that it can work with the broad range of security systems frequently needed to access internal web content. Furthermore, Hanzo does not just capture the documents and associated data in SharePoint; it makes a working replica of the site, with all the content, media, documentation and information presented as they appeared on the live site.
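As a rough illustration of what configuring a crawler with credentials can involve, here is a minimal Python sketch using only the standard library. It assumes the intranet accepts HTTP Basic authentication and uses session cookies; the function name, site URL and user agent are hypothetical. Real SharePoint deployments often use NTLM, Kerberos or forms-based login, which need different handlers, and this is not Hanzo’s implementation.

```python
import urllib.request
from http.cookiejar import CookieJar


def build_authenticated_opener(site_root, username, password):
    """Hypothetical helper: an opener that sends Basic-auth credentials
    for URLs under site_root and keeps session cookies between requests."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm under site_root"
    passwords.add_password(None, site_root, username, password)

    cookies = CookieJar()  # holds any session cookies the site sets
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(passwords),
        urllib.request.HTTPCookieProcessor(cookies),
    )
    # Identify the crawler, as a person's browser would identify itself
    opener.addheaders = [("User-Agent", "example-archiver/0.1")]
    return opener, cookies


# Usage (not executed here, since it needs a live site):
# opener, jar = build_authenticated_opener("https://intranet.example.com/", "archiver", "secret")
# html = opener.open("https://intranet.example.com/sites/home").read()
```

Keeping the cookie jar alongside the opener lets every later request in the crawl reuse the same authenticated session.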

Our crawlers are configured to the client’s requirements and access the site in the same way a person would. They capture the site content and write it to ISO 28500:2009 WARC files, which are then scanned for viruses and malware. Afterwards, reports and indexes are created from the captured content.
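ISO 28500:2009 defines the WARC/1.0 record layout: a version line, named header fields, a blank line, the captured content block, and two trailing CRLFs. The Python sketch below builds a single `response` record from raw HTTP bytes. It is a simplified illustration (no compression, no payload digests, no request or metadata records), and the function name and sample page are hypothetical rather than part of any Hanzo tool.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def warc_response_record(target_uri: str, http_bytes: bytes,
                         when: Optional[datetime] = None) -> bytes:
    """Hypothetical helper: wrap a raw HTTP response in one WARC/1.0 record."""
    when = when or datetime.now(timezone.utc)
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Target-URI", target_uri),
        # WARC-Date is a UTC timestamp in W3C/ISO 8601 form
        ("WARC-Date", when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("Content-Type", "application/http; msgtype=response"),
        # Length of the content block only, not the WARC header
        ("Content-Length", str(len(http_bytes))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Each record ends with two CRLFs after the content block
    return head.encode("utf-8") + http_bytes + b"\r\n\r\n"


SAMPLE_HTTP = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
               b"<html><body>Venue plan</body></html>")

record = warc_response_record("https://intranet.example.com/", SAMPLE_HTTP)
# with open("capture.warc", "wb") as f:
#     f.write(record)  # a WARC file is a concatenation of such records
```

Because a WARC file is simply a sequence of these records, a crawler can append each capture as it goes and hand the finished file to later steps such as scanning and indexing.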

One index holds various metadata fields, and a full-text index is also produced for traditional search and discovery. Together with the WARC files, these indexes are used by Hanzo’s access software to make the captured material available to users.
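The description above pairs a metadata index with a full-text index. A toy Python version of that idea could look like the following: each capture’s metadata is stored by record ID, and an inverted index maps each word to the records that contain it, so a query returns the captures containing every query word that also match any metadata filters. The `Capture` fields and class names are hypothetical, and a production archive would more likely rely on a dedicated search engine.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Capture:
    record_id: str   # e.g. the WARC-Record-ID of the capture
    url: str
    captured: str    # capture date, ISO format
    mime_type: str
    text: str        # text extracted from the page or document


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ArchiveIndex:
    def __init__(self):
        self.metadata = {}                # record_id -> metadata fields
        self.postings = defaultdict(set)  # word -> record_ids containing it

    def add(self, cap):
        self.metadata[cap.record_id] = {
            "url": cap.url, "captured": cap.captured, "mime_type": cap.mime_type,
        }
        for token in tokenize(cap.text):
            self.postings[token].add(cap.record_id)

    def search(self, query, **filters):
        """Return sorted record IDs containing all query words and matching filters."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(r for r in hits
                      if all(self.metadata[r].get(k) == v for k, v in filters.items()))


index = ArchiveIndex()
index.add(Capture("r1", "https://intranet.example.com/wiki/Schedule",
                  "2012-08-01", "text/html", "Games schedule and venue plan"))
index.add(Capture("r2", "https://intranet.example.com/docs/budget.pdf",
                  "2012-08-02", "application/pdf", "Venue budget"))
print(index.search("venue", mime_type="application/pdf"))  # ['r2']
```

Keeping metadata and full text in separate structures mirrors the two indexes described above: filters narrow by fields such as date or file type, while the inverted index handles free-text queries.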

The access software includes an access control system that gives the client control over the archived content and system functionality, and it can also assign users varying permission levels.
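One simple way to model varying permission levels is an ordered set of levels checked per user and per collection, where a higher level implies every lower one. The Python sketch below is a hypothetical illustration; the level names, the collection name and the `AccessPolicy` class are assumptions, not Hanzo’s actual access model.

```python
from enum import IntEnum


class Permission(IntEnum):
    # Ordered levels (hypothetical): each level includes the rights below it
    NONE = 0
    VIEW = 1    # browse and search archived pages
    EXPORT = 2  # also download captured documents or records
    ADMIN = 3   # also manage users and collections


class AccessPolicy:
    def __init__(self):
        self._grants = {}  # (user, collection) -> Permission

    def grant(self, user, collection, level):
        self._grants[(user, collection)] = level

    def allowed(self, user, collection, required):
        """True if the user's level on this collection meets the requirement."""
        return self._grants.get((user, collection), Permission.NONE) >= required


policy = AccessPolicy()
policy.grant("analyst", "locog-sharepoint", Permission.VIEW)
policy.grant("counsel", "locog-sharepoint", Permission.EXPORT)
```

Using an ordered enum keeps each check to a single comparison; a finer-grained system could instead store a set of individual rights per user.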

So despite the sophisticated nature of today’s collaborative systems, wikis and intranets, Hanzo can accurately preserve sites like these along with all of their complicated inner workings. In LOCOG’s case, the archived SharePoint site is an asset for both legal and historical reasons, and many other companies could benefit in the same way.

Find out more…

Hanzo’s range of products can archive a variety of website and social media content, giving you protection against litigation. Get in contact with us to discuss your web-archiving needs.
