Web Archiving: Catching History Before It’s Gone


I recall that in the early days at Hanzo Archives we frequently discussed how <a href="http://informationr.net/ir/9-2/paper174.html">much information is disappearing from the web</a>. The numbers have changed over the years, but link rot (the loss of web pages, resulting in broken links) and the ephemeral nature of social media remain major problems for information persistence. This is one of the fundamental reasons we developed our web and social media archiving business.

Here's a more recent reminder of the nature of the web: much of the record of important historical events, as captured on Twitter, blogs, and other social media platforms, is already lost.

Reading about historical events as they evolve in real time on social media is a very new experience. Unlike old media, social media lets us express our views, share our feelings, and rally support or show concern as events unfold. Think: Arab Spring, Pussy Riot, US Elections, etc. It's a very human experience. But what happens afterwards, as these events slip from our timelines? One study suggests around 30% of the resources shared by social media are lost. Remember how important the events I mentioned were at the time? Part of their place in the historical record is now lost.

This is discussed at length in the article <a href="http://arxiv.org/abs/1209.3026">"Losing My Revolution" by SalahEldeen and Nelson</a>. See also this <a href="http://www.technologyreview.com/view/429274/history-as-recorded-on-twitter-is-vanishing-from-the-web-say-computer-scientists/">post</a> for a summary.

The tragedy here is not that we'll lose all account of events like Egypt's recent uprising or the Occupy Wall Street movement. It's that without the original resources referred to in the social media conversations, the context of those conversations is lost. As a historical record, the input of thousands or millions of people across the world is significantly degraded. These authentic reactions, from people who are there when events happen and from those responding from afar, enhance the human dimension of each moment in time. This is what makes the web as a historical record so dynamic, personal, and visceral. But we're losing it!

Hanzo's web archiving capabilities are built to collect and preserve the full context of such events. We collect not just the tweets and status updates, but also the links and the content they refer to. If I tweet: "here's a video of this... &lt;link to video&gt;" then we collect the tweet and the video, archive and preserve them, and enable the full contextual experience to be explored in the future.
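To make the idea concrete, here is a minimal sketch of the first step: pairing a post with the links it refers to, so both can be captured together. It is an illustration only and not Hanzo's implementation; the names `CaptureRecord`, `extract_links`, and `build_capture_record`, and the example URL, are assumptions made up for this example. It does no fetching or archiving, it only works out what would need to be collected.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern: treats anything starting with http:// or https://
# and running up to whitespace, a quote, or an angle bracket as a URL.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


@dataclass
class CaptureRecord:
    """A post plus the URLs that should be captured alongside it (illustrative)."""
    post_id: str
    post_text: str
    linked_urls: list = field(default_factory=list)


def extract_links(text):
    """Return the URLs found in text, in order, without duplicates."""
    found = []
    for match in URL_PATTERN.findall(text):
        # Drop punctuation that commonly trails a URL at the end of a sentence.
        url = match.rstrip(".,;:!?)")
        if url and url not in found:
            found.append(url)
    return found


def build_capture_record(post_id, post_text):
    """Bundle the post with every resource it links to."""
    return CaptureRecord(post_id, post_text, extract_links(post_text))


record = build_capture_record(
    "example-1",
    "here's a video of this... http://example.com/video.mp4",
)
print(record.linked_urls)
```

A real capture pipeline would go on to fetch each linked resource and store it with the post, but the pairing step above is what keeps the context attached to the conversation.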

We provide this fully contextualised archiving capability to a variety of customers, including financial services companies, government agencies, researchers, brand owners, and individuals, for purposes such as business intelligence, information governance, corporate heritage, and regulatory compliance.

With the explosive use of social media, your posts and, most importantly, the web resources they refer to (documents, attachments, media) are now business records. They need to be archived and preserved in accordance with your information governance policies. Make sure you are doing it right: capture the context, not just the tweets and statuses.

To learn more about how Hanzo Archives captures and preserves social media, including Twitter, Facebook, LinkedIn, YouTube, and Chatter, as well as the web resources they all refer to:
<li><a title="Corporate Heritage" href="http://www.hanzoarchives.com/solutions/corporate-heritage/">read our Corporate Heritage white paper</a>, and</li>
<li><a title="Contact Us" href="http://www.hanzoarchives.com/contact-us/">contact us for a 1-1 demo to discuss your requirements</a>.</li>
