Capture and Preservation of Data as a Service

March 23, 2015

Hanzo has participated in a number of research projects over the years, variously funded by the European Commission Seventh Framework Programme (FP7), JISC, and others. These projects are an important way to significantly advance the state of the art in technology-related fields, and a number of Hanzo's innovations have come out of them. Here are just two features that our customers say set us apart:

  • Interaction modelling enables our crawler to exhibit human browsing behaviours, so we can collect complex, interactive web content that is inaccessible to other crawlers.
  • Temporal analysis and change detection enable our users to measure and report on changes to their website content over long periods of time, and to visualise individual changes between any two captures.

So it is exciting to tell you about the results of our latest research project, DIACHRON.

DIACHRON is about managing the evolution and preservation of the data web. Our partners are universities and SMEs across Europe, with deep expertise in data quality assessment, linked open data, and other data-intensive technologies. Our collective expertise and technologies are integrated into the "DIACHRON platform".

During the course of this project, Hanzo has made significant advances to our crawler, enhancing its ability to identify and capture data from diverse sources on the web and, when deployed behind the firewall, from many internal sources inside an organisation's intranet.

As a result, Hanzo provides data as a service to the DIACHRON platform for use in a number of pilot applications, including enterprise linked data and scientific open data among others. Through Hanzo’s API, the DIACHRON platform is able to control the data capture process based on various data assessment and linking technologies.

How is this helping our customers?

For a start, here are the benefits of our new features:

  • Crawler technology to extract complex data inherent in web content
  • On-demand crawling at small to extremely large scale
  • Crawling and data export APIs
  • Both on-premise and cloud deployments

This has enabled a number of novel uses of our services, for example:

  • Broad industry-level crawls, with data extracted for patent analysis to assess competitor activities
  • Social media collections, with data extracted from profiles, posts, comments, and conversations to provide compelling legal evidence
  • External data collection, with relevant data extracted from web content and transformed into structured datasets for linking with existing datasets in business intelligence and other applications

Our data as a service offering is available to all our enterprise and legal customers.

The data web and data-intensive applications and services are evolving, growing, and maturing. With the rise of enterprise data infrastructures and the Internet of Things, Hanzo's data as a service offering is timely. If you're looking to collect external data to complement your enterprise data sets, such as customer intelligence, IP and competitor analysis, etc., please contact us to find out how we can help you.
