Messy Metadata: More Challenges With Collecting Data From Google Workspace

July 14, 2021

Knowledge workers may never go back to the office as we once knew it. But now that companies and their employees have learned how effective working from home can be, both for maintaining productivity and for workers' quality of life, remote work is unquestionably here to stay.

For many offices, Google Workspace is one of the tools that enabled the transition to a fully remote workforce. Google provides outstanding version control, tremendous data storage capacity, and effortless collaboration on shared documents. 

Of course, the data that businesses generate on Google Workspace is potentially discoverable. For this reason, Google created a tool—Google Vault—that ostensibly helps organizations preserve and collect files relevant to litigation. But identifying, preserving, and collecting Google Workspace files using Google Vault presents several challenges.


Google Vault Limitations

I've written in more depth about some of these issues before. To recap, Google Vault can lead organizations to over-collect data because of the way it organizes and presents file information. Google Vault currently doesn't offer any visualization of a user's Google Drive or Shared Drive structure, so there's no easy way to navigate to specific files. Nor can users select individual files or folders to export, which forces them to export a custodian's entire Drive even when they only need a portion of it. There's also no straightforward way to view specific versions of documents. Even though Google Workspace keeps track of every version a user creates, those versions can only be accessed document by document, which is tedious and time-consuming.

Perhaps most importantly, Google Vault's exports aren't review-platform-ready. The primary problem is that Google Vault delivers its load file in XML, a format that often has to be converted before a review platform can import it. File names also have a Google Doc ID appended to them, making it hard for users to work out what the original file name was.

But let's take a closer look at a problem I haven't talked about much: metadata. 


How Google Vault Manages Metadata

Metadata is crucial to ediscovery, both for managing data (identifying the correct version of files and rapidly searching for relevant information) and for ensuring production integrity. For example, if a litigation opponent sees that you've altered metadata while collecting and producing it, they're likely to have some serious questions about what else you may have altered.

Unfortunately, Google Vault isn't ideal for meeting the needs of ediscovery professionals. Its handling of metadata falls short in three ways.

First, as mentioned, Google Vault separates metadata from its underlying files: it exports the metadata in XML files and labels the loose documents themselves with both the file name and the internal Google Doc ID. The user must then reassemble the metadata and the documents before a review platform can understand them, dramatically increasing the time and effort needed to prepare data for review.
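To make the reassembly step concrete, here is a minimal Python sketch that joins an XML metadata file to the exported documents. The XML layout below (`Document`, `Tag`, and `File` elements with `DocID`, `TagName`, `TagValue`, and `FileName` attributes) is a hypothetical, simplified stand-in, not Google Vault's actual schema, and the `_<DocID>` file-name suffix is likewise an assumption; both would need adapting to what a real export contains.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL, simplified metadata XML. Google Vault's real export schema
# differs; adjust element and attribute names to match an actual export.
SAMPLE_XML = """<Root>
  <Document DocID="1AbCdEf">
    <Tag TagName="Title" TagValue="Q3 Budget"/>
    <Tag TagName="Author" TagValue="alice@example.com"/>
    <File FileName="Q3 Budget_1AbCdEf.xlsx"/>
  </Document>
</Root>"""


def load_metadata(xml_text):
    """Map each exported file name to a dict of that document's metadata tags."""
    root = ET.fromstring(xml_text)
    records = {}
    for doc in root.iter("Document"):
        tags = {t.get("TagName"): t.get("TagValue") for t in doc.iter("Tag")}
        tags["DocID"] = doc.get("DocID")
        for f in doc.iter("File"):
            records[f.get("FileName")] = tags
    return records


def original_name(exported_name, doc_id):
    """Strip an assumed '_<DocID>' suffix from the name, keeping the extension."""
    stem, dot, ext = exported_name.rpartition(".")
    if not dot:  # no extension: the whole name is the stem
        stem, ext = exported_name, ""
    suffix = "_" + doc_id
    if stem.endswith(suffix):
        stem = stem[: -len(suffix)]
    return stem + dot + ext


def reassemble(xml_text, exported_files):
    """Join metadata to exported files; return matched rows and unmatched names."""
    metadata = load_metadata(xml_text)
    rows, unmatched = [], []
    for name in exported_files:
        tags = metadata.get(name)
        if tags is None:
            unmatched.append(name)  # no metadata record: flag for manual review
            continue
        row = dict(tags)
        row["ExportedName"] = name
        row["OriginalName"] = original_name(name, tags["DocID"])
        rows.append(row)
    return rows, unmatched


rows, unmatched = reassemble(SAMPLE_XML, ["Q3 Budget_1AbCdEf.xlsx", "stray.pdf"])
print(rows[0]["OriginalName"], rows[0]["Author"])  # Q3 Budget.xlsx alice@example.com
print(unmatched)  # ['stray.pdf']
```

Returning unmatched names separately, rather than silently dropping them, keeps orphaned files visible, which matters when a production has to be defensible.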

Second, Google Vault omits critical metadata upon export. It fully excises some types of metadata, including:

  • the full file path description, 
  • file version information, 
  • parent folder information, 
  • indications that a document has been deleted or moved, and
  • information about file sharing and access permissions.


But that's not all. Google Vault also overwrites a document's original creation date, replacing it with the date of export. Since metadata, particularly date metadata, is a critical search component in discovery, losing that information can be problematic.
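The Google Drive API v3 `files` resource does expose fields such as `parents`, `createdTime`, `modifiedTime`, `trashed`, and `permissions`, so some of this information can be recovered from Drive directly. As a minimal sketch, assuming file records (with `name`, `parents`, and `createdTime`) have already been fetched into a dictionary keyed by file ID, the following Python rebuilds a file's full path and reads its original creation time. The sample IDs, names, and dates are hypothetical.

```python
from datetime import datetime

# HYPOTHETICAL records shaped like Drive API v3 file metadata
# (fields: name, parents, createdTime), keyed by file ID.
FILES = {
    "f1": {"name": "Q3 Budget", "parents": ["d2"],
           "createdTime": "2021-03-02T15:04:05.123Z"},
    "d2": {"name": "Finance", "parents": ["root"],
           "createdTime": "2020-11-20T08:00:00.000Z"},
}


def full_path(file_id, files):
    """Rebuild a full path by following each item's first parent upward."""
    parts, seen = [], set()
    current = file_id
    # Stop at an ID we have no record for (e.g. the Drive root) or on a cycle.
    while current in files and current not in seen:
        seen.add(current)
        meta = files[current]
        parts.append(meta["name"])
        parents = meta.get("parents") or []
        if not parents:
            break
        current = parents[0]
    return "/" + "/".join(reversed(parts))


def created_at(meta):
    """Parse the RFC 3339 createdTime into a timezone-aware datetime."""
    return datetime.fromisoformat(meta["createdTime"].replace("Z", "+00:00"))


print(full_path("f1", FILES))          # /Finance/Q3 Budget
print(created_at(FILES["f1"]).date())  # 2021-03-02
```

Following only the first parent is a simplification for illustration; the `seen` set guards against looping forever if the records ever contain a cycle.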

Third, missing date metadata makes it hard to identify the correct version of a document. While Google Drive maintains every version of a document that the user creates, finding those versions and using them in ediscovery is another matter. Without the original creation-date metadata for a file, it's virtually impossible to be sure you're getting the correct version of a document, edited by the right person on the correct date.
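Version identification can be sketched in a similar way. The Drive API v3 `revisions.list` method returns revision records that include `id`, `modifiedTime`, and `lastModifyingUser`. Assuming those records are already in hand (the sample values below are hypothetical), this Python function picks the revision that was current at a given moment, which is the question a reviewer asks when checking that a document was edited by the right person on the correct date.

```python
from datetime import datetime, timezone

# HYPOTHETICAL revision records shaped like Drive API v3 revisions.list output.
REVISIONS = [
    {"id": "1", "modifiedTime": "2021-03-01T09:00:00.000Z",
     "lastModifyingUser": {"emailAddress": "alice@example.com"}},
    {"id": "2", "modifiedTime": "2021-03-05T16:30:00.000Z",
     "lastModifyingUser": {"emailAddress": "bob@example.com"}},
    {"id": "3", "modifiedTime": "2021-03-09T11:15:00.000Z",
     "lastModifyingUser": {"emailAddress": "alice@example.com"}},
]


def parse_rfc3339(ts):
    """Convert a 'Z'-suffixed RFC 3339 timestamp into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def revision_as_of(revisions, as_of):
    """Return the latest revision modified at or before as_of, or None."""
    candidates = [r for r in revisions if parse_rfc3339(r["modifiedTime"]) <= as_of]
    if not candidates:
        return None  # no revision in the list predates as_of
    return max(candidates, key=lambda r: parse_rfc3339(r["modifiedTime"]))


rev = revision_as_of(REVISIONS, datetime(2021, 3, 6, tzinfo=timezone.utc))
print(rev["id"], rev["lastModifyingUser"]["emailAddress"])  # 2 bob@example.com
```

Comparing parsed, timezone-aware datetimes rather than raw strings keeps the result correct even if timestamps arrive with different UTC offsets.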

What you need is a way to export information out of Google Workspace—in a review-ready format—without losing or altering any metadata. That's where Hanzo Hold for Google Workspace comes in.



Hanzo's Enhanced Metadata

Hanzo has been working with collaboration data for years now. We've already created a purpose-built solution for Slack data, and now we've expanded that platform to collect discovery-ready data from Google Workspace. 

Hanzo Hold for Google Workspace corrects the metadata challenges of Google Vault by pulling metadata from three distinct sources:

  • Google Vault, 
  • the Google Drive API, and 
  • the Hanzo Index Engine.


In addition to all of the information we can glean from Google, our index provides the full-text content of each file in a searchable format, alongside every piece of metadata associated with it. Hanzo's exports are ready to import into the review platform of your choice with no additional processing: we do all the hard work of reassembling metadata load files with their source files and return native files with their native names.

We didn't just correct Google Vault's metadata missteps. Hanzo Hold for Google Workspace is also a visual tool that allows users to easily navigate Google Drive and select only the files and folders relevant to their case. As a result, users can avoid Google Vault's overcollection workflow and radically reduce the amount of data collected for a matter, lowering the cost and burden of ediscovery.


Ready To Learn More?

Want to see how Hanzo Hold for Google Workspace makes ediscovery easy, fast, and affordable? Contact us to set up a demonstration.

