Messy Metadata: More Challenges With Collecting Data From Google Workspace

July 14, 2021

Knowledge workers may never go back to the office as we once knew it. Now that companies and their employees have learned how well working from home can work, both for maintaining productivity and for workers' quality of life, remote work is unquestionably here to stay.

For many offices, Google Workspace is one of the tools that enabled the transition to a fully remote workforce. Google provides outstanding version control, tremendous data storage capacity, and effortless collaboration on shared documents. 

Of course, the data that businesses generate on Google Workspace is potentially discoverable. For this reason, Google created a tool—Google Vault—that ostensibly helps organizations preserve and collect files relevant to litigation. But identifying, preserving, and collecting Google Workspace files using Google Vault presents several challenges.


Google Vault Limitations

I've written in more depth about some of these issues before. To recap, Google Vault can lead organizations to over-collect data because of the way it organizes and presents file information. Google Vault currently doesn't allow any visualization of a user's Google Drive or Shared Drive structure, so there's no easy way to navigate to specific files. Nor can users select individual files or folders to export, so they must export a custodian's entire Drive even when they only need a portion of it. There's also no straightforward way to view specific versions of documents. Even though Google Workspace keeps track of every version a user creates, a user can only access those versions on a document-by-document basis, which can be tedious and time-consuming.

Perhaps most importantly, Google Vault's exports aren't review-platform-ready. The primary problem is that Google Vault uses XML as the load file format, which a review platform can't ingest until the export has been processed further. File names are also appended with a Google DocID, making it hard for users to figure out what the original file name was.
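To make the file-naming problem concrete, here is a minimal Python sketch of recovering an original file name. It assumes the exported name has the form "original name", an underscore, the DocID, then the extension, and that you already know the DocID from the accompanying metadata. Both the naming pattern and the function name strip_doc_id are assumptions for illustration, so check them against your own export before relying on this.

```python
from pathlib import PurePath


def strip_doc_id(exported_name: str, doc_id: str) -> str:
    """Return the file name with a trailing '_<doc_id>' removed from the stem.

    Assumption: the export appends '_<DocID>' just before the extension.
    If the name does not match that pattern, it is returned unchanged.
    """
    path = PurePath(exported_name)
    marker = f"_{doc_id}"
    stem = path.stem
    if stem.endswith(marker):
        stem = stem[: -len(marker)]
    return stem + path.suffix


if __name__ == "__main__":
    # Hypothetical DocID, used only for illustration.
    print(strip_doc_id("Q3 Budget_1AbC-dEf_123.xlsx", "1AbC-dEf_123"))
```

Matching on the known DocID, rather than splitting on the last underscore, matters because a DocID can itself contain underscores and hyphens.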

But let's take a closer look at a problem I haven't talked about much: metadata. 


How Google Vault Manages Metadata

Metadata is crucial for ediscovery, both for managing data (identifying the correct version of files and rapidly searching for relevant information) and for production integrity. For example, if a litigation opponent sees that you've altered metadata in the process of collecting and producing it, they're likely to have some serious questions about what else you may have altered.

Unfortunately, Google Vault isn't ideal for meeting the needs of ediscovery professionals. Its handling of metadata falls short in three ways.

First, as mentioned, Google Vault separates metadata from its underlying files, exporting the metadata via XML files and labeling the loose documents themselves with both the file name and the internal Google DocID reference number. The user must then reassemble the metadata and the documents before a review platform can understand them, dramatically increasing the time and effort needed to prepare data for review.
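As a rough illustration of what that reassembly step involves, the Python sketch below pairs metadata records from an XML file with the loose documents from an export. The XML layout here (Documents, Document, Title, FileName) is a simplified, hypothetical schema invented for this example, not Google Vault's actual export format, and the function names are likewise made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata file. A real export will look different.
SAMPLE_XML = """
<Documents>
  <Document DocID="abc123">
    <Title>Q3 Budget</Title>
    <FileName>Q3 Budget_abc123.xlsx</FileName>
  </Document>
  <Document DocID="def456">
    <Title>Board Memo</Title>
    <FileName>Board Memo_def456.docx</FileName>
  </Document>
</Documents>
"""


def load_records(xml_text: str) -> dict:
    """Index metadata records by the exported file name they describe."""
    root = ET.fromstring(xml_text)
    records = {}
    for doc in root.iter("Document"):
        fields = {child.tag: (child.text or "").strip() for child in doc}
        fields["DocID"] = doc.get("DocID", "")
        file_name = fields.get("FileName", "")
        if file_name:
            records[file_name] = fields
    return records


def reassemble(xml_text: str, exported_files: list) -> tuple:
    """Pair each exported file with its metadata; collect files with none."""
    records = load_records(xml_text)
    matched = [(name, records[name]) for name in exported_files if name in records]
    orphans = [name for name in exported_files if name not in records]
    return matched, orphans


if __name__ == "__main__":
    files = ["Q3 Budget_abc123.xlsx", "stray.pdf"]
    matched, orphans = reassemble(SAMPLE_XML, files)
    for name, meta in matched:
        print(name, "->", meta["Title"], meta["DocID"])
    print("No metadata found for:", orphans)
```

Even in this toy version, someone has to decide what to do with files that have no matching record, which is part of why manual reassembly adds time before review can start.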

Second, Google Vault omits critical metadata upon export. It fully excises some types of metadata, including:

  • the full file path description, 
  • file version information, 
  • parent folder information, 
  • indications that a document has been deleted or moved, and
  • information about file sharing and access permissions.


But that's not all. Google Vault also overwrites the original metadata about a document's creation date, assigning the date of export as the creation date instead. Since metadata is a critical search component for discovery, particularly metadata about dates, losing that information can be problematic.

Third, omitted date metadata makes it hard to identify the correct version of a document. While Google Drive maintains every version of a document that the user creates, finding those versions and using them in ediscovery is another matter. Without original metadata about the creation date for a file, it's virtually impossible to know that you're getting the correct version of a document, edited by the right person on the correct date.

What you need is a way to export information out of Google Workspace—in a review-ready format—without losing or altering any metadata. That's where Hanzo Hold for Google Workspace comes in.



Hanzo's Enhanced Metadata

Hanzo has been working with collaboration data for years now. We've already created a purpose-built solution for Slack data, and now we've expanded that platform to collect discovery-ready data from Google Workspace. 

Hanzo Hold for Google Workspace corrects the metadata challenges of Google Vault by pulling metadata from three distinct sources:

  • Google Vault, 
  • the Google Drive API, and 
  • the Hanzo Index Engine.


In addition to all of the information we can glean from Google, our index provides the full-text content of the file in a searchable format so that you can identify every snippet of metadata associated with a file. Hanzo's exports are ready to import into the review platform of your choice with no additional processing—we do all the hard work of reassembling metadata load files with their source files, returning native files with their native names. 

We didn't just correct Google Vault's metadata missteps: Hanzo Hold for Google Workspace is a visual tool that allows users to easily navigate Google Drive and select only those files and folders relevant to their case. As a result, a user can now be confident they'll avoid the overcollection workflow of Google Vault and radically reduce the amount of data collected for a matter, lowering the cost and burden of ediscovery. 


Ready To Learn More?

Hanzo Hold for Google Workspace makes ediscovery easy, fast, and affordable. Contact us to set up a demonstration.

