Connecting to an organization's various enterprise data sources is a primary goal for any ediscovery solution. After all, how can you identify, preserve, and analyze Electronically Stored Information (ESI) if you can’t connect to it?
When evaluating an ediscovery solution, one of the first things you might look at is how it connects to your organization's data. One of the most common answers will be, “Oh, there’s an API,” or Application Programming Interface. An API shares information with an outside program while keeping the internal details of its own system hidden.
But just because an API is available doesn’t mean it is suited to the task at hand. Some APIs are limited in the data and metadata they capture. So even if there is an API and a solution can connect to it, that doesn’t mean you’ll get what you need.
In other cases, more than one API may be available for a particular data source, and one may be more effective than another for a given use case. Google Drive is a good example.
APIs For Google Drive
Google offers more than one API that can give third-party software access to Drive data: the Drive API and the Google Vault API.
Google Vault was designed specifically for data retention and ediscovery, allowing organizations to set retention policies and to preserve, search, and export their data while keeping an audit trail.
Drive API Limitations vs. Vault API
So does it really make much of a difference how your organization’s ediscovery solution connects to Google Drive? After all, isn’t the connection the main hurdle to overcome?
To answer this, I spoke with Dave Ruel, Head of Product at Hanzo.
“A big challenge when collecting files via Drive API is that the process is subject to enterprise-wide Google rate limits. In other words, collecting documents falls within the same rate limits as every company user who is uploading, downloading, opening, editing, moving, and deleting files within the system. Additionally, if there are any wide-scale data migrations, backups, etc, the system speed may be impacted when collecting information with an increased chance of skipping files altogether. What’s worse, there is little to no audit trail on the collection itself.”
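To make the rate-limit problem concrete, here is a minimal Python sketch of how a collector might retry a throttled request with exponential backoff instead of skipping the file. It is an illustration only: `RateLimitError`, `fetch_with_backoff`, and the `fetch` callable are hypothetical names, not part of the Drive API or any Google client library.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the shared quota is exhausted."""


def fetch_with_backoff(fetch, file_id, max_attempts=5, base_delay=1.0,
                       sleep=time.sleep):
    """Call fetch(file_id), retrying with exponential backoff plus jitter.

    The last RateLimitError is re-raised rather than swallowed, so a
    throttled document is never dropped without a record.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(file_id)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay * 1, 2, 4, ... seconds, plus up to 1s of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

Even with retries like these, the collector is still competing with every other user for the same quota, which is the underlying issue Ruel describes.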
Beyond rate limits, which could disrupt an organization’s day-to-day operations whenever data must be collected for ediscovery, the Drive API also has limitations around legal holds.
“From an ediscovery standpoint,” Ruel continues, “Drive API does not take into consideration data that is on a legal hold. So if a document placed on legal hold is deleted by the user, it would not be available via Drive API; however, it would still be available via Google Vault collection. From a legal standpoint, this is a critical point to remember during collection.”
Along with these challenges, Drive API doesn’t have a built-in audit trail or knowledge of originating and final document counts, both of which are also important for ediscovery.
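Why do originating and final document counts matter? Without them, a skipped file goes unnoticed. The sketch below shows the kind of reconciliation a collection tool would have to build for itself when the API does not supply it. The names `CollectionReport` and `reconcile` and the document IDs are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class CollectionReport:
    expected: int      # originating document count
    collected: int     # final document count
    missing: list      # IDs that were expected but never collected
    unexpected: list   # IDs collected but absent from the original listing

    @property
    def complete(self):
        return not self.missing and not self.unexpected


def reconcile(expected_ids, collected_ids):
    """Compare the IDs listed before collection with the IDs actually collected."""
    expected, collected = set(expected_ids), set(collected_ids)
    return CollectionReport(
        expected=len(expected),
        collected=len(collected),
        missing=sorted(expected - collected),
        unexpected=sorted(collected - expected),
    )


report = reconcile(["doc-1", "doc-2", "doc-3"], ["doc-1", "doc-3"])
print(report.missing)  # ['doc-2']
```

A report like this only flags the gap. It does not replace a system-generated audit trail of who collected what and when.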
In short, an ediscovery tool that relies on the Drive API probably isn’t taking the optimal approach.
Advantages of Using Vault API
The biggest advantage of using the Google Vault API over the Drive API is that Vault is purpose-built for ediscovery and data retention, so it’s set up to provide ediscovery solutions with the information they need most. Using the Google Vault API is also Google’s preferred method for collecting legally defensible information from Google Workspace.
Some other advantages of using the Google Vault API for ediscovery:
- Google Vault includes an audit log
- Google Vault orchestrates legal holds for Gmail, Drive, Chat, and Groups custodians
- The Google Vault API can collect all legal-hold data, including deleted user content (not available via the Drive API)
- The Google Vault API delivers Gmail and Drive data in easily consumable, user-friendly formats
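The legal-hold difference in the list above comes down to which files each collection path can see. The toy in-memory model below follows the behavior Ruel describes. It calls neither API, and `FileRecord` and both functions are hypothetical names used only to show the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    file_id: str
    deleted_by_user: bool = False
    on_legal_hold: bool = False


def visible_to_live_listing(files):
    """Files a collector sees if it can only list what users still have."""
    return [f.file_id for f in files if not f.deleted_by_user]


def visible_to_hold_aware_collection(files):
    """Files a hold-aware collector sees: live files plus deleted files on hold."""
    return [f.file_id for f in files if not f.deleted_by_user or f.on_legal_hold]


files = [
    FileRecord("budget.xlsx"),
    FileRecord("memo.docx", deleted_by_user=True, on_legal_hold=True),
]
print(visible_to_live_listing(files))           # ['budget.xlsx']
print(visible_to_hold_aware_collection(files))  # ['budget.xlsx', 'memo.docx']
```

In this model, the deleted memo is exactly the kind of document that would be missed by a collection with no awareness of legal holds.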
At first glance, the API your ediscovery solution uses to connect to your enterprise Google Workspace data may not be high on your list of priorities. But it’s important to remember that not all APIs are created equal, and you want the connection that is best suited to the job at hand. For ediscovery, that means picking a tool that uses the Google Vault API rather than the Drive API.
Want to learn more about how APIs affect data preservation for ediscovery? Download The Guide to Modern SaaS Preservation!