Analysis of SaaS API Limitations for Ediscovery and Compliance

July 19, 2022

When it comes to ediscovery and compliance, APIs let users bring in third-party solutions to preserve, collect, and even cull data housed in a SaaS application. That functionality, however, is still limited to what the API is built to communicate.

Even if an application has an API available, its design is often focused on the modification of objects and data rather than the creation of a consumable, universal format expected by legal and compliance teams. 

An API may also capture only text-based information and leave out some of the visual and dynamic features of the original interface, which can be quite important for understanding and contextualizing the data.

Export Functionality

One particular limitation of APIs is the available export format. Some APIs have no export function at all, and most of those that do export data in JSON format (79% of the APIs surveyed in Hanzo's 2022 Modern Guide to SaaS Preservation used JSON export).

JSON uses human-readable text to store and transmit data objects consisting of attribute-value pairs and arrays (or other serializable values). And while most of the required data is preserved in this format, it’s not the most usable when reviewing for potential relevance in an investigation, especially for platforms with complex user interfaces or communications channels.
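To make the review problem concrete, here is a minimal sketch in Python. The record is a hypothetical export of a single chat message; the field names are assumptions made for illustration and are not taken from any particular vendor's API. The helper walks the attribute-value pairs and arrays and prints one line per leaf value.

```python
import json

# Hypothetical export of one chat message. Field names are illustrative only.
raw = '''
{
  "id": "msg-001",
  "author": {"name": "A. Rivera", "email": "a.rivera@example.com"},
  "text": "See attached forecast :chart_with_upwards_trend:",
  "attachments": [{"filename": "forecast.xlsx", "size": 48213}],
  "reactions": [{"emoji": "thumbsup", "count": 3}]
}
'''

def flatten(value, prefix=""):
    """Yield (path, leaf_value) pairs for every leaf in a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value

record = json.loads(raw)
rows = list(flatten(record))
for path, leaf in rows:
    print(f"{path} = {leaf!r}")
```

Every value survives, but the emoji arrives as a text code, the attachment as a filename, and the reaction as a count. The reviewer has to reconstruct mentally what the user actually saw on screen.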

The data held within a modern SaaS application can be a combination of text, images, figures, graphs, emoji, attachments, and even the user interface itself. A JSON export loses the visually dynamic nature of SaaS platforms. This loss can mean the difference between demonstrating regulatory compliance, or accurately meeting a discovery request, and failing to do so because, more and more, the user interface is an integral part of the data.

Availability of Data & Data Quality

APIs generally offer two types of data search: Object Search and Content Search.

Object Search is available on most, if not all, APIs and allows search by field, field type, user, table, etc. However, object searches can require advanced search syntaxes such as SQL or similar application-specific languages. In practice, object search typically functions more as a filter than as a free-form search query, meaning you may have to know up front, for each data source, how the data is structured into filterable fields such as username, email, file type, channel, inbox, document ID, author, date modified, MD5, and more.
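The filter-like behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any real API: the records, the field names, and the `object_search` helper are all hypothetical.

```python
from datetime import date

# Hypothetical records as an API might return them. Field names are assumptions.
records = [
    {"doc_id": "D-1", "author": "jlee", "channel": "finance",
     "file_type": "pdf", "modified": date(2022, 3, 1)},
    {"doc_id": "D-2", "author": "jlee", "channel": "general",
     "file_type": "docx", "modified": date(2022, 5, 9)},
    {"doc_id": "D-3", "author": "mkhan", "channel": "finance",
     "file_type": "pdf", "modified": date(2022, 6, 20)},
]

def object_search(items, modified_after=None, **field_equals):
    """Return items whose fields exactly match field_equals and,
    optionally, whose 'modified' date is later than modified_after."""
    hits = []
    for item in items:
        if any(item.get(field) != wanted for field, wanted in field_equals.items()):
            continue
        if modified_after is not None and item["modified"] <= modified_after:
            continue
        hits.append(item)
    return hits

finance_pdfs = object_search(records, channel="finance", file_type="pdf")
recent_by_jlee = object_search(records, author="jlee", modified_after=date(2022, 4, 1))

# A field name the source does not use silently matches nothing.
wrong_field = object_search(records, folder="finance")

print([r["doc_id"] for r in finance_pdfs])
print([r["doc_id"] for r in recent_by_jlee])
print(wrong_field)
```

The last call shows why structure has to be known in advance: filtering on a field the source does not expose returns an empty result rather than an error, which is easy to mistake for "no responsive data."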

Content Search, on the other hand, is far less common: only 39% of the APIs surveyed offered it. This native search is highly limited by field type, table structure, etc., and most data remains unindexed, so there is no way to know the deeper contents of the data beyond the predetermined fields offered through the API. Unless you are doing very targeted collections and are absolutely certain of a document's relevance to the matter at hand, it is nearly impossible to home in on relevant information without first archiving all of the available data.
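One way to picture the "archive first, then search" workflow is a small inverted index built over text that has already been collected. The sketch below is a generic illustration in Python, not a description of how any specific product indexes data; the archive contents and function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical archived items: document ID -> full extracted text.
archive = {
    "D-1": "Q3 forecast revised after the pricing call with Acme.",
    "D-2": "Lunch order for Friday.",
    "D-3": "Acme asked us to hold the pricing change until October.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def content_search(index, query):
    """Return IDs of documents containing every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

index = build_index(archive)
hits = content_search(index, "Acme pricing")
print(sorted(hits))
```

The index can only be built once the full text is in hand, which is the point made above: without content search on the source side, the data has to be archived before its contents can be queried.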

Complex User Interface

The complex user interface in today’s SaaS applications is just as discoverable during a legal matter as the extracted text, which is why capturing all of the data can be difficult using only an API.

Here are some of the general challenges one might face when trying to collect data from complex SaaS applications.

CRM (Customer Relationship Management)

Examples: Salesforce; HubSpot

Challenges When Using API:

  • Many pre-determined and customizable fields to capture from a complex UI for each screen
  • Difficult to know which table is being requested from API
  • UI complexity becomes part of the relevant data collection
  • Reviewers need to interact with the system as users for full context and understanding of data

HRIS (Human Resources Information System)

Examples: BambooHR; Workday

Challenges When Using API:

  • Dense initial landing pages containing information unrelated to the collection target provided through the API
  • Multiple layers of menu items could create many artifacts (page text, PDF, DOC, JPEG, etc.) via the API, which would be difficult for reviewers to parse
  • Reports and other aggregated data (e.g., performance reviews, resume reviews, and comments made about reviews) could be challenging to capture, because this depends on how the application and associated API are structured, which varies from enterprise to enterprise

Project Management

Example 1: Asana

Challenges When Using API: 

  • Project management data creates a special use case for ediscovery and internal investigations, because it creates a multi-faceted record of timelines and deliverables
  • The UI is highly complex, with different views and multiple user contributions creating large amounts of data and metadata
  • Reporting aggregates data available to a user

Example 2: Miro

Challenges When Using API:

  • Deep interface complexity based on the level of page zoom, which an API will never be able to approximate
  • PDF export is dependent on the frame structure and is not available via the API
  • API export is text only via hard-to-review JSON format
  • Integrations with other applications (e.g., Jira, Asana, Trello, Slack, Teams, etc.), with updates in real time, create an even more complex dynamic data challenge

Ticketing

Example: Jira

Challenges When Using API: 

  • Board views offer high complexity which, like project management applications, is hard to replicate via API
  • Ticket view has many fields that are considered metadata, which may not be available through the API
  • Comments are often an important source of data but are exported as a separate object, leading to a loss of context
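The last point, comments arriving as a separate object, can be illustrated with a short Python sketch that re-attaches comments to their parent tickets after export. The record shapes and field names here are hypothetical and are not the actual schema of any ticketing API.

```python
from collections import defaultdict

# Hypothetical exports: tickets and comments arrive as two separate lists.
tickets = [
    {"key": "OPS-12", "summary": "Vendor invoice mismatch", "status": "Closed"},
    {"key": "OPS-13", "summary": "Access request", "status": "Open"},
]
comments = [
    {"ticket_key": "OPS-12", "author": "jlee",
     "created": "2022-02-03T10:15:00Z", "body": "Finance confirmed the corrected total."},
    {"ticket_key": "OPS-12", "author": "mkhan",
     "created": "2022-02-01T09:00:00Z", "body": "Invoice total does not match the PO."},
    {"ticket_key": "OPS-99", "author": "jlee",
     "created": "2022-02-05T12:00:00Z", "body": "Parent ticket was not in the export."},
]

def attach_comments(ticket_list, comment_list):
    """Group comments under their parent ticket in chronological order.
    Returns (threads, orphans); orphans are comments with no exported parent."""
    by_ticket = defaultdict(list)
    for comment in comment_list:
        by_ticket[comment["ticket_key"]].append(comment)
    known_keys = {ticket["key"] for ticket in ticket_list}
    threads = []
    for ticket in ticket_list:
        # ISO 8601 UTC timestamps in one format sort correctly as plain strings.
        ordered = sorted(by_ticket.get(ticket["key"], []), key=lambda c: c["created"])
        threads.append({**ticket, "comments": ordered})
    orphans = [c for c in comment_list if c["ticket_key"] not in known_keys]
    return threads, orphans

threads, orphans = attach_comments(tickets, comments)
for thread in threads:
    print(thread["key"], [c["author"] for c in thread["comments"]])
print("orphaned comments:", len(orphans))
```

Even with the join restored, the result is a data structure rather than the ticket page a user saw, and any comment whose parent was not exported has to be flagged separately.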

Conclusion

As organizations add more and more SaaS applications to their enterprise toolkits, legal departments should begin developing processes and updating technology to preserve and collect all of the data created by these complex sources. Standard export formats and APIs often don't capture all the relevant data from a dynamic UI, and screenshots aren't a scalable or defensible solution over time. Being prepared, instead of waiting until litigation or compliance audits arise, is the first step in mitigating risk and cost.

Want to learn more about how APIs affect data preservation for ediscovery? Download The Guide to Modern SaaS Preservation!
