Ediscovery is an unforgiving pursuit. Err on the side of collecting too little data, and you may find that you don’t have the helpful evidence you need (or that your opponent is rightfully demanding). On the other hand, if you err the other way by collecting too much data, you’ll soon realize that you’re paying to process and host volumes of unnecessary and unhelpful data.
What you need is the Goldilocks amount of data—the right amount to prove your claim or defend against your opponent’s without any wasteful overspending. For traditional data sources like email, ediscovery vendors have helped to address this conundrum by creating deduplication technology. But how do you apply that concept to Slack data?
Here’s how Hanzo uses a “collect once, use many” approach to avoid costly overcollection of Slack data.
The Need: Right-Sized Slack Ediscovery Collections
The very nature of Slack data creates a challenge in collecting the right amount of data for ediscovery. Whereas an email is a self-contained packet of information, a Slack message is unlikely to contain an entire thought, much less its supporting rationale. You could say that an email is like a soliloquy or a monologue in a screenplay, while a single Slack message is like one line of dialogue in an ongoing conversation. The soliloquy could provide you with deep insight into a character’s actions and motivations, whereas a Slack message, standing alone, might say only “yes.” Without the context of the surrounding conversation, Slack messages are very often meaningless.
What that means when collecting Slack data for ediscovery or investigation is that it isn’t enough to collect just the messages that a custodian of interest has posted. Rather, you need to collect the entire surrounding conversation thread to put those messages in context. That means collecting not just the messages the custodian authored but all of the content from direct messages and channels that the custodian participated in during a particular time range.
If you were only concerned with one custodian, this might not be too burdensome, but it’s a rare matter that involves only one custodian. In the far more common scenario, you’ll have numerous custodians, and you’ll need to collect each of the conversation threads they participated in. However, custodians are often members of the same team channels or direct messages. If they’ve all participated in the same conversations—which is likely, if they’re all related to the same investigation or ediscovery matter—you’ll end up collecting those same threads over and over again, unnecessarily boosting the amount of data you have to manage and pay for.
The Approach: Collect Slack Data Once by Deduplicating Along the Way
Duplication is, of course, not a new problem in ediscovery. An email sent to a group of recipients, for example, might appear in the mailbox of every custodian who received it. That’s why deduplication, the identification and suppression of duplicate emails during processing, is a standard approach for email. Unfortunately, most vendors haven’t come up with a way to manage the deduplication of Slack data.
Hanzo has. We know that mitigating the cost of ediscovery is a serious concern for our customers, and we always strive to target collections as tightly as possible. Additionally, we know that the reuse of data helps organizations limit wasteful, duplicative data stores that can introduce additional risk and cost. That is why we’ve incorporated a “collect once, use many” approach to deduplicate Slack data during the collection. This unique approach to collecting Slack data ensures significant cost savings over traditional collection strategies and proactively helps organizations to limit risk by governing data retention.
How does it work? Our purpose-built application, Hanzo Hold for Slack, automatically checks its preservation repository during collection. It recognizes any data that it has already collected and only adds unique conversations, effectively deduplicating redundant data before it is collected.
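To make the idea concrete, here is a minimal sketch of a check-before-collect step. It is an illustration only, not Hanzo's implementation: the `PreservationRepository` class, its `collect` method, and the conversation IDs are all hypothetical, and it simplifies by treating a whole conversation as the unit of deduplication.

```python
from dataclasses import dataclass, field


@dataclass
class PreservationRepository:
    """Hypothetical in-memory stand-in for a preservation repository."""
    conversations: dict[str, list[str]] = field(default_factory=dict)

    def collect(self, custodian_conversations: dict[str, list[str]]) -> list[str]:
        """Store only conversations not already held; return the IDs newly added."""
        added = []
        for conv_id, messages in custodian_conversations.items():
            if conv_id in self.conversations:
                continue  # already collected for an earlier custodian, so skip it
            self.conversations[conv_id] = messages
            added.append(conv_id)
        return added


repo = PreservationRepository()

# Two custodians who share a team channel and a direct message (made-up data).
alice = {"C-team": ["hi", "yes"], "D-alice-bob": ["ping"]}
bob = {"C-team": ["hi", "yes"], "D-alice-bob": ["ping"], "C-legal": ["draft"]}

first = repo.collect(alice)   # both conversations are new
second = repo.collect(bob)    # only "C-legal" is new
print(first, second)
```

In this sketch, the second custodian adds only the one conversation the repository has not seen, so the shared channel and direct message are stored once rather than twice.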
Our “collect once, use many” approach applies not only across all of the custodians you might add within any given matter but also across different matters. Once Hanzo Hold for Slack has collected and retained a message or file, it retains that content for as long as it is relevant to one or more active matters, without the need to collect it again. Equally importantly, when a message or file is no longer needed for any active matters, it is automatically purged from the preservation repository to avoid unnecessary cost and exposure.
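One way to picture this retention behavior is to track which active matters reference each stored item and to purge an item once that set is empty. The sketch below assumes exactly that; the `MatterAwareRepository` class, its method names, and the matter and message IDs are hypothetical and are not drawn from Hanzo Hold for Slack itself.

```python
from collections import defaultdict


class MatterAwareRepository:
    """Hypothetical store that keeps each item once and tracks which matters need it."""

    def __init__(self):
        self.content = {}                        # item_id -> payload
        self.matters_by_item = defaultdict(set)  # item_id -> active matter IDs

    def preserve(self, matter_id, item_id, payload):
        """Link an item to a matter; return True only if it had to be collected."""
        newly_collected = item_id not in self.content
        if newly_collected:
            self.content[item_id] = payload
        self.matters_by_item[item_id].add(matter_id)
        return newly_collected

    def close_matter(self, matter_id):
        """Release a matter's items and purge those no active matter still needs."""
        purged = []
        for item_id in list(self.matters_by_item):
            matters = self.matters_by_item[item_id]
            matters.discard(matter_id)
            if not matters:
                del self.matters_by_item[item_id]
                del self.content[item_id]
                purged.append(item_id)
        return purged


repo = MatterAwareRepository()
collected_1 = repo.preserve("matter-1", "msg-100", "yes")  # collected
collected_2 = repo.preserve("matter-2", "msg-100", "yes")  # reused, not re-collected
collected_3 = repo.preserve("matter-2", "msg-200", "no")   # collected

purged_first = repo.close_matter("matter-1")   # nothing purged; matter-2 still needs msg-100
purged_second = repo.close_matter("matter-2")  # both items purged
print(purged_first, purged_second)
```

Here the shared message is collected once, reused by the second matter, and removed only after the last matter that references it closes.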
The Proof: Test Out Our Slack Ediscovery Cost Calculator
What does this mean for our customers? On average, they report that there’s a greater than 60 percent overlap between related custodians and matters in Slack. That means that, without deduplication, more than 60 percent of the data they collect through the Slack API, process, and host over the life of a matter is redundant. And importantly, this cost compounds with each new matter.
By targeting collection of Slack data, which means collecting and processing only the unique content required for identified custodians or channels, Hanzo Hold for Slack saves our users an average of 62 percent on processing and hosting costs. Additionally, Hanzo’s effective search and case assessment capabilities allow users to quickly find the relevant content within their Slack application and understand the case’s merits. When you combine that with precise data exports in review-ready formats, Hanzo Hold creates substantial cost savings and risk mitigation across the entire Electronic Discovery Reference Model (EDRM).
How much could you save by collecting Slack data just once? Let’s assume an average of 10 matters per year with 10 custodians per matter, each of whom has an average volume of 15 gigabytes in their Slack application. Using the traditional collection approach, you’d be collecting 1,500 gigabytes of Slack data per year. At $15 per gigabyte to process and $5 per gigabyte per month to host, that comes to $22,500 for processing plus $90,000 for 12 months of hosting, for an annual cost of $112,500 just to manage ediscovery of Slack data.
With Hanzo’s “collect once, use many” capability, you reduce your collection volume by the amount of data that typically overlaps between custodians. Using the average we’ve found—62 percent—you’d only collect 570 gigabytes and only pay $42,750 for processing and hosting—a savings of $69,750.
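The arithmetic in this example can be reproduced with a short script. This is a rough sketch, not the cost calculator itself: the function name and parameters are made up for illustration, and it assumes every collected gigabyte is processed once and hosted for a full 12 months, which is what the figures above imply.

```python
def annual_slack_cost(matters, custodians_per_matter, gb_per_custodian,
                      overlap_percent=0, process_per_gb=15.0,
                      host_per_gb_month=5.0, months_hosted=12):
    """Return (gigabytes collected, annual processing + hosting cost in dollars)."""
    gross_gb = matters * custodians_per_matter * gb_per_custodian
    # Remove the share of data that overlaps between custodians and matters.
    unique_gb = gross_gb * (100 - overlap_percent) / 100
    cost_per_gb = process_per_gb + host_per_gb_month * months_hosted
    return unique_gb, unique_gb * cost_per_gb


traditional_gb, traditional_cost = annual_slack_cost(10, 10, 15)
deduped_gb, deduped_cost = annual_slack_cost(10, 10, 15, overlap_percent=62)
savings = traditional_cost - deduped_cost

print(f"Traditional: {traditional_gb:,.0f} GB, ${traditional_cost:,.0f}")
print(f"Collect once: {deduped_gb:,.0f} GB, ${deduped_cost:,.0f}")
print(f"Savings: ${savings:,.0f}")
```

Changing the matter count, custodian count, data volume, or overlap percentage shows how the savings scale for a different caseload.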
Want to see how much you could save by collecting Slack data just once?