In the hustle and bustle of ediscovery planning, we often focus more on the content of discoverable information than we do on its form. For example, in a hostile-workplace claim, you may know that you want all of the employer’s records and communications that relate to the employee. But does it matter whether you get those records as PDF files, TIFF image files, or in the native format in which they were originally created?
It might. Sure, for some files and some file types, native-format files may not be critical. For small cases with limited data concerning a narrowly defined dispute with an above-board opponent, all of discovery may be straightforward and uncomplicated.
But then there’s every other case: those involving more parties or more money, greater data volumes, broader issues, or zealous litigants whose conduct is worthy of at least, shall we say, a more cautious approach. For those cases, native-format files with intact, unchanged metadata may become critical to proving an element of the case or to successfully arguing that evidence has been spoliated. And if you didn’t realize that native-format files were important until after you received your opponent’s full production, you’re almost certainly too late.
That means you—and your technical support team—need to consider the form of production from the beginning of every case. It’s such a good idea that it’s included in the rules governing discovery.
Forms of Production: The Rules
The Federal Rules of Civil Procedure offer several points of guidance about the form of production for electronically stored information.
First, Rule 26(f) calls for attorneys to “confer as soon as practicable” concerning any “issues about disclosure or discovery of electronically stored information, including the form or forms in which it should be produced.” That means that you need to know, from the outset of discovery discussions, not only what relevant information and data you have but also the sources and formats of each of those types of information.
Note that being prepared to capably and competently negotiate at a Rule 26(f) conference requires a firm understanding of more than just the legal issues in your case. You also need a technical grasp of your own data as well as the data you hope to receive from your opponent. How is your data stored? How is it structured? Is it accessible? What metadata is associated with these file types, and what does that metadata mean for your arguments? This in-depth understanding necessitates early involvement from your IT department, which can help you understand what’s important about the form of production and the form of receipt.
When asking for information from an opponent, Rule 34(b) allows the requesting party to specify a desired form or forms of production. If you don’t, the default under Rule 34(b)(2)(E)(ii) calls for the responding party to “produce it in a form or forms in which it is ordinarily maintained or in a reasonably usable form or forms.” The rules don’t require native-format files, nor do they precisely define what native format means.
So if the rules don’t demand native format, does it really matter? Let’s consider the other forms of production and weigh their advantages and disadvantages.
Possible Forms of Production
Generally, discovery involves four broad forms or categories of production: paper, quasi-paper, quasi-native, and native.
Paper is self-explanatory, at least for those over the age of 30! With very small productions or those involving significant handwritten notes, paper may be fine. Paper doesn’t show certain information, though, such as formulas within a spreadsheet or comments on an edited document. Some file types may also be awkward and unmanageable on a standard printed page. Databases, for example, generally don’t make any sense and certainly aren’t functional in a printed view.
Unfortunately, the next level up, quasi-paper production, is a very common form of production for ediscovery despite being little better than paper. Both TIFF files and PDFs are quasi-paper forms: these are images of a printed or paper production that may or may not include searchable extracted text. On the plus side, it’s easy to redact and Bates number paper and quasi-paper productions. They’re also generally impervious to editing and modification.
However, both TIFFs and PDFs require time and expense to image, and both are prone to errors of translation during text extraction. And quasi-paper production suffers from the same problems as paper production when it comes to hidden information and page-based formatting. This may not be a concern with certain file types such as text documents, but it renders spreadsheets, presentations, and online content essentially useless. For one mathematically simple example, a spreadsheet cell showing “2” might reflect “2 times 1,” “4 divided by 2,” “1 plus 1,” “4 minus 2,” “the square root of 4,” or any of thousands of other possible formulas.
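To make the spreadsheet point concrete, here is a minimal Python sketch. The formula strings and the small table of Python functions standing in for them are illustrative assumptions, not the output of any real spreadsheet tool; the only point is that distinct formulas collapse to the same displayed value once a sheet has been imaged.

```python
import math

# Hypothetical formulas a native spreadsheet cell might contain, each paired
# with a Python function that computes the value the cell would display.
CANDIDATE_FORMULAS = {
    "=2*1": lambda: 2 * 1,
    "=4/2": lambda: 4 / 2,
    "=1+1": lambda: 1 + 1,
    "=4-2": lambda: 4 - 2,
    "=SQRT(4)": lambda: math.sqrt(4),
}


def displayed_values(formulas):
    """Return the set of distinct values a page image would show."""
    return {float(compute()) for compute in formulas.values()}


# Several different formulas, but the imaged page shows one distinct value.
rendered = displayed_values(CANDIDATE_FORMULAS)
print(len(CANDIDATE_FORMULAS), "formulas ->", rendered)
```

A TIFF or PDF preserves only the rendered value; the formula that produced it survives only in the native file.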
Quasi-native production is a considerable improvement in terms of the functionality of data. In a quasi-native production, parties exchange usable electronic information, but in a file type different from the one in which it was originally created. This is especially useful for databases and proprietary file formats where the recipient may not have the hardware or software required to access the original file type.
Finally, native-format production is, perhaps obviously, production of electronically stored information in the format in which it was originally created. For Microsoft Word documents, this would be .doc or .docx files; for Excel spreadsheets, this is .xls or .xlsx; for online content, WARC or Web ARChive files are the gold standard for native review. (Note that for emails, which may be created through several different programs, native format is a more complex question.) While it may be difficult or impossible to redact or Bates number individual pages in a native production, the files themselves can be Bates numbered or, better, hashed for individual identification. Native-format production requires no additional time, effort, or money to produce, as files don’t have to be converted, translated, or imaged.
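Hashing a native file for identification can be sketched in a few lines of Python using the standard library's `hashlib`. This is a minimal illustration, not a description of any particular review platform: the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_production(folder):
    """Map each file's path (relative to the folder) to its SHA-256 digest."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Because the digest is computed from the file's bytes, two byte-identical copies produce the same value, and any change to the file's content produces a different one. That is what makes a hash useful for identifying a native file without stamping anything onto it.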
Native production should always include complete and unchanged metadata. This metadata describes the underlying data itself, showing information such as its file creation date, custodian and author, and last modified date and author. For emails, metadata also shows transmission information like the date the email was sent. Metadata is especially useful for quickly filtering potentially discoverable information for relevance, as it allows rapid sorting by date and custodian.
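As a rough illustration of sorting and filtering by date and custodian, the Python sketch below works on a handful of made-up records. The `FileMetadata` class, its field names, and the sample data are all hypothetical; real review tools expose many more metadata fields.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileMetadata:
    """Hypothetical subset of metadata fields for one produced file."""
    name: str
    custodian: str
    last_modified: date


def filter_by_custodian_and_date(records, custodian, start, end):
    """Keep one custodian's records modified within [start, end], oldest first."""
    matches = [
        r for r in records
        if r.custodian == custodian and start <= r.last_modified <= end
    ]
    return sorted(matches, key=lambda r: r.last_modified)


# Made-up sample data for illustration only.
records = [
    FileMetadata("review.docx", "A. Manager", date(2018, 3, 2)),
    FileMetadata("budget.xlsx", "B. Analyst", date(2018, 3, 5)),
    FileMetadata("warning.docx", "A. Manager", date(2018, 1, 15)),
    FileMetadata("notes.docx", "A. Manager", date(2017, 11, 30)),
]

hits = filter_by_custodian_and_date(
    records, "A. Manager", date(2018, 1, 1), date(2018, 12, 31)
)
print([r.name for r in hits])
```

None of this is possible if the production arrives as images with the metadata stripped, because the fields being filtered on no longer exist.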
But do you really need to receive native-format data? And should you be producing your own discovery responses in native format? For most data types, there’s no right answer to these questions that fits every situation. But there is one type of electronically stored information that should always be produced in native format: online content.
Why Native Format Is the Only Answer for Web Content
If your case involves online content such as a company website, social media posts, or online collaboration applications, native-format WARC production is the only sufficient form of production. But why?
Most websites are dynamic: a single static image of the site simply will not show you all of the relevant information you need. Even something as straightforward as a navigation menu isn’t visible without mousing over the menu bar. And today’s websites incorporate far more interactive content than just mouse-over menus. Entire blocks of text are likely to be hidden until the user interacts with the page. Fill-in forms and calculators cannot be manipulated on a piece of paper or in a static image. Obviously, videos are unviewable in PDF, TIFF, or other screen-capture formats. Nor can you click on links or explore associated content with a static capture method.
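For readers curious about what a WARC file actually holds, the Python sketch below parses a single hand-written, uncompressed WARC record: a version line, named header fields, a blank line, and then the captured content block. The sample URL, date, record ID, and payload are made up, and `parse_warc_record` is a hypothetical helper written for this example. Real WARC files are typically compressed and contain many records, so treat this as an illustration of the structure rather than a production parser.

```python
def parse_warc_record(raw):
    """Split one uncompressed WARC record into (version, headers, content block)."""
    head, separator, rest = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line between WARC headers and content block")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    length = int(headers["Content-Length"])
    return version, headers, rest[:length]


# Hand-written sample: the captured HTTP response includes markup that a
# static screenshot would never show (a hidden navigation menu).
payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<nav hidden>Menu</nav>"
header_lines = [
    b"WARC/1.0",
    b"WARC-Type: response",
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>",
    b"WARC-Target-URI: http://example.com/",
    b"WARC-Date: 2018-07-31T00:00:00Z",
    b"Content-Type: application/http; msgtype=response",
    b"Content-Length: " + str(len(payload)).encode("ascii"),
]
sample = b"\r\n".join(header_lines) + b"\r\n\r\n" + payload + b"\r\n\r\n"

version, headers, block = parse_warc_record(sample)
print(version, headers["WARC-Target-URI"], headers["WARC-Date"])
```

The record keeps the full server response alongside the capture date and the address it came from, which is why hidden elements, links, and embedded media remain available for review.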
Bear in mind, as we noted above, that you must be prepared to negotiate the form of production from the outset of discovery discussions. The recent case of Baker v. Santa Clara University, No. 17-cv-02213-EJD (VKD) (N.D. Cal. Jul. 31, 2018), illustrated why early discussions are critical. In that case, the plaintiff requested production in native format “including electronically stored information, metadata, and all metadata fields.” Despite that request, the defendant produced over 2,500 pages of documents in PDF format without metadata. The plaintiff, naturally, objected and moved the court to compel a native-format production. The defendant argued in response that it had tried to negotiate the terms of production in Rule 26(f) conferences but that it was unable to do so due to the lack of meaningful cooperation from the plaintiff’s counsel.
The court concluded that both parties violated the rules. However, since the plaintiff did “not have a compelling reason” and offered no “specific, articulable basis” for the court to believe she was missing any relevant evidence, the court denied her request for a second production in the files’ native formats.
Knowing what you need from the beginning is the key to getting a production that is actually useful and helpful in your case. When online information is critical to litigation, the only right answer is native-format preservation and production. If you’re struggling to request or produce native WARC files, Hanzo can help.