Collecting data from your own website: how hard can it really be? I mean, can’t anyone take a screenshot or rely on the Wayback Machine’s archives?
Sure they can—but those methods won’t hold up in court. For proof, let’s look at a case from last December involving viral news website BuzzFeed.
What Happened in the Defamation Case of Leidig v. BuzzFeed
As it often does, BuzzFeed wrote a controversially worded title, “The King of Bullsh*t News,” in an effort to get users to click through to its article. That article focused on some pretty out-there news stories that Michael Leidig published on various websites for the Central European News, Ltd. (CEN). The CEN stories included claims that lonely Chinese people were “walking cabbages like pets” and that a Russian woman inadvertently killed her kitten by dyeing it pink.
BuzzFeed reached out to Leidig for comment before it published its article. His lawyer responded that the article would be “highly defamatory” and that Leidig and CEN were reserving “all their legal rights.”
About nine months after the BuzzFeed article, Leidig filed a lawsuit against BuzzFeed, claiming defamation. Leidig v. BuzzFeed, No. 16 Civ. 542 (VM) (GWG) (S.D.N.Y. Dec. 19, 2017).
During discovery, Leidig said he had “taken down” the websites where CEN originally published the disputed stories. Leidig explained that he had screenshots of the former websites that he intended to use as evidence. Indeed, he later produced “documents bearing no metadata, including manually manipulated PDFs … and screenshots,” but he did not produce “any preserved copies” of the websites themselves.
When BuzzFeed again requested preserved versions of the old CEN websites, Leidig said that the preserved websites were available through the Internet Archive’s Wayback Machine. Leidig explained in a deposition that he and CEN made “no special effort” to preserve evidence before filing their lawsuit. A few weeks after the BuzzFeed article came out on April 24, 2015, Leidig disabled the “vast majority” of CEN’s websites. Leidig did not file his complaint until January 25, 2016.
The court determined that Leidig and CEN had allowed potentially relevant information—the CEN websites—to be lost or spoliated after they had a duty to preserve that information for discovery. In fact, as the plaintiff here, Leidig obviously anticipated litigation long before he initiated his case. Therefore, he was on notice of the likelihood of litigation, and under a duty to preserve evidence, beginning no later than April 24, 2015.
Under the Federal Rules of Civil Procedure, that duty extended to any documents or electronic evidence that Leidig knew or should have known could be relevant to future litigation. The websites clearly fell under this umbrella, as they were the very basis of the allegedly defamatory BuzzFeed article. Further, those websites would be relevant to any determination of whether Leidig is a “public figure” subject to a heightened defamation standard.
Needless to say, Leidig did not properly preserve the websites or take any “reasonable steps” to do so. BuzzFeed urged the court to find that Leidig had acted with the intent to deprive BuzzFeed of evidence. The court didn’t go quite that far; it noted that Leidig acted intentionally in destroying the website evidence but not necessarily with the aim of depriving BuzzFeed of its use. Rather, Leidig’s “amateurish collection of documents,” which led to “the destruction of perhaps critical metadata,” demonstrated his failure to take reasonable steps to preserve information for litigation.
Leidig argued that any failure to preserve websites could not have caused prejudice to BuzzFeed because backup versions were available through the Wayback Machine’s archive. The court dismissed this contention out of hand. It found that Leidig offered “no evidence or argument showing that the data generated from the [Internet Archive] website is reliable, complete, and admissible in court.” Therefore, its availability did nothing to mitigate the prejudice BuzzFeed suffered by the destruction of the original CEN websites.
By way of corrective sanctions, the court allowed that BuzzFeed could demonstrate to the jury that Leidig had disabled the CEN websites after threatening litigation and while he was under a duty to preserve them. The court also granted BuzzFeed the option to use the Wayback Machine archives with the presumption that they are “authentic and accurate”—though that presumption would apply only if BuzzFeed, not Leidig, used the archives.
What Was Wrong With Leidig’s Self-Capture?
For starters, it’s a fair assumption that Leidig didn’t try very hard to preserve any website data. A jury might well conclude that his lackadaisical effort meant the website information would not have helped establish his claims.
But even though Leidig’s effort was minimal, he did take screenshots of the CEN websites. Why didn’t that work to prove what the websites said? There were two main reasons: because Leidig showed himself to be untrustworthy, and because preserving websites is harder than it looks.
By providing “manually manipulated PDFs” in the original discovery production, Leidig demonstrated that he was willing and able to change evidence. In doing so, he revealed one of the major weaknesses of screenshots for proving a website’s content: screenshots can be photoshopped just like any other picture can be.
Don’t like those extra five pounds around your waistline, or the gray showing in your hair? Photoshop out those imperfections. Don’t like that your website displays a fake-looking picture with an inflammatory headline? Clean it up so it looks more like a reputable journalistic source. This is why courts reject screenshots: it is entirely too easy to modify pictures to make them show anything you want. While there was no proof here that Leidig had manipulated the website screenshots, the court had no reason to find those images reliable—so it didn’t.
Additionally, it’s just not that easy for a layperson to preserve websites in a forensically sound, admissible format. When screenshots are used, they should be date-stamped and hashed, and even then they should be no more than a secondary preservation method. Any evidence that might be used in court must pass the tests of authenticity, reliability, and admissibility. Leidig’s images failed all of those tests, as did the Wayback Machine’s archive.
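To make “date-stamped and hashed” concrete, here is a minimal Python sketch of what recording a screenshot might look like. The manifest fields and the function names (record_capture, verify_capture) are illustrative assumptions, not a forensic standard or any vendor’s method. Note too that the timestamp comes from the local clock, so it is self-asserted rather than independently verified.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_capture(path, source_url):
    """Build a manifest entry for one captured file (hypothetical format)."""
    path = Path(path)
    return {
        "file": path.name,
        "source_url": source_url,  # where the capture claims to come from
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of_file(path),
        # Local-clock UTC time: a self-reported date stamp, not proof.
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }


def verify_capture(path, entry):
    """Return True if the file still matches the hash in its manifest entry."""
    return sha256_of_file(path) == entry["sha256"]
```

A hash recorded this way can show only that a file has not changed since the entry was made. It says nothing about whether the screenshot accurately reflected the live website in the first place, which is one reason screenshots remain a secondary method at best.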
A Better Alternative: Forensic-Quality Native Format Web Archiving
So you know not to repeat Leidig’s mistake. But what’s the solution?
If website information is central to your claim or defense—or to your opponent’s—make sure you preserve it in a way that the court will accept. Native format web archiving, especially when performed by an impartial third-party expert who can testify in court about the methods used and their reliability, allows users to navigate through preserved websites, seeing everything that was in the original. With a fully archived website, you can search for specific information, click on links to investigate more deeply, and interact with any personalized website features.
Forensic-quality web preservation carries assurances that the information you see is the same information that was originally displayed online, even if there were multiple versions of the original website. Nor is your preserved version at the mercy of an intermittent backup provider like the Internet Archive.
Nobody wants to be chastised by a court for “amateurish” data collection. Work with the experts at Hanzo, and you’ll be able to capture confidence instead.