Idealistic musings about eDiscovery
Ralph Losey has written a 10,000-word essay on the best practice (note the singular use of the word) for culling irrelevant documents from a deduplicated, de-NISTed data corpus. Ten thousand words is a lot. Dare I suggest, you need to read every one of those words (particularly starting about a fourth of the way through the essay) multiple times, if you want to conduct a large electronic document review in which you:
- save time
- save money
- quickly eliminate files that can’t contain relevant information
- FIND THE KEY DOCUMENTS
Ralph provides far too much information to be digested on first read. The sheer volume of good advice (backed up by data) can appear overwhelming. I strongly recommend that you give it a try anyway, because even when trying to drink from a fire hose, you’re going to absorb at least some of the water.
There is a lot to think about and comment upon in Ralph’s post; but if you’ve been looking for a one-stop primer on how to manage and conduct an efficient ESI review, his essay is worth two or three reads. (Don’t overlook the hyperlink in the first paragraph to download the whole thing in PDF format.)
Metadata, frequently referred to as “data about the data” and specifically referring to electronically stored information (ESI), poses an interesting ethical dilemma for lawyers on two fronts:
- Should metadata be purged, or “scrubbed,” from an electronic file before it is produced to opposing counsel; and
- Where metadata has not been scrubbed, is it fair game for opposing counsel to attempt to “mine” the metadata for clues to potentially privileged information that the producing party might have missed?
To Scrub or Not to Scrub?
Metadata typically consists of information regarding the creation of and changes to an electronic document. Some metadata, such as the software used to create the file, and who created the file and when, tends to be relatively benign. Less benign may be metadata containing a record of changes made to the content and comments typed in during the revision process, as well as who made the changes and comments. A Microsoft Word document, for example, may contain a complete version history of the document if the “Track Changes” setting has been turned on in the application.
Needless to say, if counsel is not aware that old versions of the content are being tracked and stored, it would be very easy to miss this metadata during privilege review. For example, an attorney may inadvertently reveal her client’s bottom line on a contract negotiation by failing to erase comments on a draft to another party.
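To make the risk concrete, here is a minimal sketch of how a reviewer might check a Word file for leftover tracked changes and comments before it goes out the door. It assumes the file is a modern .docx (a zip archive of XML parts) in the common “transitional” format; the function name summarize_hidden_revisions is my own invention, and a real review platform will do far more than this.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by ordinary (transitional) .docx files.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def summarize_hidden_revisions(docx):
    """Count tracked insertions, tracked deletions, and comments in a .docx.

    `docx` may be a file path or a binary file-like object.
    Returns (counts, authors). Hypothetical helper, for illustration only.
    """
    counts = {"insertions": 0, "deletions": 0, "comments": 0}
    authors = set()

    def tally(xml_bytes, tag, key):
        # Walk every element with the given tag and record its author.
        for element in ET.fromstring(xml_bytes).iter(W + tag):
            counts[key] += 1
            author = element.get(W + "author")
            if author:
                authors.add(author)

    with zipfile.ZipFile(docx) as archive:
        names = set(archive.namelist())
        if "word/document.xml" in names:
            body = archive.read("word/document.xml")
            tally(body, "ins", "insertions")  # tracked insertions
            tally(body, "del", "deletions")   # tracked deletions
        if "word/comments.xml" in names:
            tally(archive.read("word/comments.xml"), "comment", "comments")

    return counts, sorted(authors)
```

Any nonzero count is a signal that the file deserves a closer look during privilege review. The sketch only flags that revisions and comments exist; it does not read them or judge whether they are privileged.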
Is it ethical for counsel to scrub metadata before sending ESI to opposing counsel in discovery? On the one hand, changing (or deleting) metadata within an electronic file is tantamount to changing the file itself, and therefore amounts to intentional spoliation of the data. On the other, many attorneys have for years tried to avoid the review of metadata by imaging all electronic files into paper or TIFF format – thereby stripping all metadata except for the visible content – and then including an electronic “load file” containing limited metadata so that the receiving counsel can load the files and metadata into their own document review platform.
Much of this practice of selective scrubbing should have come to a stop when the Federal Rules of Civil Procedure (FRCP) were amended in December 2006. FRCP 34(b) requires the parties, in the absence of an agreement to the contrary among counsel, to produce ESI as normally kept by the party in the usual course of business. This typically requires production of the native ESI files, along with their attendant metadata. Similarly, Texas Rule of Civil Procedure 196.4 requires the requesting party to specify the form of production it seeks, and the responding party must produce responsive ESI that “is reasonably available to the responding party in its ordinary course of business.”
While there’s certainly nothing unethical about redacting and logging hidden metadata that may be subject to privilege, a lawyer arguably fails in her ethical duty of competence if her “technophobia” results in privileged metadata ending up in the hands of her more technologically adept opponent.
Mining the Metadata
The magnitude of such a failure expands because the jurisdictions cannot agree on whether “mining”, or examining, the metadata in received ESI productions is ethical. A technologically proficient user who knows what she is looking for and how to find it can unearth a potential treasure trove of useful information within the metadata. For example, the metadata might contain a list of people who collaborated on a document and the date on which it was created and sent. This metadata might then be used to impeach a witness who testifies that he was the sole author of the document and that he created and sent it on a different date.
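For readers wondering how little effort such “mining” can take, here is a rough sketch that reads the basic authorship properties stored inside a .docx file. It assumes the file carries the standard docProps/core.xml part; the function name read_core_properties and the field labels are hypothetical, and the values found there reflect what the software recorded, which is not necessarily who actually wrote what.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative labels (my own) mapped to the property elements to look up.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx):
    """Return the authorship and date properties recorded in a .docx.

    `docx` may be a file path or a binary file-like object.
    Properties that are absent or empty are simply left out.
    """
    with zipfile.ZipFile(docx) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))

    props = {}
    for label, path in FIELDS.items():
        element = root.find(path, NS)
        if element is not None and element.text:
            props[label] = element.text
    return props
```

A mismatch between these recorded values and a witness’s account is the kind of lead described above. Whether counsel may ethically go looking for it is a separate question that depends on the jurisdiction.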
The Texas Supreme Court Professionalism Committee has not yet spoken to this issue. The national and state bars that have addressed it cannot agree as to whether metadata mining is ethical or unethical, or even whether a bright-line test is appropriate.
(Note: This post contains text left over from a Texas-specific article I recently wrote. Needless to say, please do the research for the applicable rules in your jurisdiction.)
From Law.com (free registration required), Sheri Qualters reports that yesterday, Chief Judge Randall Rader of the U.S. Court of Appeals for the Federal Circuit unveiled a model order that would limit eDiscovery in patent cases. As discussed in the article, the model order is interesting not for demanding that attorneys become eDiscovery-savvy, but for throttling back the volume of ESI that is, in fact, discoverable.
Three of the more interesting requirements of the model order:
- Metadata will be excluded from e-discovery production requests without “a showing of good cause.” [This alone seems odd, as it flies in the face of emerging case law and, arguably, the “usual course of business” production requirements of the FRCP.]
- E-mail requests will be limited to five so-called custodians per producing party and five search terms per custodian. [Is this a realistic limit?]
- The production of electronic information en masse, or the inadvertent release of privileged or work-product protected electronic data, is not a waiver or permission to use it. [This one, on the other hand, I can comfortably get behind, and hope it gets adopted more globally.]
Keep in mind that this model order is (a) limited to patent cases and (b) limited to the Federal Circuit, at least for now. A member of the advisory committee that drafted the model order noted it’s unlikely that all federal courts will embrace it.
I question the reasonableness of the first two bullet points noted above. Yes, they would certainly cut down on eDiscovery expense, difficulty and ambiguity … but are they practical? Do these rules really define the tipping point between ESI cost management and “the swift and fair adjudication of justice”? Or are they arbitrary limitations that may cut down on discovery disputes, but at the expense of full disclosure?
I also question the wisdom of limiting search terms to five per custodian. How will this be calculated – as an aggregate count of search terms across all custodians, or would the same terms applied to each of the five custodians max out the order’s limit? Also left undefined is the term “search term” itself, which could theoretically include individual keywords (stemmed and unstemmed), Boolean queries, concept searches, fuzzy and wildcard searches … the list goes on.
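The counting problem is easy to illustrate. The sketch below tallies the same set of requests three different ways; the function name count_search_terms, the custodian names and the terms are all made up for the example, and nothing here reflects how the model order itself would be applied.

```python
def count_search_terms(terms_by_custodian):
    """Tally search terms under three plausible readings of a per-custodian cap.

    `terms_by_custodian` maps a custodian's name to the terms requested
    for that custodian. Hypothetical helper, for illustration only.
    """
    # Reading 1: each custodian has its own budget of terms.
    per_custodian = {
        custodian: len(set(terms))
        for custodian, terms in terms_by_custodian.items()
    }
    # Reading 2: every custodian/term pairing counts toward one total.
    total_pairings = sum(per_custodian.values())
    # Reading 3: only distinct terms count, however many custodians they hit.
    distinct_terms = len(set().union(*terms_by_custodian.values()))
    return per_custodian, total_pairings, distinct_terms


# Five custodians, each searched with the same five terms.
terms = ["merger", "patent", "license", "royalty", "infringe"]
requests = {name: terms for name in ["Ann", "Ben", "Cal", "Dee", "Eve"]}
per_custodian, total_pairings, distinct_terms = count_search_terms(requests)
```

In this example each custodian sits at exactly five terms, yet the request covers twenty-five custodian/term pairings and only five distinct terms. Which of those numbers the limit is meant to constrain is the open question.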
Hey, at least someone’s trying to throttle back eDiscovery excesses. But, based on this article, we still have a very long way to go.
“Shouldn’t we be aghast that firms still deal with tracked changes and comments in Word documents by simply wishing them away?” Craig wrote. A couple of paragraphs later, he continued:
A draft of a contract with tracked changes is a record of the document’s development incorporating the drafter’s communication to him or herself in the form of notes, highlighting and the like. If shared collaboratively using tracked changes and comments, the proposed edits are communications.
Craig further argues that, rather than review the metadata for privilege, attorneys merely tend to strip it without review. I encourage you to read his post, but I must provide a counter-argument.
There are a lot of reasons to provide most metadata associated with an electronic document, most of them forensic: date and time stamps, whose computer saved the final version, which custodian’s “fingerprints” were on it, and so on. But unless I’ve read Craig’s post too quickly, he seems to be lumping this forensic metadata together with version tracking metadata, discussing the two types as if they are the same. In my opinion, such is not the case.
In the days when I wrote (or typed, or even word-processed) my document drafts to paper and then passed the drafts to my co-workers for collaboration, they would mark up their edits and make notes in the margins. When I would get the marked-up drafts back, I would incorporate the edits into a new, final version of the document, and the marked-up copies would hit the shredder. Certainly, I had no expectation that the initial drafts might have evidentiary value in litigation later down the road, just as creators of electronic documents today don’t typically work on a draft with the ugly specter of future litigation perched on their shoulder.
Had my final draft become responsive to a litigation discovery request back then, would I somehow have been in trouble because I had failed to keep the interim drafts? Only if my company’s document retention policy had required me to keep them. Otherwise, the final version stood on its own. It “spoke for itself.” (Res ipsa loquitur, don’tcha know.) Should modern e-files be treated differently?
Before you answer, “But Gary, different technology makes for different requirements,” let me make one significant point: In Microsoft Word, the “Track Changes” function can be disabled. Even after several versions’ worth of changes have been made, “Track Changes” can be turned off, and the version history information can be purged, long before a document reaches the final version – if and only if the user knows that this feature exists and how to turn it off!
I’m not familiar with any judicial mandate that requires Word users to leave “Track Changes” on while they are working on drafts of corporate documents. To me, this raises a question: if we adopt the argument that such metadata should always be produced if available, are we not subjecting some users to a higher standard of production simply because they lack the technical proficiency in Word to turn that feature off?
Some state bar associations have been struggling for years with the ethics of producing metadata versus scrubbing it, and there is not yet any uniform agreement. Personally, I think most metadata should be produced; it’s in keeping with the Sedona Conference’s Cooperation Proclamation, and it generally would cut down on the expense of trying to re-constitute a searchable version of the metadata when produced to opposing counsel. But document version history? That’s creating a duty to preserve and produce that never existed in the days of paper documents. Do we really want to increase the burden of discovery even more than the exploding volume of ESI (and the evolving best practices to deal with it) already requires?
In my next post, I plan to address the challenges inherent in getting lawyers to cooperate in eDiscovery in the first place. For the time being, I propose we focus on shaking that tree for a while, and leave the redefining of “eDiscovery” until we’ve had more success with Job One.