Idealistic musings about eDiscovery
With all the views the eDiscovery acceptance poll in my last post has received, only eight votes (and one comment) have been counted. Since people can vote multiple times, I suspect only two or three people have offered their opinions. (I don’t expect a statistically-valid data set, but c’mon …)
I’m posting this one more time; please share your thoughts. Meanwhile, I’ve got an essay in the works on Ralph Losey’s magnum opus about Predictive Coding 3.0. (I’m reading all 15,000 words so you don’t have to!)
With Halloween around the corner, let’s try something different. Here’s a little poll, in which I ask why you think attorneys (as a whole) have been reluctant to embrace eDiscovery.
You may choose more than one option (and the more cynical of you may choose to select all of them), but I especially welcome your comments explaining what you think are the root causes of the profession’s ambivalence (including – especially – if you think adoption and acceptance are proceeding at exactly the right pace).
I’ll use the highly-unscientific results as the basis for a future post. Thanks for taking the time to participate!
Those few words were all it took for the European Union Court of Justice (ECJ) to shoot down the “Safe Harbor” data agreement between the US and EU on October 6. To paraphrase the prolix paragraph that preceded those four words, the ECJ ruled that the Safe Harbor agreement notwithstanding, each EU nation still retained its power to review claims of personal breaches of data privacy rights; thus, the agreement has no effect.
From Hogan Lovells:
Safe Harbor was jointly devised by the European Commission and the U.S. Department of Commerce as a framework that would allow US-based organisations [sic] to overcome the restrictions on transfers of personal data from the EU. Following a dispute between Austrian law student Max Schrems and the Irish Data Protection Commissioner, the [ECJ] was asked to consider whether a data protection supervisory authority was bound by the European Commission’s decision that Safe Harbor provided an adequate level of protection for European data.
Eric Levy summarized the fact situation nicely:
Schrems, an Austrian citizen and a Facebook user since 2008, alleged that Facebook should not be allowed to transfer the personal information of its subscribers from its Irish servers to servers in the US. In the light of revelations made in 2013 by Edward Snowden concerning the activities of United States intelligence services like the NSA, Schrems contended that the law and practices of the United States, including Safe Harbor, offered no real protection against surveillance by the United States of personal data transferred to that country. On October 6, 2015 the ECJ agreed with him.
The New York Times has also written a tight, more complete version of the back story.
According to Hogan Lovells, supra, the death of Safe Harbor means:
- Transfers of personal data from the EU to the US currently covered by Safe Harbor will be unlawful unless they are suitably authorized by data protection authorities or fit within one of the legal exemptions.
- Multinationals relying on Safe Harbor as an intra-group compliance tool to legitimize data transfers from EU subsidiaries to their US parent company or other US-based entities within their corporate group will need to implement an alternative mechanism.
- US-based service providers certified under Safe Harbor to receive data from European customers will need to provide alternative guarantees for those customers to be able to engage their services lawfully.
So, instead of a single EU-wide privacy benchmark to apply when companies send foreign citizens’ personal data back to the US, each EU country can now apply its own standards for data privacy. This is likely to mean that some EU countries will suspend transfer of their citizens’ data to the US altogether.
During discovery, US judges had already shown a rather dismissive attitude toward foreign data privacy rights, so long as that data might prove discoverable in the US court. “I don’t care how hard it might be for you to get that data,” some judges had said, “that’s not my problem. It’s your case, and your data, so do it or face sanctions.” Huron Consulting had summarized:
Thus, U.S. courts where a lawsuit is filed and where the parties have appeared are likely to enforce U.S. rules of procedure regarding requests for discovery of information housed overseas, yet the countries where the information is housed may sanction parties who produce information protected by the privacy rules or without complying with the Hague Convention.
That was the best-case scenario under Safe Harbor. Now, the 28 EU nations previously bound by the agreement are free to apply their own data privacy rules to information housed in computers within their borders.
There is no “effective date” specified in the ECJ’s ruling, implying that Safe Harbor is dead as of now. However, Norton Rose Fulbright suggested prior to the ruling that panic is unnecessary:
If the ECJ finds that [Member State Data Protection Authorities (DPAs)] have the authority to make their own determinations as to whether certain types of transfers under the Safe Harbor are valid, there would be no immediate legal effect on the legality of transfers relying on the Safe Harbor. The Irish proceedings that gave rise to Schrems would continue, and other complaints would likely be filed to seek review by the Irish and other DPAs. While these proceedings could ultimately lead to data transfers being found invalid, this process would take months or years. Meanwhile, the European Commission would have more time to reach a new Safe Harbor agreement with the US, offering the DPAs an opportunity to find that the enhanced framework addresses their concerns.
If you have pending litigation involving electronic data that you thought your clients produced in compliance with their Safe Harbor certification, do your own research and reconsider your collection and production strategies in light of the meager guidance provided by the ECJ and in the references quoted here.
This is gonna get interesting.
The problem with technology-assisted review is that the best practices to bring about the most accurate, defensible review are, quite frankly, too onerous for most attorneys to accept.
In “TAR 1.0”, the initial iteration of computer-aided document analysis, as many documents as possible from the total corpus had to be loaded up into the TAR system and, from this nebulous blob of relevant data, non-relevant data, fantasy football updates and cat memes, a statistically-valid sample was drawn at random. It then fell to a senior attorney on the litigation team to manually review and code this “seed set”, after which the computer would identify similarities among documents with similar tags and try to extrapolate those similarities to the entire document corpus.
There are a number of aspects to modern document review that aren’t practical with this scenario – using unculled data to generate the seed set, assuming that you have most of the corpus documents to draw from at the outset – but the most glaring impracticality is also the most critical requirement of TAR 1.0:
Senior attorneys, as a rule, HATE to review documents.
It’s why they hire junior attorneys or contract reviewers. Generally, senior attorneys’ time is better spent on tasks that are more overtly significant to their clients, which in turn justifies their billing a lot more per hour than the reviewers do. And, if a statistically valid seed set contains some 2,400 randomly selected documents (presuming a confidence level of 95 percent and a margin of error of +/- two percent), that’s the better part of an entire workweek the senior attorney would have to devote to the review.
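For anyone wondering where "some 2,400" comes from, here is a minimal sketch of the standard sample-size formula for estimating a proportion, n = z² · p(1 − p) / e². The function name, the worst-case assumption p = 0.5, and the 60-documents-per-hour review rate are my own illustrative assumptions, not figures from any particular TAR product.

```python
import math
from statistics import NormalDist


def seed_set_size(confidence: float, margin_of_error: float, p: float = 0.5) -> int:
    """Sample size for estimating a proportion in a very large corpus.

    p = 0.5 is the worst-case (largest sample) assumption about how
    prevalent responsive documents are.
    """
    # Two-sided z-score, e.g. about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


n = seed_set_size(confidence=0.95, margin_of_error=0.02)
print(n)  # 2401

# Hypothetical review rate, purely for illustration.
DOCS_PER_HOUR = 60
print(f"{n / DOCS_PER_HOUR:.0f} hours of senior-attorney review")
```

Loosening the margin of error to +/- five percent drops the sample to a few hundred documents, which is why the margin you insist on matters so much to the senior attorney's calendar.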
No wonder TAR 1.0 never caught on. It was designed by technologists – and brilliantly so – but completely ignored the realities of modern law practice.
Now we’re up to “TAR 2.0”, the “continuous active learning” method which has received less attention but is nonetheless a push in the right direction toward legal industry-wide acceptance. In TAR 2.0, the computer constantly re-trains itself and refines its notions of what documents do and do not meet each tag criterion, so that the initial seed set can be smaller and more focused on documents that are more likely to be responsive, rather than scattershooting randomly across the entire document corpus. As more documents are loaded into the system, the tag criteria can be automatically applied during document processing (meaning that the new documents are classified as they enter the system), and refinements crafted as humans review the newly loaded docs would then in turn be re-applied to the earlier-predicted docs.
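If that paragraph reads better as a loop than as prose, here is a toy sketch of the train, rank, review, re-train cycle. It is not any vendor's algorithm: the word-count scorer stands in for a real classifier, and every name in it (`train`, `score`, `continuous_active_learning`, the sample documents, the `reviewer` function) is a hypothetical of mine.

```python
from collections import Counter


def train(labeled):
    """Toy model: +1 per responsive doc containing a word, -1 per non-responsive."""
    weights = Counter()
    for text, responsive in labeled:
        for word in set(text.lower().split()):
            weights[word] += 1 if responsive else -1
    return weights


def score(weights, text):
    """Higher score means the model thinks the doc is more likely responsive."""
    return sum(weights[word] for word in set(text.lower().split()))


def continuous_active_learning(corpus, seed, reviewer, batch_size=2):
    """Re-train after every batch and always review the top-ranked docs next."""
    labeled = list(seed)
    seen = {text for text, _ in labeled}
    unreviewed = [doc for doc in corpus if doc not in seen]
    while unreviewed:
        weights = train(labeled)  # re-train on everything coded so far
        unreviewed.sort(key=lambda doc: score(weights, doc), reverse=True)
        batch, unreviewed = unreviewed[:batch_size], unreviewed[batch_size:]
        labeled.extend((doc, reviewer(doc)) for doc in batch)  # human coding
    return labeled


corpus = [
    "gas trade price confirmation for plant",
    "fantasy football lineup update",
    "plant manager asked about gas price",
    "cat meme for friday",
    "trade schedule for gas delivery",
    "football tickets for friday",
]
seed = [(corpus[0], True), (corpus[1], False)]  # small, focused seed set


def reviewer(doc):
    # Stand-in for a human reviewer's responsiveness call.
    return "gas" in doc


review_order = continuous_active_learning(corpus, seed, reviewer)
for doc, responsive in review_order:
    print(responsive, doc)
```

In this toy run, the two remaining responsive documents surface in the first batch after the seed set, and the cat meme and football tickets sink to the bottom. A real system would also need a stopping rule, which I have left out here.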
Now, that last paragraph makes perfect sense to me. The fact that, despite my editing and revisions, it still would appear confusing to the average non-techie is one of the big problems with TAR 2.0: those of us who work with it get it, but explaining it to those who don’t is a challenge. But the biggest problem I see with TAR 2.0 once again must be laid at the feet of the attorneys.
Specifically, most of the training and re-training in a TAR 2.0 system will come courtesy of the manual document reviewers themselves. Ignoring for a moment the likelihood that review instructions to an outsourced document review bullpen tend to be somewhat less than precise anyway, several reviewers can look at the same document and draw very different conclusions. Let’s say you have a non-practicing JD with a liberal arts background, a former corporate attorney with engineering and IP experience, an inactive plaintiff’s trial lawyer, and a paralegal who was formerly a nurse. Drop the same document – let’s say, a communiqué from an energy trader to a power plant manager – in front of all four, and ask them to tag for relevance, privilege, and relevant issues. You’re likely to get four different results.
Which of these results would a TAR 2.0 system use to refine its predictive capabilities? All of them. And TAR has not yet advanced to the sophistication required to analyze four different tagging responses to the same document and refine from them the single most useful combination of criteria. Instead, it’s more likely to cloud up the computer’s “understanding” of what made this document relevant or not relevant.
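To make the "clouding" concrete, here is a small sketch using a deliberately naive word-count model of my own invention (real TAR engines are far more sophisticated, and none is claimed to work this way). It shows what happens when four reviewers split two-to-two on the same document: the conflicting calls cancel, and the model learns nothing from it.

```python
from collections import Counter


def train(labeled):
    """Toy model: +1 per 'relevant' call containing a word, -1 per 'not relevant'."""
    weights = Counter()
    for text, relevant in labeled:
        for word in set(text.lower().split()):
            weights[word] += 1 if relevant else -1
    return weights


# Hypothetical communique from an energy trader to a power plant manager.
doc = "trader asks plant manager to curtail output"

# Four reviewers who agree versus four who split two-to-two.
consistent = train([(doc, True)] * 4)
conflicting = train([(doc, True), (doc, False), (doc, True), (doc, False)])

print(consistent["curtail"])   # 4
print(conflicting["curtail"])  # 0
```

With consistent coding, every word in the document carries a strong signal; with the split, every weight nets out to zero, as though the document had never been reviewed at all.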
The IT industry uses the acronym GIGO: garbage in, garbage out. Blair and Maron proved back in 1985* that human reviewers not only tend to be inaccurate in their review determinations, but are also overconfident in their abilities to find sufficient documents that meet their criteria. In TAR 2.0, ultimately, the success or failure of the computer’s ability to accurately tag documents may be in the hands of reviewers whose only stake in the litigation is a paycheck.
Until last week, I was strongly in favor of a “TAR 1.5” approach: start with a smaller seed set reviewed and tagged by a more-senior attorney, let the TAR system make its initial definitions and determinations, use those determinations to cull and prioritize the document corpus, then let the document reviewers take it from there and use “continuous active learning” to further iterate and refine the results. It seemed to me that this combined the best practices from both versions of the process: start with the wisdom and craftsmanship of an experienced litigator and apply it to all the available documents, then leave the document-level detail to contract reviewers using the TAR-suggested predictions as guidance.
But last week, I interviewed with the founders of a small company who have a different approach. Neither desiring to put any pressure on the company nor wanting to inadvertently divulge any trade secrets that might have been shared, I won’t identify them and won’t talk about their processes other than to say that perhaps they’ve come up with a “TAR 3.0” approach: make automatic TAR determinations based on statistical similarity of aspects of the document, rather than on the entire content of each document. It’s a lawyerly, rather than a technical, approach to the TAR problem, which to me is what makes it brilliant (and brilliantly simple).
Whether I become part of this company or not, the people who run it have given me a lot to think about, and I’ll be sharing my thoughts on these new possibilities in the near future.
*David C. Blair & M.E. Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System, 28 COMMC’NS ACM 289 (1985).
On May 15, I was laid off from Hewlett-Packard as they prepared for their big corporate meiosis* in November. I found out in short order that about three-fourths of the remaining eDiscovery experts company-wide were also let go. My private opinion was that this likely signaled HP’s intent to get out of the eDiscovery software business.
Looks like my hunch is at least partially right. My friends at iManage (née Interwoven), formerly a part of Autonomy and later assimilated by HP, have bought their company back.
For the iManage leadership, this transaction is about much more than a product: it’s about a community that spans people, partners and hundreds of thousands of users, many of whom have used this solution for more than a decade. iManage also represents a set of values, based on our history of listening, innovating and delivering great products and support. Our buyout enables the team to continue to innovate with a community of thought leaders that share this passion.
My heartiest congratulations to my old colleagues in Chicago. (Hmmm … wonder if they need an eDiscovery expert?)
*After all these years, I finally found a use for that word from high-school biology! HP is splitting into two distinct companies, HP and HP Enterprise, on November 1.
Oh, this is good. If you haven’t already signed up for the ALM Network (it’s free, as is most of their content), it’s worth doing so just to read this post (first of a two-part series) from Geoffrey Vance on Legaltech News. It pins the failure of acceptance of technology-assisted review (TAR) right where it belongs: on attorneys who refuse to get with the program.
As I headed home, I asked myself, how is it—in a world in which we rely on predictive technology to book our travel plans, decide which songs to download and even determine who might be the most compatible on a date—that most legal professionals do not use predictive technology in our everyday client-serving lives?
I’ve been to dozens of panel discussions and CLE events specifically focused on using technology to assist and improve the discovery and litigation processes. How can it possibly be—after what must be millions of hours of talk, including discussions about a next generation of TAR—that we haven’t really even walked the first-generation TAR walk?
Geoffrey asks why attorneys won’t get with the program. In a comment to the post, John Tredennick of Catalyst lays out the somewhat embarrassing answer:
Aside from the fact that it is new (which is tough for our profession), there is the point that TAR 2.0 can cut reviews by 90% or more (TAR 1.0 isn’t as effective). That means a lot of billable work goes out the window. The legal industry (lawyers and review companies) live and die by the billable hour. When new technology threatens to reduce review billables by a substantial amount, are we surprised that it isn’t embraced? This technology is driven by the corporate counsel, who are paying the discovery bills. As they catch on, and more systems move toward TAR 2.0 simplicity and flexibility, you will see the practice become standard for every review.
Especially with respect to his last sentence, I hope John is right.
If you have done one of these published “Q&A” things before, as I have, you know that the author not only provides the A, but also the Q. The author gets to emphasize exactly what she wants to emphasize, in exactly the way she wants to emphasize it. That being said, Gabriela Baron reminds us of some important ethical points on the subject of technology-assisted review that need emphasizing: specifically, that the ethical attorney must develop at least some competence with the technology:
Comment 8 to ABA Model Rule of Professional Conduct 1.1 requires lawyers to ‘keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology.’ Lawyers need not become statisticians to meet this duty, but they must understand the technology well enough to oversee its proper use.
Her blog post is a pretty good, succinct summary, and one worth revisiting to refresh our memory.
I’ve had the occasional conversation with Greg Buckles in which we take opposing views on the validity of the 1985 Blair-Maron study. Herb Roitblat now weighs in with a quite scientific, yet blissfully simple, explanation of why manual review should never be considered the “gold standard” for document review accuracy.
It may seem that we have effective access to all of the information in a document, but the available evidence suggests that we do not. We may be confident in our reading ability, but at best we do a reasonable job with the subset of information that we do have.
Get to Herb’s conclusion to see what (besides the obvious) this has to do with technology-assisted review. It’s worth the read.
Here’s a worthy reminder from Amy Bowser-Rollins of the need to maintain chain of custody logs while collecting eDiscovery. With all the emphasis these days on TAR, it’s nice to be reminded of the fundamentals every once in a while.
“The man who complains about the way the ball bounces is likely the one who dropped it.” – Lou Holtz
I don’t know if I’m more impressed that the author’s name is “Gary Discovery”, or that the wisdom contained in his note is so cogent, but this author cites a new Pennsylvania case in which the judge presumed ESI to be inaccessible where neither party contended otherwise. In this case, the result was that the costs of production shifted to the requesting party.
The requesting party should submit to the court that the ESI sought is accessible to avoid both a presumption of inaccessibility and the possibility of cost-shifting. Requesting parties should not leave it up to the producing party to bear the burden of showing that the ESI is inaccessible because the courts are now willing to presume this finding if neither party contends otherwise.