Idealistic musings about eDiscovery
The problem with technology-assisted review is that the best practices to bring about the most accurate, defensible review are, quite frankly, too onerous for most attorneys to accept.
In “TAR 1.0”, the initial iteration of computer-aided document analysis, as many documents as possible from the total corpus had to be loaded up into the TAR system and, from this nebulous blob of relevant data, non-relevant data, fantasy football updates and cat memes, a statistically-valid sample was drawn at random. It then fell to a senior attorney on the litigation team to manually review and code this “seed set”, after which the computer would identify similarities among documents with similar tags and try to extrapolate those similarities to the entire document corpus.
There are a number of aspects to modern document review that aren’t practical with this scenario – using unculled data to generate the seed set, assuming that you have most of the corpus documents to draw from at the outset – but the most glaring impracticality is also the most critical requirement of TAR 1.0:
Senior attorneys, as a rule, HATE to review documents.
It’s why they hire junior attorneys or contract reviewers: generally, senior attorneys’ time is better spent on tasks that are more overtly significant to their clients, which in turn justifies their billing far more per hour than the reviewers do. And if a statistically valid seed set contains some 2,400 randomly selected documents (presuming a 95 percent confidence level and a margin of error of +/- two percent), that’s the better part of an entire workweek the senior attorney would have to devote to the review.
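That 2,400-document figure falls out of the standard sample-size formula for estimating a proportion. For readers who want to check it, here is a minimal calculation in Python, assuming the conservative worst-case proportion p = 0.5 (the assumption that produces the largest required sample):

```python
import math

def sample_size(z, margin, p=0.5):
    """Sample size for estimating a proportion: n = z^2 * p(1-p) / e^2.

    z      -- z-score for the desired confidence level (1.96 for 95%)
    margin -- margin of error as a decimal (0.02 for +/- 2%)
    p      -- assumed proportion; 0.5 is the worst case
    """
    n = z**2 * p * (1 - p) / margin**2
    # Round before ceiling to guard against floating-point noise
    return math.ceil(round(n, 6))

# 95% confidence, +/- 2% margin of error
print(sample_size(1.96, 0.02))  # 2401
```

At a looser +/- 5 percent margin, the same formula yields only about 385 documents, which shows how sharply the review burden grows as the margin of error tightens.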
No wonder TAR 1.0 never caught on. It was designed by technologists – and brilliantly so – but completely ignored the realities of modern law practice.
Now we’re up to “TAR 2.0”, the “continuous active learning” method, which has received less attention but is nonetheless a push in the right direction toward legal industry-wide acceptance. In TAR 2.0, the computer constantly re-trains itself and refines its notions of which documents do and do not meet each tag criterion, so that the initial seed set can be smaller and focused on documents that are more likely to be responsive, rather than scattershooting randomly across the entire document corpus. As more documents are loaded into the system, the tag criteria can be applied automatically during document processing (meaning that new documents are classified as they enter the system), and refinements crafted as humans review the newly loaded docs are then re-applied to the earlier-predicted docs.
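For the technically inclined, the continuous active learning loop can be sketched in a few lines of Python. This is a deliberately crude toy, not any vendor’s actual algorithm: the corpus, the word-matching “model,” and the labels standing in for reviewer decisions are all illustrative assumptions. The point is the shape of the loop: review a small seed, re-rank everything after each human decision, and always surface the document the model currently thinks is most likely relevant.

```python
import random

# Toy corpus of (text, reviewer_decision) pairs. In a real review the
# decision is unknown until a human tags the document; here it stands in
# for the reviewer's call. All words and labels are illustrative only.
CORPUS = [
    ("power plant trade pricing", 1),
    ("energy trade schedule confirm", 1),
    ("fantasy football waiver wire", 0),
    ("cat meme attached", 0),
    ("plant outage trade impact", 1),
    ("friday lunch cat picture", 0),
]

def relevance_score(text, relevant_vocab, irrelevant_vocab):
    """Crude model: hits against vocabulary learned from reviewed docs."""
    words = text.split()
    return (sum(w in relevant_vocab for w in words)
            - sum(w in irrelevant_vocab for w in words))

def continuous_active_learning(corpus, seed_size=2):
    random.seed(1)
    queue = list(corpus)
    random.shuffle(queue)
    relevant_vocab, irrelevant_vocab = set(), set()
    review_order = []

    def review(text, label):
        # A human reviews the doc; the model "retrains" on the decision.
        review_order.append((text, label))
        (relevant_vocab if label else irrelevant_vocab).update(text.split())

    # Small random seed set to bootstrap the model
    for _ in range(seed_size):
        review(*queue.pop())

    # CAL loop: re-rank the unreviewed docs after every decision and
    # always review the doc currently predicted most likely relevant.
    while queue:
        queue.sort(key=lambda d: relevance_score(
            d[0], relevant_vocab, irrelevant_vocab), reverse=True)
        review(*queue.pop(0))
    return review_order

for text, label in continuous_active_learning(CORPUS):
    print(label, text)
```

In practice the ranking model is far more sophisticated, but the workflow is the same: likely-relevant documents float to the top of the queue early, which is what lets a review stop well before every document has been eyeballed.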
Now, that last paragraph makes perfect sense to me. The fact that, despite my editing and revisions, it still would appear confusing to the average non-techie is one of the big problems with TAR 2.0: those of us who work with it get it, but explaining it to those who don’t is a challenge. But the biggest problem I see with TAR 2.0 once again must be laid at the feet of the attorneys.
Specifically, most of the training and re-training in a TAR 2.0 system will come courtesy of the manual document reviewers themselves. Ignoring for a moment the likelihood that review instructions to an outsourced document review bullpen tend to be somewhat less than precise anyway, several reviewers can look at the same document and draw very different conclusions. Let’s say you have a non-practicing JD with a liberal arts background, a former corporate attorney with engineering and IP experience, an inactive plaintiff’s trial lawyer, and a paralegal who was formerly a nurse. Drop the same document – let’s say, a communiqué from an energy trader to a power plant manager – in front of all four, and ask them to tag for relevance, privilege, and relevant issues. You’re likely to get four different results.
Which of these results would a TAR 2.0 system use to refine its predictive capabilities? All of them. And TAR has not yet advanced to the sophistication required to analyze four different tagging responses to the same document and refine from them the single most useful combination of criteria. Instead, it’s more likely to cloud up the computer’s “understanding” of what made this document relevant or not relevant.
The IT industry uses the acronym GIGO: garbage in, garbage out. Blair and Maron proved back in 1985* that human reviewers tend not only to be inaccurate in their review determinations, but that they are also overconfident in their abilities to find sufficient documents that meet their criteria. In TAR 2.0, ultimately, the success or failure of the computer’s ability to accurately tag documents may be in the hands of reviewers whose only stake in the litigation is a paycheck.
Until last week, I was strongly in favor of a “TAR 1.5” approach: start with a smaller seed set reviewed and tagged by a more-senior attorney, let the TAR system make its initial definitions and determinations, use those determinations to cull and prioritize the document corpus, then let the document reviewers take it from there and use “continuous active learning” to further iterate and refine the results. It seemed to me that this combined the best practices from both versions of the process: start with the wisdom and craftsmanship of an experienced litigator and apply it to all the available documents, then leave the document-level detail to contract reviewers using the TAR-suggested predictions as guidance.
But last week, I interviewed with the founders of a small company that have a different approach. Neither desiring to put any pressure on the company nor wanting to inadvertently divulge any trade secrets that might have been shared, I won’t identify them and won’t talk about their processes other than to say that perhaps they’ve come up with a “TAR 3.0” approach: make automatic TAR determinations based on statistical similarity of aspects of the document, rather than on the entire content of each document. It’s a lawyerly, rather than a technical, approach to the TAR problem, which to me is what makes it brilliant (and brilliantly simple).
Whether I become part of this company or not, the people who run it have given me a lot to think about, and I’ll be sharing my thoughts on these new possibilities in the near future.
*David C. Blair & M.E. Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System, 28 COMMC’NS ACM 289 (1985).
Oh, this is good. If you haven’t already signed up for the ALM Network (it’s free, as is most of their content), it’s worth doing so just to read this post (first of a two-part series) from Geoffrey Vance on Legaltech News. It pins the failure of acceptance of technology-assisted review (TAR) right where it belongs: on attorneys who refuse to get with the program.
As I headed home, I asked myself, how is it—in a world in which we rely on predictive technology to book our travel plans, decide which songs to download and even determine who might be the most compatible on a date—that most legal professionals do not use predictive technology in our everyday client-serving lives?
I’ve been to dozens of panel discussions and CLE events specifically focused on using technology to assist and improve the discovery and litigation processes. How can it possibly be—after what must be millions of hours of talk, including discussions about a next generation of TAR—that we haven’t really even walked the first-generation TAR walk?
Geoffrey asks why attorneys won’t get with the program. In a comment to the post, John Tredennick of Catalyst lays out the somewhat embarrassing answer:
Aside from the fact that it is new (which is tough for our profession), there is the point that TAR 2.0 can cut reviews by 90% or more (TAR 1.0 isn’t as effective). That means a lot of billable work goes out the window. The legal industry (lawyers and review companies) live and die by the billable hour. When new technology threatens to reduce review billables by a substantial amount, are we surprised that it isn’t embraced? This technology is driven by the corporate counsel, who are paying the discovery bills. As they catch on, and more systems move toward TAR 2.0 simplicity and flexibility, you will see the practice become standard for every review.
Especially with respect to his last sentence, I hope John is right.
I’ve had the occasional conversation with Greg Buckles in which we take opposing views on the validity of the 1985 Blair-Maron study. Herb Roitblat now weighs in with a quite scientific, yet blissfully simple, explanation why manual review should never be considered the “gold standard” for document review accuracy.
It may seem that we have effective access to all of the information in a document, but the available evidence suggests that we do not. We may be confident in our reading ability, but at best we do a reasonable job with the subset of information that we do have.
Get to Herb’s conclusion to see what (besides the obvious) this has to do with technology-assisted review. It’s worth the read.
Boy, I wish I could write like Craig Ball does.
I have written many articles and blog posts on technology-assisted review, but all my thousands of words cannot communicate my beliefs on the subject as gracefully, powerfully, and concisely as Craig recently put it:
Indeed, there is some cause to believe that the best trained reviewers on the best managed review teams get very close to the performance of technology-assisted review. …
But so what? Even if you are that good, you can only achieve the same result by reviewing all of the documents in the collection, instead of the 2%-5% of the collection needed to be reviewed using predictive coding. Thus, even the most inept, ill-managed reviewers cost more than predictive coding; and the best trained and best managed reviewers cost much more than predictive coding. If human review isn’t better (and it appears to generally be far worse) and predictive coding costs much less and takes less time, where’s the rational argument for human review?
So, um … yeah, what he said.
In my absence from the blawgosphere, many other commentators have described the metrics of technology-assisted review, or predictive coding, or whatever-we’re-calling-it-today much more eloquently than I could. However, the two primary metrics of TAR, Precision and Recall, still give lots of legal professionals fits as they try to apply them to their own sampling and testing iterations. So, for those of you still struggling with application of these concepts, here’s my explanation of these metrics (with some inspiration from a co-worker) in plain English:
Let’s imagine that we have a standard deck of 52 playing cards. We want to locate all of the spade cards as quickly and cheaply as possible. So, we describe the characteristics of a spade and instruct our computer algorithm to find every card that matches.
With this information, our predictive computer correctly identifies five of the 13 spades in the deck, and correctly identifies all 39 non-spade cards. Because the computer predicted correctly 44 out of 52 times, or with 84.6 percent accuracy, we should be thrilled, right?
Uh … no.
Even though the computer’s predictions were almost 85 percent accurate across the entire deck, we asked the computer to identify the spade cards. Our computer correctly identified five spades, which means that the computer predicted spades with 100 percent Precision. (If the computer had “identified” six spades but one of them had actually been a club, for example, the Precision score would have dropped to 83.3 percent.)
However, look at the bigger picture. The computer identified only five of the 13 spades in the deck, leaving eight spades unaccounted for. This means that the computer’s Recall score — the percentage of documents correctly identified out of all the appropriate documents available – is a pathetic 38.5 percent.
Our 84.6 percent accuracy score won’t help us in front of the judge, and neither will our 100 percent Precision score by itself. The Recall score of 38.5 percent is a failing grade by anyone’s metric.
But let’s turn this example around. Remember, we also asked the computer to identify all NON-spades in the deck, which it did correctly 39 out of 39 times. As to non-spade cards, both our Precision and Recall scores are a whopping 100 percent – much better than that semi-fictional “accuracy” score listed above.
Analogizing this to document review, rather than having a human review all 52 cards to locate the spades, or relying on the computer to incompletely identify the spades in the deck, let’s run with our highest-scoring metrics and accept the computer’s predictions as to the non-spade cards. Now, instead of 52 cards, we only have to review 13 of them – a savings in review time (and costs) of 75 percent.
This 52-card example may seem overly simplistic, but multiply it by 10,000 decks of cards all shuffled together and suddenly, this exercise begins to look a lot more like the typical document review project. Technology-assisted review can slash huge amounts of time and expense from a document review, as long as we understand the limits of what it can – and cannot, depending on the circumstances – do for us.
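The arithmetic in the card example is easy to verify. The short Python sketch below uses the counts from the example above (13 spades, 39 non-spades, 5 correct spade predictions) to compute the same three metrics and the review savings:

```python
# Counts from the card example: 13 spades, 39 non-spades; the computer
# flags 5 cards as spades, all of them correctly.
DECK_SIZE = 52
tp = 5             # spades correctly flagged (true positives)
fn = 13 - tp       # spades the computer missed (false negatives)
fp = 0             # non-spades wrongly flagged as spades (false positives)
tn = 39            # non-spades correctly identified (true negatives)

accuracy = (tp + tn) / DECK_SIZE   # 44/52: right answers across the deck
precision = tp / (tp + fp)         # 5/5: flagged spades that really are spades
recall = tp / (tp + fn)            # 5/13: spades in the deck that were found

# Accepting the computer's 39 confident non-spade calls leaves only the
# remaining 13 cards for human review.
to_review = DECK_SIZE - tn
savings = 1 - to_review / DECK_SIZE

print(f"accuracy {accuracy:.1%}, precision {precision:.1%}, "
      f"recall {recall:.1%}, review savings {savings:.0%}")
```

Swapping the `tp`/`tn` counts for the numbers from your own sampling iteration gives the same metrics for a real review, which is exactly the calculation the Precision and Recall discussion above is describing.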
[Note: This was originally written as part of an article for a print publication for Texas lawyers, but was cut from the publication draft. Most references to the Texas Rules of Professional Conduct (TDRPC) can also be read to refer to one of the ABA Model Rules of Professional Conduct. – Gary]
It is certainly no surprise to any member of the Texas bar that TDRPC 1.04(a) emphasizes, “A lawyer shall not enter into an arrangement for, charge, or collect an illegal fee or unconscionable fee[.]” This means that, in addition to charging clients reasonable fees for the work the attorneys do personally, they should not artificially inflate the fees passed through from, let’s say, a team of document review attorneys. These temporary attorneys typically work for an outplacement firm, and get paid $25-35 per hour for their time reviewing documents (increasingly, all electronic) as part of the first-pass document review. The outplacement firm marks up these fees in billing the law firm. Frequently, the law firm will then mark up the fees again in billing its client.
This raises the ethical question: How much can a firm ethically mark up the contract attorneys’ time? Most people would not consider a reasonable markup indefensible (after all, the law firm has overhead costs too). But how much is “reasonable”? Let’s presume that you hired an expert witness who charged your firm $15,000 for his services, but the firm billed the client $50,000 for the expert. Most grievance committees wouldn’t blink at issuing sanctions for this egregious markup.
Similarly, firms mark up the fees of their staff attorneys. However, given that contract reviewers are not technically engaged in the practice of law when performing first-pass document review (they do not, after all, determine how the documents they review fit into the theory of the case), at what point does the firm’s markup cross the line into “an illegal or unconscionable fee”? One team of bloggers has argued that since contract review attorneys exercise no independent legal judgment, they are essentially “a piece of office equipment” and therefore, like charges for copies or courier fees, they should have their costs marked up only minimally.
Except in the context of attorney fee awards generally, courts haven’t yet wrestled with the ethical implications of contract review attorney markup. A malpractice case pending in L.A. Superior Court, J-M Mfg. Co., Inc. v. McDermott Will & Emery, will likely shed some light on this issue eventually. The plaintiff has sued the McDermott law firm, claiming that it did not adequately supervise an outsourced document review project and that, as a result, some 3,900 privileged documents (out of about 250,000 total) were produced that should not have been. This matter, however, will take years to result in a written opinion. [Many thanks to Joe Howie for posting the Complaint.]
The notion of “reasonable fees” goes beyond merely marking up an outside reviewer’s bill. A trio of respected commentators – Patrick Oot, Anne Kershaw, and the aforementioned Joe Howie – have argued that in ESI collections, failure to utilize technology to consolidate duplicate records prior to review, thereby requiring multiple reviewers to look at exactly the same content to make exactly the same responsiveness and privilege decisions (each of whom must of course bill for their time separately), is by definition double-billing and, therefore, unethical. They wrote:
If ediscovery were a small part of litigation and duplicate consolidation had an imperceptibly small impact on ediscovery, the whole debate might be dismissed under the rationale of. However, the cost of ediscovery in general, and the cost of relevance and privilege reviews in particular, have been a major concern for years. There are no excuses for “not getting it” when it comes to ediscovery. Lawyers who bill hundreds of dollars an hour are implicitly promising a certain level of competence that would include the basic notion of consolidating duplicates.
These commentators go on to note, “[L]awyers are making representations to their adversaries and to the courts regarding the volume of ESI that has to be handled and the time required to review those records. Lawyers who don’t properly consolidate duplicates are inflating the time and cost required to review their productions.” Such behavior would violate TDRPC 4.01: “[A] lawyer shall not knowingly: (a) make a false statement of material fact or law to a third person[.]” It might also run contrary to Comment 6 to TDRPC 1.04, noted in the first paragraph above: “[A] lawyer should not abuse a fee arrangement based primarily on hourly charges by using wasteful procedures.”