Part of the Solution

Idealistic musings about eDiscovery

It’s Worth The Reminder

If you have done one of these published “Q&A” things before, as I have, you know that the author not only provides the A, but also the Q. The author gets to emphasize exactly what she wants to emphasize, in exactly the way she wants to emphasize it. That being said, Gabriela Baron reminds us of some important ethical points on the subject of technology-assisted review that need emphasizing: specifically, that the ethical attorney must develop at least some competence with the technology:

Comment 8 to ABA Model Rule of Professional Conduct 1.1 requires lawyers to ‘keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology.’ Lawyers need not become statisticians to meet this duty, but they must understand the technology well enough to oversee its proper use.

Her blog post is a good, succinct summary, and one that bears rereading as a refresher.


Proportionality in Discovery: Example #243

Courtesy K&L Gates, this recent opinion from USDC California in which the judge points out that you can’t very well conduct discovery with any sense of proportionality if you don’t know what the damages in question are:

[T]he court indicated that Plaintiff’s “tight-lipped” disclosures regarding damages, including indicating its desire for the defendant to wait for Plaintiff’s expert report, were “plainly insufficient.”  The court went on to reason that “[e]ven if [Defendant] were willing to wait to find out what this case is worth—which it is not—the court still needs to know as it resolves the parties’ various discovery-related disputes.  Proportionality is part and parcel of just about every discovery dispute.” (Emphasis added.)

Moral of the story: Modern discovery is not compatible with a plaintiff mindset of “We won’t specify an amount of damages sought, so that we don’t shortchange our potential recovery.”

Why Manual Review Doesn’t Work

I’ve had the occasional conversation with Greg Buckles in which we take opposing views on the validity of the 1985 Blair-Maron study. Herb Roitblat now weighs in with a quite scientific, yet blissfully simple, explanation of why manual review should never be considered the “gold standard” for document review accuracy.

It may seem that we have effective access to all of the information in a document, but the available evidence suggests that we do not. We may be confident in our reading ability, but at best we do a reasonable job with the subset of information that we do have.

Get to Herb’s conclusion to see what (besides the obvious) this has to do with technology-assisted review. It’s worth the read.

Chain Chain Chain …

Here’s a worthy reminder from Amy Bowser-Rollins of the need to maintain chain of custody logs while collecting eDiscovery. With all the emphasis these days on TAR, it’s nice to be reminded of the fundamentals every once in a while.

“The man who complains about the way the ball bounces is likely the one who dropped it.” – Lou Holtz

If ESI Isn’t Inaccessible, Better Speak Up

I don’t know if I’m more impressed that the author’s name is “Gary Discovery” or that the wisdom contained in his note is so cogent, but he cites a new Pennsylvania case in which the judge presumed ESI to be inaccessible where neither party contended otherwise. In this case, the result was that the costs of production shifted to the requesting party.

The requesting party should submit to the court that the ESI sought is accessible to avoid both a presumption of inaccessibility and the possibility of cost-shifting.  Requesting parties should not leave it up to the producing party to bear the burden of showing that the ESI is inaccessible because the courts are now willing to presume this finding if neither party contends otherwise.

“Someday You are Bound to Crash and Burn”

Ralph Losey’s e-Discovery Team blog is often highly technical but always interesting. Ralph is one of the (if not the) leading theorists on search and prediction, and he excels at finding simple metaphors to explain his headache-inducing mathematical constructs. (Hey, I was a liberal arts major. I know my intellectual limits.)

In his latest post, Ralph compares Kroll Ontrack’s EDR software to a race car. The far-ranging post is worth a read, if only to get to his final paragraph, of which I agree with every syllable:

What passes as a good faith use of predictive coding by some law firms is a disgrace. Of course, if hide the ball is still your real game of choice, then all of the good software in the world will not make any difference. Keep breaking the law like that and someday you are bound to crash and burn.

New Ruling from Rio Tinto Case: Parties Can Use Keywords with Predictive Coding

Here’s a good post from Philip Favro at Recommind, regarding Judge Peck’s new “hot-button” case dealing with technology-assisted review:

New Ruling from Rio Tinto Case Confirms Parties Can Use Keywords with Predictive Coding.

Like King Solomon’s famous mandate to split the baby, the court’s middle ground decree wisely provided each party with a measure of what they requested while also resolving the dispute. By permitting Vale to cull down the document universe with search terms, the court honored the parties’ predictive coding use agreement as Vale had requested. However, the court placated Rio Tinto’s concerns by allowing it to propose search terms that might capture relevant information that might otherwise have been excluded.

He’s BAAAAAaaaaack … again …

Those of you who have been following this dormant blog know that I had been instructed by the Powers That Be at my company to stop blogging if I wanted a chance at promotion. I noted that, the opportunity for promotion having fallen through, I would resume blogging unless my employers directed me otherwise.

Then I fell silent for nearly two years. You can guess what happened.

Well, now my company has decided to eliminate my position, not only releasing me back into the wild to seek new challenges, but also releasing me to begin blogging again! (By the way, if you’re aware of anyone who can benefit from an eDiscovery attorney, consultant, trainer and subject matter expert, please drop me a line.)

So, brace yourselves … because now I’ve got some things to say, and a lot of time available to say it.

Craig Ball, Predictive Coding, and Wordsmithing

Boy, I wish I could write like Craig Ball does.

I have written many articles and blog posts on technology-assisted review, but all my thousands of words cannot communicate my beliefs on the subject as gracefully, powerfully, and concisely as Craig recently put it:

Indeed, there is some cause to believe that the best trained reviewers on the best managed review teams get very close to the performance of technology-assisted review. …

But so what?  Even if you are that good, you can only achieve the same result by reviewing all of the documents in the collection, instead of the 2%-5% of the collection needed to be reviewed using predictive coding.  Thus, even the most inept, ill-managed reviewers cost more than predictive coding; and the best trained and best managed reviewers cost much more than predictive coding.  If human review isn’t better (and it appears to generally be far worse) and predictive coding costs much less and takes less time, where’s the rational argument for human review?

So, um … yeah, what he said.

Technology-Assisted Review: Precision and Recall, in Plain English


In my absence from the blawgosphere, many other commentators have described the metrics of technology-assisted review, or predictive coding, or whatever-we’re-calling-it-today much more eloquently than I could. However, the two primary metrics of TAR, Precision and Recall, still give lots of legal professionals fits as they try to apply them to their own sampling and testing iterations. So, for those of you still struggling with application of these concepts, here’s my explanation of these metrics (with some inspiration from a co-worker) in plain English:

Let’s imagine that we have a regulation deck of 52 playing cards. We want to locate all of the spade cards as quickly and cheaply as possible. So, we instruct our computer algorithms to:

  1. identify all spades in the deck; and
  2. identify all non-spades in the deck.

With this information, our predictive computer correctly identifies five of the 13 spades in the deck and correctly identifies all 39 non-spade cards, leaving the remaining eight cards unclassified. Because the computer predicted correctly 44 out of 52 times, or with 84.6 percent accuracy, we should be thrilled, right?

Uh … no.

Even though the computer’s predictions were almost 85 percent accurate across the entire deck, we asked the computer to identify the spade cards. Our computer correctly identified five spades, which means that the computer predicted spades with 100 percent Precision. (If the computer had “identified” six spades but one of them had actually been a club, for example, the Precision score would have dropped to 83.3 percent.)

However, look at the bigger picture. The computer identified only five of the 13 spades in the deck, leaving eight spades unaccounted for. This means that the computer’s Recall score – the percentage of target items correctly identified out of all the target items available – is a pathetic 38.5 percent.

Our 84.6 percent accuracy score won’t help us in front of the judge, and neither will our 100 percent Precision score by itself. The Recall score of 38.5 percent is a failing grade by anyone’s metric.

But let’s turn this example around. Remember, we also asked the computer to identify all NON-spades in the deck, which it did correctly 39 out of 39 times. As to non-spade cards, both our Precision and Recall scores are a whopping 100 percent – much better than that semi-fictional “accuracy” score listed above.

Analogizing this to document review: rather than have a human review all 52 cards to locate the spades, or rely on the computer’s incomplete identification of the spades, let’s run with our highest-scoring metrics and accept the computer’s predictions as to the non-spade cards. Now, instead of 52 cards, we only have to review 13 of them – a savings in review time (and costs) of 75 percent.
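For readers who want to check the arithmetic themselves, here is a minimal Python sketch of the card example. This is my own illustration, not anyone’s tool or formula; the function name is simply made up for the demonstration.

```python
def precision_recall(true_positives, predicted_positives, actual_positives):
    """Precision: share of the computer's predictions that were correct.
    Recall: share of all the actual targets that the computer found."""
    precision = true_positives / predicted_positives
    recall = true_positives / actual_positives
    return precision, recall

# Spades: the computer flagged 5 cards, all truly spades, out of 13 in the deck.
spade_precision, spade_recall = precision_recall(5, 5, 13)
print(f"Spades:     precision {spade_precision:.1%}, recall {spade_recall:.1%}")
# Spades:     precision 100.0%, recall 38.5%

# Non-spades: 39 cards flagged, all 39 correct, and 39 non-spades exist in total.
non_precision, non_recall = precision_recall(39, 39, 39)
print(f"Non-spades: precision {non_precision:.1%}, recall {non_recall:.1%}")
# Non-spades: precision 100.0%, recall 100.0%

# Overall "accuracy" counts every correct call: 5 + 39 = 44 of 52 cards.
print(f"Accuracy:   {44 / 52:.1%}")        # 84.6%

# Accepting the non-spade predictions leaves 52 - 39 = 13 cards to review.
print(f"Review savings: {39 / 52:.0%}")    # 75%
```

Note how the spade predictions score perfectly on Precision while failing badly on Recall – the same split the example above describes.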

This 52-card example may seem overly simplistic, but multiply it by 10,000 decks of cards all shuffled together and suddenly, this exercise begins to look a lot more like the typical document review project. Technology-assisted review can slash huge amounts of time and expense from a document review, as long as we understand the limits of what it can – and cannot, depending on the circumstances – do for us.