Idealistic musings about eDiscovery
Technology-Assisted Review: Precision and Recall, in Plain English
June 18, 2013
In my absence from the blawgosphere, many other commentators have described the metrics of technology-assisted review, or predictive coding, or whatever-we’re-calling-it-today much more eloquently than I could. However, the two primary metrics of TAR, Precision and Recall, still give lots of legal professionals fits as they try to apply them to their own sampling and testing iterations. So, for those of you still struggling with these concepts, here’s my explanation of the metrics (with some inspiration from a co-worker) in plain English:
Let’s imagine that we have a standard deck of 52 playing cards. We want to locate all of the spade cards as quickly and cheaply as possible. So, we instruct our computer algorithms to:
- identify all spades in the deck; and
- identify all non-spades in the deck.
With this information, our predictive computer correctly identifies five of the 13 spades in the deck, and correctly identifies all 39 non-spade cards. Because the computer predicted correctly 44 out of 52 times, or with 84.6 percent accuracy, we should be thrilled, right?
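The arithmetic behind that accuracy figure can be sketched in a few lines of Python (the counts are taken from the card example above; the variable names are mine):

```python
# Counts from the card example: 13 spades and 39 non-spades in the deck.
total_spades = 13
total_non_spades = 39
deck_size = total_spades + total_non_spades   # 52 cards

# What the computer got right: 5 spades and all 39 non-spades.
spades_correct = 5
non_spades_correct = 39

accuracy = (spades_correct + non_spades_correct) / deck_size
print(f"Accuracy: {accuracy:.1%}")   # 44 of 52, about 84.6%
```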
Uh … no.
Even though the computer’s predictions were almost 85 percent accurate across the entire deck, we asked the computer to identify the spade cards. Our computer correctly identified five spades, which means that the computer predicted spades with 100 percent Precision. (If the computer had “identified” six spades but one of them had actually been a club, for example, the Precision score would have dropped to 83.3 percent.)
However, look at the bigger picture. The computer identified only five of the 13 spades in the deck, leaving eight spades unaccounted for. This means that the computer’s Recall score – the percentage of documents correctly identified out of all the appropriate documents available – is a pathetic 38.5 percent.
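Precision and Recall for the spade search come straight from those counts. Here is a minimal sketch, including the hypothetical club-misfire from the parenthetical above (again, variable names are my own):

```python
# Precision = of the cards flagged as spades, how many really were spades.
# Recall    = of all the spades in the deck, how many were flagged.
flagged = 5          # cards the computer identified as spades
true_positives = 5   # flagged cards that actually were spades
total_spades = 13    # spades actually in the deck

precision = true_positives / flagged         # 5/5  = 100%
recall = true_positives / total_spades       # 5/13 ≈ 38.5%
print(f"Precision: {precision:.1%}, Recall: {recall:.1%}")

# The hypothetical miss: six cards flagged, one of them a club.
precision_with_club = 5 / 6                  # ≈ 83.3%
```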
Our 84.6 percent accuracy score won’t help us in front of the judge, and neither will our 100 percent Precision score by itself. The Recall score of 38.5 percent is a failing grade by anyone’s metric.
But let’s turn this example around. Remember, we also asked the computer to identify all NON-spades in the deck, which it did correctly 39 out of 39 times. As to non-spade cards, both our Precision and Recall scores are a whopping 100 percent – much better than that semi-fictional “accuracy” score listed above.
Analogizing this to document review, rather than having a human review all 52 cards to locate the spades, or relying on the computer to incompletely identify the spades in the deck, let’s run with our highest-scoring metrics and accept the computer’s predictions as to the non-spade cards. Now, instead of 52 cards, we only have to review 13 of them – a savings in review time (and costs) of 75 percent.
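The savings calculation above can be checked the same way (a sketch under the example’s assumption that we trust all 39 non-spade predictions):

```python
# If we accept the computer's 39 non-spade calls, only the rest need eyes on.
deck_size = 52
predicted_non_spades = 39                     # set aside without human review

cards_to_review = deck_size - predicted_non_spades   # 13 cards remain
savings = predicted_non_spades / deck_size           # 75% of the deck skipped
print(f"Cards to review: {cards_to_review}, savings: {savings:.0%}")
```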
This 52-card example may seem overly simplistic, but multiply it by 10,000 decks of cards all shuffled together and suddenly, this exercise begins to look a lot more like the typical document review project. Technology-assisted review can slash huge amounts of time and expense from a document review, as long as we understand the limits of what it can – and cannot, depending on the circumstances – do for us.