Understanding the Technical Bias of Westlaw, Lexis Advance, Fastcase, Google Scholar, and Casetext

By Greg Lambert on December 12, 2016

Definition of algorithm:

noun al·go·rithm ˈal-gə-ˌri-thəm – a step-by-step procedure for solving a problem or accomplishing some end especially by a computer

When I attended the WestPAC Law Librarian meeting in Jackson Hole, WY a couple of months ago, I had the opportunity to sit in on University of Colorado Law School’s Susan Nevelow Mart’s presentation on legal researchers’ reliance on algorithms for online legal research. Susan’s presentation discussed her SSRN paper entitled “The Algorithm as a Human Artifact: Implications for Legal {Re}Search,” where she breaks down the algorithmic effects of Westlaw, Lexis Advance, Fastcase, Google Scholar, Ravel Law, and Casetext.

The key point, says Mart, is that researchers “need to remember that the algorithms that are returning results to them were designed by humans.” That includes all the “biases and assumptions” that come with the human experience. In other words, a little bias and assumption on the part of the people developing the computer algorithms can cause dramatic changes in the results produced with similar content and search terms. Mart states that it is important that we, as researchers, “acquire some expertise about the technology at the meta-level.” How can you trust the results if you are not familiar with the way the tools are designed to index, search, and retrieve those results? The problem is that most legal research providers don’t want to reveal very much about the processes that go on behind the scenes to pull those top 10, 25, 50, or 1000 results. Mart is calling for more “Algorithmic Accountability” from our legal databases in order to help legal researchers better understand the biases present in the retrieved results.

Mart’s paper and the research behind it test the different legal research databases with the same search terms and the same data content, then evaluate where the results overlap and where they differ. The experiment yields results that are, in Mart’s words, “a remarkable testament to the variability of human problem solving.” The top ten results from each resource showed very little consistency: there was hardly any overlap in the cases, and only about 7% of the cases returned were in all six database results. An overlap rate that low should cause a bit of a shudder to run up the spine of legal researchers.
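To make the overlap idea concrete, here is a minimal sketch in Python of how one might measure it for a single query. It is not Mart's method or code: the function name overlap_stats, the database labels, and the case IDs are all made-up placeholders, and dividing by the number of distinct cases returned across all databases is an assumption about how such a percentage could be calculated.

```python
from collections import Counter

def overlap_stats(results):
    """results maps a database name to its list of top-ranked case IDs."""
    # Count how many databases returned each case; set() ignores
    # duplicates inside a single database's list.
    counts = Counter(case for cases in results.values() for case in set(cases))
    total = len(counts)
    n_dbs = len(results)
    in_all = sum(1 for c in counts.values() if c == n_dbs)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct_cases": total,
        # Assumed denominator: all distinct cases returned by any database.
        "share_in_all": in_all / total if total else 0.0,
        "share_unique": unique / total if total else 0.0,
    }

# Hypothetical toy data, not results from the study.
toy_results = {
    "db_a": ["case1", "case2", "case3"],
    "db_b": ["case1", "case4", "case5"],
    "db_c": ["case1", "case2", "case6"],
}
stats = overlap_stats(toy_results)
print(stats)
```

In the toy data, only case1 shows up in every list, so the share returned by all databases is one in six, while four of the six distinct cases appear in just one list.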

What is a researcher to do in this day and age of very little Algorithmic Accountability? First, researchers need to call upon these database providers to give us more detailed information about how their algorithms are set up, and the technical biases that result from these rules. Mart states that “the systems we use are black boxes” that prevent us from understanding how these technical biases skew the results of our searches. “Algorithmic accountability will help researchers understand the best way to manipulate the input into the black box, and be more certain of the strengths and weaknesses of the output.”

Until we better understand the processes that go on in the background, researchers today should expand their searches, and use multiple databases in order to reduce the effects of technological bias. Mart explains that, “[t]he uniqueness of results may show something about the world view of each database that suggests that searching in multiple databases may be the 21st century version of making sure that multiple authorial viewpoints are highlighted in a library collection’s holdings.”
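For readers who keep their own research logs, the multiple-database approach can be sketched as a simple pooling step. This is only an illustration under stated assumptions: pool_results is a hypothetical helper, the database labels and case IDs are invented, and none of these providers is known to expose results in this form. The sketch interleaves each database's ranked list round-robin, keeps each case once, and records which databases returned it.

```python
from itertools import zip_longest

def pool_results(results):
    """Interleave ranked lists round-robin, keeping each case only once."""
    pooled = []    # cases in the order they are first seen
    sources = {}   # case -> list of databases that returned it
    # Each row holds the rank-1 case from every database, then rank-2, etc.
    # Shorter lists are padded with None.
    for rank_row in zip_longest(*results.values()):
        for db, case in zip(results.keys(), rank_row):
            if case is None:
                continue
            if case not in sources:
                sources[case] = []
                pooled.append(case)
            if db not in sources[case]:
                sources[case].append(db)
    return pooled, sources

# Hypothetical toy data, not real search results.
toy = {
    "db_a": ["case1", "case2"],
    "db_b": ["case3", "case1", "case4"],
}
pooled, sources = pool_results(toy)
print(pooled)
print(sources["case1"])
```

Round-robin ordering is a deliberately neutral design choice: it avoids privileging any one database's ranking, at the cost of ignoring whatever relevance signal each ranking carries.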

Within the SSRN paper, Susan Nevelow Mart presents the findings of her Empirical Study and breaks out the results by:

Uniqueness of Cases
Relevance
Relevant and Unique
Number of Results Returned by Each Query
Age of Cases

The different databases have individual strengths and weaknesses in each category, and the results, read as a whole, back up Mart’s suggestion of searching multiple databases. Until legal research providers begin to open up their black boxes and adopt more Algorithmic Accountability, we researchers will need to expand our own legal information literacy with a better understanding of how each database compiles, categorizes, indexes, searches, and prioritizes the results. Hopefully, Mart’s research and pressure from lawyers and researchers will help push these providers to shine a little more light into their algorithmic black boxes.

[ed. note – Updated at 11:30 CT to include Ravel Law as part of the databases reviewed by Susan Nevelow Mart. – GL]