[Ed. Note – We have launched The Geek in Review Substack page to put out content in a new way. One example is a series of stories I’ve been working on as I’ve learned more about how AI and automation tools are developed, what works and what doesn’t, and how the tools improve as foundational models get better or as the industry learns to leverage them more effectively. Below is Chapter One of my “Beyond the Model” story, where I start to dive into why, after two years of some pretty mediocre legal research tools, these tools are suddenly getting much, much better. We’ll still be posting here, but Substack gives us an interesting platform that expands what we can do on the blog, including a lot more opportunities for us to hear from you. Come join us and see what all the Substack fuss is about. – Greg]
Beyond the Model: Part One – How Legal AI Got Smart
Preface
“Any sufficiently advanced technology is indistinguishable from magic.” – Arthur C. Clarke
For those of us in the legal industry, the past three years have created an enthusiasm around the practice and business of law that I’ve never seen before. The introduction of Generative AI created an immediate push into the legal industry, and legal research was seen as the most obvious and easiest candidate to be ‘fixed’ by AI. It has turned out to be one of the hardest.
I’ve spent the last three years trying to keep up. Change isn’t measured in quarterly updates; it’s measured in weekly, sometimes daily, increments. Just understanding some of the basics can be challenging. So I wanted to take a different approach to explaining why legal research seemed like an easy problem for AI to solve, and why it took a couple of really awful years of GenAI legal research tools before we started actually seeing some decent results.
Instead of going through all of the data and presenting information in a technical way, I took a page out of my friend Anusia Gillespie’s book and decided to explain it in stories. Storytelling might be a way for some of us to better wrap our heads around what it takes for AI to truly make sense of legal research.
Part one of the story introduces Cooper, Jesse, and Maya: a law firm innovator, a startup data scientist, and a law firm partner, the kind of people I work with every day. We start off talking about how throwing LLMs at millions of documents of legal decisions doesn’t just work ‘out of the box.’ The near-daily news articles about attorneys being sanctioned for “hallucinations” in their legal writing are a direct result of this misconception.
We needed a middle layer to connect the power of the LLM to the legal information, and for a couple of years the answer was Retrieval Augmented Generation (RAG). As you’ll learn from the story, it was a good first step but introduced some of its own problems.
I hope you enjoy the first of what will be many stories of how innovation plays out in the legal field.
Introduction
For the past couple of years, lawyers and technologists have marveled at how AI-powered legal research tools seem to be getting smarter every month. Tools like Lexis Protégé, Westlaw CoCounsel, and Vincent by Clio deliver answers that feel almost prescient. The assumption was simple: better foundational models like ChatGPT 5.2, Claude Opus 4.5, and Gemini 3.0 Pro equal better answers.
But as Jesse, a data scientist at a leading legal information company, explains to Cooper, a law firm innovator and librarian, the truth lies deeper in the stack. The industry has quietly hit the limits of “Naive RAG,” the simplest form of Retrieval Augmented Generation, in which an AI looks up documents to answer questions: these simple systems can match words but fail to understand context. The real breakthrough isn’t in how the AI generates text. It is in how Agentic RAG and Knowledge Graphs empower the AI to reason about relationships, authority, and hierarchy before it ever writes a word.
This story follows their journey from a chance conference conversation to late-night video calls, a visit to Jesse’s lab, and a final flight home as they uncover how the fusion of vector databases and graph-based reasoning is quietly transforming legal AI from a search engine into a strategic decision engine.
Chapter 1: The Spark at the Conference
The hotel lobby buzzed with the background noise unique to legal tech conferences: half caffeine, half optimism, entirely too much jargon. Cooper Graham balanced a paper cup of burnt coffee in one hand and an overstuffed swag bag in the other, weaving between clusters of attendees debating AI’s future like prophets at an ancient symposium.
“Cooper! You made it,” called Jesse, waving from a corner table near the windows.
Jesse Tanaka looked out of place among the navy suits, wearing sneakers and a lanyard that read “Lead Data Scientist, Legal AI Systems.”
“Jesse.” Cooper grinned, shaking his hand. “Good to see you. I was just telling my managing partner about your latest release. The research tools… they’ve changed. They don’t just find keywords anymore. They feel prescient. I assumed you guys finally got your hands on GPT-6 or some secret model.”
Jesse smirked, clicking a pen against the table. “That’s what everyone thinks. Better model equals better lawyer, right?”
“Isn’t it?”
“No,” Jesse said flatly. “If we were just relying on the models, we’d still be struggling with hallucinations. What you’re seeing isn’t a smarter brain. It’s a better memory structure.”
He reached for a cocktail napkin and flattened it against the table.
“For the last year, everyone in this industry has been doing what we call ‘Naive RAG.’ You take a billion legal documents, chop them into chunks, and shove them into a vector database.”
Cooper nodded. “Vector search. I know this part. It turns text into math so you can find similar concepts.”
“Right,” Jesse said. “And for finding similar language, it’s brilliant. If you ask about ‘emotional distress,’ it finds that concept perfectly. But here’s the limitation: vector search grasps the text and the concepts behind it, but it’s blind to hierarchy.”
Jesse drew a box labeled Vector DB and wrote “Text” next to it.
“Here’s the trap. You feed documents into a vector database, someone asks a question, and the AI hunts for similar chunks. It forces that context into the answer. But if you ask, ‘How are Mathews v. Eldridge and social security benefits related?’, vector search struggles. It finds documents containing those words, but it doesn’t know whether Mathews created the test, overruled a previous test, or was distinguished by a later court. It can’t tell the difference between a dissenting opinion and a holding if the words look the same.”
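[A quick aside for readers who want to see what Jesse is describing: below is a minimal, illustrative sketch of Naive RAG retrieval in Python. Everything here is hypothetical. The embed() function stands in for whatever embedding model a vendor actually uses, and the chunks are made up. It shows the shape of the pipeline, not any product’s real system.]

```python
# Minimal sketch of "Naive RAG": chop documents into chunks, embed them as
# vectors, and fetch the chunks most similar to the question.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    # We derive a fake unit vector from the text's hash so the sketch runs.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

# The "vector database": chunks of legal text stored alongside their vectors.
chunks = [
    "Mathews v. Eldridge set out a three-part balancing test...",
    "Social security disability benefits may be terminated when...",
    "The dissent in Mathews argued that an evidentiary hearing...",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the question."""
    q = embed(question)
    scored = sorted(index, key=lambda pair: float(q @ pair[1]), reverse=True)
    return [chunk for chunk, _ in scored[:k]]

# The trap Jesse describes: this finds *similar language*, but nothing here
# records whether Mathews created a test, was overruled, or was distinguished.
print(retrieve("How are Mathews v. Eldridge and social security benefits related?"))
```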
“So how did you fix it?” Cooper asked.
“We stopped treating the law like a pile of text and started treating it like a network,” Jesse said.
He drew a second box on the napkin, connecting it to the first with a circle labeled Agent.
“We moved to Agentic RAG. We store the data twice now. Once in a vector database for the language, and once in a Knowledge Graph for the logic. The Knowledge Graph maps the ‘DNA’ of the law into things like entities, citations, hierarchies, and pass-throughs.”
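[Another aside: here is roughly what “storing the logic” might look like, sketched with the open-source networkx library. The cases, edges, and relation names are invented for illustration; a production knowledge graph would be far larger and richer than this.]

```python
# Sketch of the second store Jesse describes: a Knowledge Graph that records
# *relationships* between authorities, not just their text. Illustrative only.
import networkx as nx

graph = nx.DiGraph()

# Nodes are legal authorities; each edge carries a typed relationship.
graph.add_edge("Mathews v. Eldridge", "Goldberg v. Kelly", relation="distinguishes")
graph.add_edge("Mathews v. Eldridge", "Due Process Balancing Test", relation="creates")
graph.add_edge("Smith v. Agency (hypothetical)", "Mathews v. Eldridge", relation="applies")

def relations_of(case: str) -> list[tuple[str, str, str]]:
    """List every typed relationship touching a case, in either direction."""
    out = [(case, d["relation"], tgt) for _, tgt, d in graph.out_edges(case, data=True)]
    inc = [(src, d["relation"], case) for src, _, d in graph.in_edges(case, data=True)]
    return out + inc

# Unlike vector search, the graph answers "how are these related?" directly.
for src, rel, tgt in relations_of("Mathews v. Eldridge"):
    print(f"{src} --{rel}--> {tgt}")
```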
Jesse tapped the circle in the center.
“And this? This is the Agent. The decision engine. When you ask a question now, the AI doesn’t just blindly search. It reasons. It asks: ‘Does Cooper need a specific quote? Or does he need to understand how Case A relates to Statute B?’”
Cooper stared at the diagram. “So it’s not just retrieving anymore. It’s deciding how to retrieve.”
“Exactly,” Jesse said. “Naive RAG breaks when knowledge gets complex. Agentic RAG gives the AI tools to explore. It can check the vector store for the text, realize the result looks shaky, and then hit the Knowledge Graph to verify the authority. It synthesizes both.”
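[One last sketch: a toy version of the Agent deciding how to retrieve. In a real agentic system an LLM makes this routing decision; here a keyword heuristic stands in so the example stays self-contained, and retrieve() and relations_of() are stubs for the two earlier sketches.]

```python
# Toy "Agent" router: decide whether a question needs language (the vector
# store), logic (the knowledge graph), or both, then assemble a plan.

def retrieve(question: str) -> list[str]:
    # Stub for the vector-store sketch above.
    return ["...chunk of similar text from the vector store..."]

def relations_of(case: str) -> list[tuple[str, str, str]]:
    # Stub for the knowledge-graph sketch above.
    return [("Mathews v. Eldridge", "creates", "Due Process Balancing Test")]

RELATIONSHIP_CUES = ("related", "overrule", "distinguish", "cite", "authority")

def answer(question: str) -> dict:
    plan: dict = {"question": question, "steps": []}

    # Step 1: always pull candidate language, just as Naive RAG would.
    plan["steps"].append(("vector_search", retrieve(question)))

    # Step 2: if the question is about relationships, verify in the graph.
    # (A real agent would first extract the case name from the question.)
    if any(cue in question.lower() for cue in RELATIONSHIP_CUES):
        plan["steps"].append(("graph_lookup", relations_of("Mathews v. Eldridge")))

    # Step 3: a production system would hand both result sets to the LLM to
    # draft an answer grounded in verified authority; here we return the plan.
    return plan

print(answer("How are Mathews v. Eldridge and social security benefits related?"))
```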
The conference hallway began to clear as the next session started. Cooper looked down at the napkin, a rough schematic of a “dual-brain” system.
“You make it sound obvious,” Cooper said. “But honestly, most of us just assume the computer is magic.”
“It’s not magic,” Jesse said, standing up and slinging his backpack over his shoulder. “It’s engineering. We’re just finally building a system that respects that law isn’t just about what words you use. Now it’s about how they connect.”
“You free next week?” Cooper asked. “I need to see this running. Not on a napkin.”
“Teams call. Tuesday,” Jesse said. “I’ll show you the dashboard. You’ll see the Agent making decisions in real time. It’s wild.”
As they shook hands, Cooper tucked the napkin into his notebook. He didn’t know it yet, but that sketch was the key to understanding why his firm’s new software acted less like a search engine and more like a senior partner.
They hadn’t just taught the AI to read. They had taught it to verify.
Note: Read the rest of the story on The Geek in Review Substack
