Marlene Gebauer 0:07
Welcome to The Geek in Review, the podcast focused on innovative and creative ideas in the legal industry. I’m Marlene Gebauer.
Greg Lambert 0:14
And I’m Greg Lambert. Well, Marlene, you look a little healthier than you’ve looked the last couple of weeks, so that’s good. Good to have you back.
Marlene Gebauer 0:20
I was not doing too well last week. You know, I caught the dreaded COVID, for the first time. But thankfully it only lasted about a week, and
Greg Lambert 0:28
And you thought you were... you probably thought you were special, right?
Marlene Gebauer 0:31
I thought I was in the clear. Apparently not.
Greg Lambert 0:38
Well, we have a great episode today. We have a friend of ours, Damien Riehl from Fastcase and SALI, on with us. So, Marlene, I think I took your line, didn’t I?
Marlene Gebauer 0:50
You totally took my line. But I would like to welcome you, Damien. Damien is the Vice President of Litigation Workflow and Analytics Content and part of the leadership at SALI. Damien, we’re very excited to talk about SALI. We are longtime fans. But we’re also going to talk about legal tech, music, and more. You know it’s going to be a great podcast when you say “and more.”
Damien Riehl 1:14
I’m so thrilled to be here. Thank you so much for having me.
Greg Lambert 1:17
Yeah, this is one of those where we have a general outline, and we’re just going to see how the conversation goes. So, Damien, we wanted to start off by talking about SALI. We’ve talked about the SALI standards before, but I think you’re one of the most motivated people behind it, getting it out there and getting a lot more content onto the back end of SALI. So would you give us an overview of what SALI is and some of the things you’ve been working on lately?
Damien Riehl 1:49
Sure. Marlene, did you want to say something?
Marlene Gebauer 1:50
I was going to echo that: what’s the basic concept behind SALI? I think a lot of people might not fully understand what its purpose is.
Damien Riehl 2:02
I love that clarifying question. Think of SALI as a bunch of tags. Say you go to Amazon and run a search for, you know, sports jerseys. Now, in the filter on the left-hand side, there are tags for men’s sports jerseys, tags for women’s sports jerseys, tags for children’s sports jerseys. Each of those is something Amazon uses to tag up its products so that you, as a user, are better able to filter and analyze them. Think of SALI almost like that: SALI is a bunch of tags. But instead of sports jerseys and men’s or women’s clothing, you can say, give me this area of law, patent law, and then give me the service. Am I giving advice on patent law, as in, should I get a patent? That’s the advice service. If I decide, yes, I’m going to get a patent, then I file it with the Patent and Trademark Office; that’s the registration service. Then maybe I license the patent; that’s the transactional service, a different tag. Then I litigate the patent; that’s a dispute tag. And then I deal with patent assets in bankruptcy; that’s the bankruptcy service tag. So you can imagine, if I tag up all of my matters, I’m able to say, okay, show me all the patent things I’ve done, whether it’s advice, litigation, etc. And then I can find like with like: show me other times we’ve done this kind of matter. Similarly, for what industries are we doing this work? Maybe my client is in the information industry and the counterparty is in the agriculture industry; you could tag that up with the agriculture industry. Then you can say, okay, what kinds of documents have we done? Merger agreements? Motions to dismiss? That’s another type of SALI tag. Wouldn’t it be great to be able to capture all of those?
And then you can ask, within the document: is there a force majeure clause? We’re tagging that up. Is this a breach of contract motion? A negligence motion? A trade secret motion? We’re tagging those things up. Version 2, which came out in 2022, has 10,000 tags for everything that matters to the substance of law and to the business of law. The newest version we’re building has about 13,000 tags. So we just keep marching through, essentially taking everything that matters to the substance of law and to the business of law and providing tags for it. Once things are tagged up, the real value proposition is that SALI is now being adopted by Thomson Reuters. SALI is being adopted by Lexis; I just had a meeting with Lexis today. SALI is being adopted by Bloomberg. SALI is being adopted by NetDocuments in a big way, and SALI has been adopted by iManage. So you can imagine each of those 13,000 things being standardized within each of those products. So an example, and of course I should say my employer...
Greg Lambert 4:52
This gives you a chance to say that Fastcase is also doing it, right?
Damien Riehl 4:55
It’s true. So, for example, today we’ve tagged up all of our 225 SALI motion types: motion to dismiss, motion for summary judgment, and so on, 225 of them. We’ve tagged each of those motions with its SALI identifier, for example the identifier for a motion to dismiss. So if you send Docket Alarm the SALI tags and say, give me all the motions to dismiss in the Southern District of New York for breach of contract, we would be able to give you all of those back. Send that same API query to Thomson Reuters, eventually, and they’ll give you all of that back; send that same API call to Lexis, and they’ll give it back. This way, everyone is using the same identifiers for the same legal work.
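To make the “same identifiers, same query” idea concrete, here is a minimal Python sketch of faceted filtering, assuming each matter or document simply carries a set of tag labels. The labels (`MotionToDismiss`, `SDNY`, and so on), the `TaggedItem` class, and the `filter_by_tags` helper are all hypothetical stand-ins: real SALI tags use the standard’s own identifiers, and this is not the Docket Alarm, Thomson Reuters, or Lexis API.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedItem:
    """A matter or document plus the set of tags attached to it."""
    name: str
    tags: set = field(default_factory=set)


def filter_by_tags(items, required):
    """Faceted AND filter: keep only items that carry every required tag."""
    required = set(required)
    return [item for item in items if required <= item.tags]


# Hypothetical tag labels; real SALI tags use the standard's own identifiers.
docs = [
    TaggedItem("Doc A", {"MotionToDismiss", "BreachOfContract", "SDNY"}),
    TaggedItem("Doc B", {"MotionForSummaryJudgment", "BreachOfContract", "SDNY"}),
    TaggedItem("Doc C", {"MotionToDismiss", "Negligence", "SDNY"}),
    TaggedItem("Doc D", {"MotionToDismiss", "BreachOfContract", "NDCal"}),
]

hits = filter_by_tags(docs, {"MotionToDismiss", "BreachOfContract", "SDNY"})
print([item.name for item in hits])  # ['Doc A']
```

Because every system in this sketch agrees on the tag vocabulary, the same `required` set could be sent to any of them and mean the same thing, which is the point Damien makes about sending one query to several vendors.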
Marlene Gebauer 5:31
So when you’re talking about integration with Thomson and Lexis, I’m assuming you mean into their products? Are we talking about their information products, or more about their financial products?
Damien Riehl 5:45
I’m only saying what Thomson Reuters has said publicly: Thomson Reuters has said publicly that they are integrating SALI into every single one of their products, just marching through them. I can’t talk about the roadmap for what they’re building, but what I can tell you is that Legal Tracker is the first one they’ve done. The process for that was that Thomson Reuters, Jim Hannigan from SALI, and I said, okay, let’s say Legal Tracker had 500 different fields and tags that they wanted to map. We went one by one and mapped all of them. Ninety-five percent of what Legal Tracker had, SALI already had. For the 5% that SALI did not yet have, we pushed those into SALI, so now SALI covers 100% of Legal Tracker. Also, importantly, SALI had a lot of things that Legal Tracker didn’t yet have, and now Legal Tracker has pulled in some SALI things it didn’t have before. So that’s the kind of bidirectional improvement we do with Thomson Reuters. NetDocuments is another one: they said publicly at their NetDocuments conference in Denver, which I was able to be a part of, that step one for them is that I, as a user, can manually tag up my motion to dismiss for breach of contract and negligence, or my merger agreement that has a force majeure clause. But soon after that, this year, they plan to say, hey, your document is a motion to dismiss for breach of contract, and it also has a negligence claim in there; do you want me to tag those things up programmatically? And the user can say, yes, please. So this is a way to essentially tag up all of your documents, tag up all of your matters, tag up everything.
Because once you tag them up, you can analyze them: how much does a motion to dismiss for breach of contract in the Southern District of New York cost? Once you have those tags, you’re able to do that work.
Marlene Gebauer 7:35
And what’s interesting is that it’s across the board, in different organizations. One of the early arguments in favor of SALI was that everybody would be using the same standard, so you could compare apples to apples. And it sounds like there’s a real opportunity to get this into firms to really make that happen.
Damien Riehl 7:59
That’s exactly right. And even within a single company, take the problem we’re solving for vendors like Thomson Reuters. The person we’re working with there is Cory. Cory’s boss said, hey, Cory, we have 30 products that don’t talk very well to each other; you need to make them talk to each other. Cory said, gosh, I’m an engineer, not a lawyer; building a taxonomy for the law is really hard. And then he said, oh, wait, SALI has already built it, and it’s actually really good. So he is using SALI to have all 30 of his products talk to each other. The side benefit is the API calls: he can open those up to law firms, so if you want to hit any of those 30 products, you can use a SALI API query to hit any of them. And of course Docket Alarm is the same idea: if you hit us with an API call that uses SALI, we’re able to handle it. So Thomson Reuters is using it as a universal translator, both among their internal products and with external products. And then, of course, with law firms we’re also solving a problem for vendors like us: each individual law firm might have its own bespoke taxonomy, with crazy areas of law that nobody’s ever heard of before. As a vendor, it’s a lot of work for me to maintain mappings to 1,000 different firm taxonomies. Wouldn’t it be better if I as a vendor could say, hey, law firm, use SALI, and then you’re compatible with us out of the box? So that’s really the grounding idea.
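The “universal translator” idea can be sketched as hub-and-spoke mapping: each system keeps one mapping from its local labels to shared tag IDs, and translation between any two systems goes through those shared IDs. This is a minimal Python sketch under that assumption; the labels, the `area/patent-law`-style IDs, and the `translate` helper are invented for illustration and are not real SALI identifiers or any vendor’s code.

```python
# Each system keeps ONE mapping from its local labels to shared tag IDs.
# All labels and IDs here are invented for illustration (not real SALI IDs).
FIRM_LABELS = {
    "Patent Practice": "area/patent-law",
    "MTD": "document/motion-to-dismiss",
}
VENDOR_LABELS = {
    "Patents": "area/patent-law",
    "Motion to Dismiss": "document/motion-to-dismiss",
}


def translate(label, source, target):
    """Translate a local label from `source` into `target`'s label via the shared ID.

    Returns None when either system lacks a mapping for that concept.
    """
    shared_id = source.get(label)
    if shared_id is None:
        return None
    reverse = {sid: local for local, sid in target.items()}
    return reverse.get(shared_id)


print(translate("MTD", FIRM_LABELS, VENDOR_LABELS))      # Motion to Dismiss
print(translate("Patents", VENDOR_LABELS, FIRM_LABELS))  # Patent Practice
```

The design choice is the one described in the conversation: with a shared vocabulary in the middle, each firm or product maintains a single mapping instead of one mapping per partner it needs to talk to.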
Greg Lambert 9:19
I can almost guarantee if a law firm has a taxonomy, it’s a bespoke taxonomy.
Marlene Gebauer 9:24
Well, it depends. Sometimes it’s client by client. But if you can basically sync those using your existing products, that makes it a lot easier.
Damien Riehl 9:34
That’s right. And one vendor, Intapp, is doing a lot of really important work on this. They’re trying to solve the problem I just described: they’re working with hundreds or thousands of law firms that each have their own bespoke taxonomy. So they’ve built a really cool parser to see what a law firm calls an area of law. For example, they went through some of the largest law firms in the country and scraped from their websites all of the areas of law they list in their marketing. One of those “areas of law” is “Appeals to the PTAB,” and the firm calls that an area of law. But the system they built can tell that it is not an area of law. The area of law is patent law, because the PTAB is the Patent Trial and Appeal Board; that’s tag number one. The service being provided is the appellate litigation service; that’s tag number two. And tag number three is that the PTAB is actually a forum. It is not an area of law; the PTAB is a forum. So this Intapp tool, which they’re building internally at this point and at some point will release to the world, lets you input maybe 100 or 1,000 rows of what you call things in your bespoke taxonomy, here “Appeals to the PTAB,” and it will translate them into SALI tags. Because you can imagine, “Appeals to the PTAB” as a single tag isn’t nearly as useful as three tags: show me all the patent law things we’ve done; now show me all the appellate things we’ve done; now show me all the PTAB things we’ve done. Each of those individually is much more valuable than the lumped Frankenstein that is “Appeals to the PTAB.” So all that’s to say, even with the Frankenstein monster that is your law firm taxonomy, the lawyer on the front end can still say, no, I want to see “Appeals to the PTAB” as the thing.
Cool: give them a button that says “Appeals to the PTAB,” but on the back end, tag it with patent law plus appeals plus PTAB. That’s the best of both worlds.
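The “Appeals to the PTAB” example can be illustrated with a toy keyword-rule approach: keep the firm’s display label, but attach separate area-of-law, service, and forum tags behind it. This is a minimal Python sketch, not Intapp’s parser (whose internals aren’t described here); the `RULES` table, facet names, and `decompose` helper are hypothetical, and simple substring matching is a deliberate simplification.

```python
# Hypothetical keyword rules: (keyword, facet, tag). Not Intapp's actual parser.
RULES = [
    ("ptab", "area_of_law", "Patent Law"),
    ("ptab", "forum", "Patent Trial and Appeal Board"),
    ("patent", "area_of_law", "Patent Law"),
    ("appeal", "service", "Appellate Litigation"),
    ("bankruptcy", "service", "Bankruptcy"),
]


def decompose(label):
    """Keep the firm's display label, but attach separate facet tags behind it."""
    text = label.lower()
    facets = {}
    for keyword, facet, tag in RULES:
        if keyword in text:  # naive substring match, e.g. "appeal" also matches "appeals"
            facets.setdefault(facet, set()).add(tag)
    return {"display": label, "facets": facets}


print(decompose("Appeals to the PTAB"))
```

The lawyer-facing button can keep showing `display`, while searches and analytics run against `facets`, which mirrors the “best of both worlds” arrangement described above.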
Greg Lambert 11:27
Yeah, I think one of the things, when you talk SALI, especially to someone who just hears “standards,” is: oh, great, here’s one more layer of work we’re going to have to do. And we’ve learned from iManage and our other document management systems that when you make the person, the attorney or the secretary or assistant, do the tagging, it tends to either go in “general,” or, as we laugh, if it’s tagged to a country, Afghanistan gets a lot because it’s first on the list. We’re going to talk a little more later about some of the AI tools that are hitting the market, and I know you’ve touched on some of this with the way some of the third parties are going to automate the system. Is that how people like us should approach the powers that be at our firms when we talk about standards? Because I think they just see it as: oh, God, we’re going to have to hire a boatload of people to come in and look at everything and tag it, so we’ll just do without it. So is automation, or are there other things, going to be the way these types of standards get implemented at the local level?
Damien Riehl 12:47
Oh, 100%. I would say that if humans are doing most of this, the organization is doing it wrong. A lot of this is automatable, in the ways I described earlier: you could input 1,000 rows and output the SALI tags that go with those 1,000 rows. That’s example number one. Example number two: within Docket Alarm, we have pleading tags, where we look at a docket entry that says “order granting motion for summary judgment.” We recognize “order” as a SALI tag, we recognize “granting” as the disposition, and we recognize “summary judgment” as a pleading within SALI. We use zero humans for that; we just programmatically extract that “order” is a thing, “granting” is a thing, and “summary judgment” is a thing. It’s 100% machines. It’s just a matter of vendors like Docket Alarm, NetDocuments, and Intapp building the tools to make firms’ lives easier, because on the vendor side we can build at scale. For example, at Docket Alarm we built that extraction for “order granting summary judgment,” and if you think about accuracy in terms of precision, our precision is 99.6%. We think humans are about 96, maybe 97% precise. So our machine is more accurate at extracting these things than humans are. All that’s to say: how many ways are there to express the words “summary judgment”? I would argue one: “summary judgment,” right? Do you need some fancy GPT or other large language model to extract the words “summary judgment” from your document, or from anything? The answer is no. Summary judgment is summary judgment; negligence is negligence. So almost all of the SALI tags are verbatim. Sometimes there are things like motions to dismiss, which in California are called demurrers, right?
So within SALI, we have “motion to dismiss” as a tag, and we have “demurrer” as a synonym of “motion to dismiss,” so the system can pull these things in. All that’s to say, if humans are tagging these things up, you’re probably doing it wrong. Rely on a vendor, and encourage your vendors to use SALI and to pull the SALI tags out. And even if you don’t use a vendor for this, there are easy ways for machines to extract things with high precision and high recall.
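Because most tags appear verbatim, plain rule-based matching goes a long way. Below is a minimal Python sketch that splits a docket entry into document type, disposition, and motion type, with a synonym list so that “demurrer” maps to the motion-to-dismiss tag. The tables, tag names, and `tag_docket_entry` function are hypothetical; Docket Alarm’s actual extraction method isn’t described in the conversation, and this is not it.

```python
import re

# Canonical motion types and the phrasings that map to them (abbreviated, hypothetical list).
MOTION_SYNONYMS = {
    "Motion for Summary Judgment": ["motion for summary judgment", "summary judgment"],
    "Motion to Dismiss": ["motion to dismiss", "demurrer"],
}
# In demurrer practice, "sustaining" roughly corresponds to granting, "overruling" to denying.
DISPOSITIONS = {
    "granting": "Granted", "granted": "Granted", "sustaining": "Granted",
    "denying": "Denied", "denied": "Denied", "overruling": "Denied",
}


def tag_docket_entry(text):
    """Rule-based extraction of document type, disposition, and motion type."""
    lowered = text.lower()
    tags = {}
    if re.match(r"\s*order\b", lowered):
        tags["document_type"] = "Order"
    for word, disposition in DISPOSITIONS.items():
        if re.search(rf"\b{word}\b", lowered):
            tags["disposition"] = disposition
            break
    for canonical, phrasings in MOTION_SYNONYMS.items():
        if any(phrase in lowered for phrase in phrasings):
            tags["motion_type"] = canonical
            break
    return tags


print(tag_docket_entry("ORDER granting Motion for Summary Judgment"))
print(tag_docket_entry("Order sustaining demurrer without leave to amend"))
```

A real system would need far larger synonym tables and evaluation against labeled data to make any precision claim; the sketch only shows why verbatim terms like “summary judgment” don’t require a large language model.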
Greg Lambert 15:04
How granular can the SALI tags be? In other words, I assume time entries, I assume documents. What other kinds of day-to-day tasks or information that lawyers deal with can SALI help organize?
Damien Riehl 15:22
A great example of this is UTBMS, also known as the ABA task codes. That is the bane of every litigator’s existence, including mine; I was a litigator for 15 years. When I took a deposition, I put “deposition” in my time entry. Ostensibly, that’s to figure out how much the deposition cost. But as a litigator knows: am I taking the deposition, defending the deposition, or merely observing the deposition? Those are three different cost points. And is it a fact witness, an expert witness, or a 30(b)(6) corporate witness? Each of those is going to move the needle toward more expensive or less expensive. And is it a patent litigation, or is it a slip and fall? That’s going to move the needle too. And is it in New York, or is it in Podunk, USA? All of those things tell you what the ABA task code “deposition” will not. Each of the tags I’ve just described is a SALI tag: are you taking, defending, or merely observing the deposition? Is it a fact witness, an expert witness, or a 30(b)(6) corporate witness? So you can imagine a time entry that says “took deposition of expert so-and-so”: took, deposition, expert. Cool, I’m going to tag up those three things. That is very granular. And now, if I know this is a patent matter filed in the Southern District of New York, all of a sudden I have much more granular data points than I had with an ABA task code that just said “deposition.”
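The deposition example can be sketched as splitting one time-entry narrative into task, role, and witness-type tags, then merging in tags already known for the matter. This is a minimal Python sketch; the regex patterns, tag names, and `tag_deposition_entry` function are hypothetical illustrations, not SALI’s published tag set or any vendor’s rules.

```python
import re

# Hypothetical patterns; an illustrative sketch, not SALI's or any vendor's rules.
ROLE_PATTERNS = [
    (r"\b(took|take|taking)\b", "Taking"),
    (r"\b(defend|defended|defending)\b", "Defending"),
    (r"\b(attend|attended|observe|observed)\b", "Observing"),
]
WITNESS_PATTERNS = [
    (r"30\(b\)\(6\)|corporate (witness|representative)", "30(b)(6) Witness"),
    (r"\bexpert\b", "Expert Witness"),
    (r"\bfact witness\b", "Fact Witness"),
]


def tag_deposition_entry(narrative):
    """Split a deposition time entry into task, role, and witness-type tags."""
    text = narrative.lower()
    if "deposition" not in text:
        return {}
    tags = {"task": "Deposition"}
    for pattern, role in ROLE_PATTERNS:
        if re.search(pattern, text):
            tags["role"] = role
            break
    for pattern, witness in WITNESS_PATTERNS:
        if re.search(pattern, text):
            tags["witness_type"] = witness
            break
    return tags


entry = tag_deposition_entry("Took deposition of plaintiff's expert Dr. Smith")
matter = {"area_of_law": "Patent Law", "forum": "S.D.N.Y."}  # tags already known for the matter
print({**matter, **entry})
```

Merging the entry-level tags with matter-level tags is what turns a bare “deposition” code into the richer data point described above (taking an expert deposition in a patent case in the Southern District of New York).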
Marlene Gebauer 16:47
So I was curious: how are firms, and maybe in-house counsel, using the SALI codes? Pricing, staffing, maybe business development? What are some of the creative ways you’re seeing them being used?
Damien Riehl 17:07
A good example is Jason Barnwell from Microsoft. Jason is smart in many ways, as your listeners know. He’s solving a particular business problem where his business units are regulated by various regulators, say the FTC or the FCC, and we have tags for the FTC, the FCC, etc. So his business units are tracking regulators, and the second column of things they’re tracking is the regulations: the Federal Trade Commission Act, etc. He’s using SALI to tag up all of the regulators and the regulations. And then he’s going to require all of the law firms that work with Microsoft on those regulatory matters to also tag up their matters using the FTC SALI tag, the FCC SALI tag, etc. That way, as they communicate bidirectionally, they can use SALI as a translator between the firm’s system and Microsoft’s system. So that’s one example of how this is helping a client, Microsoft, communicate with its law firms. And then you can imagine that if some of the documents they’re creating are in NetDocuments or iManage, then NetDocuments or iManage is tagging up the FTC and the FCC in the same way. So this is kind of tri-directional: client plus law firm plus vendor, where everyone is using the same tag for the FCC, so you can push and pull data to and from each other.
Marlene Gebauer 18:25
Is it impacting time entry at all? Not necessarily the tags, but how people are actually writing their time?
Damien Riehl 18:33
It’s a good question. When I worked for Thomson Reuters... I can’t talk much about the work I did for them, but you can imagine there are two ways to create time entries. Way number one is that I, as a lawyer, type the time entry. Way number two is that the system writes the time entry for you. It knows what you’re doing; it sees that in iManage you’re working on a motion to dismiss for breach of contract in the Southern District of New York, and that could populate a time entry for you. So really, what is a time entry but SALI tags embodied in narrative text? So one could imagine: are we going to have narrative time entries where a human actually writes things? Or are we going to extract the tags from iManage, or from the work you do? And isn’t the tag the thing you care about, more than the narrative? Certainly. All that’s to say, should I change the way I write my text to help SALI? Maybe. Or maybe the systems you use in the future, maybe the near future, will essentially export the SALI tags of the things you’re working on.
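Damien’s framing, a time entry as “SALI tags embodied in narrative text,” can be illustrated in the reverse direction: render a draft narrative from structured tags. This is a minimal Python sketch with hypothetical field names (`activity`, `document_type`, `claims`, `forum`); it does not describe Thomson Reuters’ or any other vendor’s product, and a lawyer would still review the draft.

```python
def draft_time_entry(tags):
    """Render a draft narrative from structured tags (hypothetical field names).

    A lawyer would still review and edit the draft before submitting it.
    """
    activity = tags.get("activity", "Work on")
    subject = tags.get("document_type", "matter")
    pieces = [f"{activity} {subject}"]
    if tags.get("claims"):
        pieces.append("re " + " and ".join(tags["claims"]))
    if tags.get("forum"):
        pieces.append(f"({tags['forum']})")
    return " ".join(pieces) + "."


print(draft_time_entry({
    "activity": "Draft",
    "document_type": "motion to dismiss",
    "claims": ["breach of contract", "negligence"],
    "forum": "S.D.N.Y.",
}))
# Draft motion to dismiss re breach of contract and negligence (S.D.N.Y.).
```

In this arrangement the tags are the durable data and the narrative is just one rendering of them, which is why the tags, rather than the prose, are what downstream analytics would use.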
Marlene Gebauer 19:35
Well, I know many people would be super happy if they didn’t have to actually do their time entry and it were just done for them.
Damien Riehl 19:42
It’s true, totally true. As long as the billable hour reigns supreme, I guess happiness will elude us.
Marlene Gebauer 19:51
So, Damien, we wanted to shift to the AI portion of the conversation now, if you don’t mind. We had Tony Thai and Ashley Carlisle from HyperDraft on a couple of weeks ago. That was right when this broke, so we didn’t have quite as many examples yet of things like, oh, it only has information through 2021 and then it cuts off, or, oh, we’re finding all of these mistakes. So they were also cautioning about some of this generative AI. I’m wondering what your take is on some of these shiny new tools that are entering the day-to-day discussion.
Damien Riehl 20:40
I think GPT in particular, and large language models in general, like Bard on the Google side, which uses LaMDA... I’m going to use “large language models,” or LLMs, as shorthand for GPT-like things. I would say large language models are really transformative, in a way that makes me, as a skeptic, say this is going to transform the industry. I hate when people say that, but now I find myself saying it on this particular podcast, because I distinctly remember, in late November or early December, sitting at my kitchen table, and my wife will attest that I said lots of expletives and said, this is going to change everything. I’ve since tempered my enthusiasm: it’s not going to change everything, but it will change a lot of things. As for some of the ways I think it will help: yes, large language models are generative AI; you say, I want to create text, write me a poem, write me a short story. The problem is they also hallucinate, as has probably been demonstrated a lot: sometimes they speak the truth, sometimes they don’t. GPT is sometimes right, sometimes wrong, but always certain. So that’s the generative AI side of it. But there’s also generative extractive AI, which I call Gen X. For example: here’s a paragraph of text; summarize that paragraph. That’s generative extractive. Or: here are a bunch of arguments, here are a bunch of documents; create counterarguments to those arguments.
That’s extracting the arguments and generating counterarguments. Create a decision tree as to whether I should do this thing or that thing: that generates a decision tree. Classify: given this text, whether it’s three paragraphs or 100 pages, classify all the causes of action in it, maybe with SALI tags, and output that classification. Create bullet points summarizing the most important aspects of these four paragraphs I’ll give you right now. None of these is generative AI in the sense of writing a poem, or “give me a brief.” They’re saying: given these three paragraphs, or these five pages, extract the stuff that matters and then generate useful things out of it. So I think the most transformative aspect is not the generative side, but what I’m now calling the Gen X side: the generative extractive side. “Extractive generative” is probably more accurate, but it’s not nearly as fun to say. I would say generative extractive is the most promising because, even today, it’s really, really good at that. For example, that’s what we’re doing right now at Docket Alarm: just yesterday we released a feature where, if I as a user hover over a document, and maybe I want to click on it, maybe I don’t, I can now go to my GPT summary. I click that button, it runs the document through GPT, and it gives you three bullet points on what that document is about. So now I don’t have to read this 50-page document, or even this 10-page document, if I don’t want to; I can read through the bullet points to see whether I actually want to read it. This is an example of the generative extractive concept: it’s extracting from the document and giving a really good summary, in a useful way. No one is going to accuse that of robot lawyering.
And it’s not nearly as sexy as saying I’m going to represent you in court, but it’s really useful, and it’s really accurate. So what I think is transformative is the more even-tempered version: given this text, give me the stuff that matters out of it.
Greg Lambert 24:19
Now, I know you did a test, and I think you posted it on Twitter and wrote a quick article about it this week, where you gave it certain issues, then fed it more information and said, all right, now that we have this, come back with X. It was a process that you did. I’m curious: did you have any naysayers attack you on Twitter when you posted it?
Damien Riehl 24:50
I would say no, largely because, I’d like to think, my logic is unassailable. That’s probably not true. But if we think about it, humans are really good summarizers; we have really good AI between our ears, right? The example I gave on LinkedIn essentially uses the AI of humans. I, as a human, want to summarize this 50-page document, so I summarize it in a human-readable way; that’s called the table of contents. The use case I showed on LinkedIn is that I gave GPT the table of contents, appropriately, from the OpenAI lawsuit that’s going on right now. OpenAI is being sued by a bunch of coders over GitHub Copilot, which ingested the entirety of GitHub and figured out how code works. Now I, as a coder using GitHub, can essentially output code by saying, hey, write me code that scrapes content from this website, and it’ll generate that code programmatically. It can do that because it has already ingested the entirety of GitHub to figure out how code works. So GitHub is being sued right now: the coders said, hey, we released this under a license, you breached that license, so you’re not supposed to be able to ingest GitHub. That’s what the lawsuit is. So I used the table of contents from OpenAI’s motion to dismiss in that lawsuit: I fed in the arguments from the table of contents and said, okay, ChatGPT, take these arguments and create counterarguments to them. And it created counterarguments, and it gave me really good ones. So the irony is that I was using OpenAI’s GPT to argue against OpenAI’s position in litigation, essentially using the tech they’re being sued over.
So there’s a fun irony there. On top of that, what’s really cool is where the arguments came from. The brief writer wrote 50 pages of content and then said, 50 pages is hard to grok; I’m going to summarize that in the table of contents, right? So essentially we’re using the human AI, which condenses 50 pages into a page, and then we’re taking a robot and saying, okay, take this human-condensed table of contents and write counterarguments based on that human condensation. The counterarguments turned out to be really good. Then I took it a step further and said, okay, for each of those counterarguments, provide facts that could be used to demonstrate it. And OpenAI gave really good facts that I, as a lawyer, could use, if this gets past the motion to dismiss, in my depositions, in my document requests, in any type of legal proceeding, essentially to brainstorm. This is not robot lawyering, but it is useful. How long would it take a first-year associate to write counterarguments to a motion to dismiss? How long would it take a first-year associate to figure out facts to point to in support of those counterarguments? A while, right? But it took me less than a minute to do this.
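The two-step process Damien describes (table of contents, then counterarguments, then factual scenarios for each counterargument) is a simple prompt chain, where each step’s output becomes the next step’s input. He ran it interactively in ChatGPT; the Python sketch below is a hypothetical automation of that workflow, not his exact prompts. `complete` stands for any function that sends a prompt to a language model and returns text; no particular vendor SDK is assumed, and a stub model is used so the chain can run offline.

```python
from typing import Callable


def counterargument_chain(table_of_contents: str, complete: Callable[[str], str]) -> dict:
    """Two-step prompt chain: TOC -> counterarguments -> hypothetical factual scenarios.

    `complete` is any function that sends a prompt to a language model and returns
    its text; no particular vendor SDK is assumed here.
    """
    counter_prompt = (
        "Below is the table of contents of a motion to dismiss.\n"
        f"{table_of_contents}\n\n"
        "For each argument, write a counterargument."
    )
    counterarguments = complete(counter_prompt)
    # Second step feeds the first step's output back in, asking for scenarios, not facts.
    scenario_prompt = (
        "For each counterargument below, list hypothetical factual scenarios that, "
        "if true, would support it. Do not assert that any of them actually occurred.\n"
        f"{counterarguments}"
    )
    scenarios = complete(scenario_prompt)
    return {"counterarguments": counterarguments, "scenarios": scenarios}


# A stand-in "model" so the chain can be exercised offline.
prompts_seen = []


def fake_model(prompt: str) -> str:
    prompts_seen.append(prompt)
    return f"response {len(prompts_seen)}"


result = counterargument_chain("I. Plaintiffs lack standing", fake_model)
print(result)
```

Asking for hypothetical scenarios rather than facts matches the point made next in the conversation: the output is material to check with the client, not a statement of what happened, and every step still needs a lawyer’s review.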
Greg Lambert 28:06
When you say facts, you’re talking about factual scenarios, not that it’s going out and giving you the facts of this particular situation, right? Because I think that’s exactly where it tends to fall down: when you ask it to give you specifics.
Damien Riehl 28:23
That’s right. “Give me factual scenarios” is what I had asked for. So not things that have happened, but things that could happen. And of course those are things I’d ask my client: hey, has this thing happened? Number two, has that thing happened? Number three, has this thing happened? So these are ways I can explore with my clients, brainstorming scenarios with them. And all of those were really good on the output side.
Marlene Gebauer 28:45
So my eldest son and I had a discussion about this the other day. Right after this all broke, maybe a week later, a student basically came out and said, okay, if you write a paper using ChatGPT, this thing I built can tell whether you did. Of course my son was very upset about this: why is he ruining it for everybody? He’s in college himself! I don’t understand. And I said, well, that’s one way of looking at it. But you have to learn how to write, you have to learn how to craft an argument, you have to understand how concepts fit together and how to move from idea to idea before you can use these shortcuts. And I want to go back to what you were saying, that it would take a lot longer for an associate to do this work. What are we doing by essentially cutting out that opportunity for them to learn that way?
Damien Riehl 29:48
That is really the question I’ve been talking about with my wife, who is a professor of English. She’s been teaching composition for 25 years now, so she input a bunch of her assignment prompts, like: write a comparison of The Bluest Eye and The Color Purple; take two characters and their experiences, and talk about their experiences with race and how they are similar and dissimilar. We wrote that prompt into ChatGPT, and the output was A-minus work. And that’s with zero work, right? So she kind of had this existential question: What am I doing? What have I been doing, if a machine can output pretty good stuff? And I think about it a lot like this: we teach people how to multiply and subtract and add, right? Those are things calculators can do. Knowing what is behind the calculator is important, because you want to know the concepts. But mostly, by doing that, you’re teaching them how to think about ideas. Once you get to a certain point, you give people calculators and say, okay, now that we have that foundation, let’s go faster, stronger, better. Applied to English, to ideas, and to lawyering, we now have this tool that gives us a really good head start. So our jobs as humans now: maybe we can use it as a way to leapfrog, much like a calculator lets us leapfrog, and use our critical thinking skills. Number one, is what I’m seeing accurate? That is, I’m not going to accept it at face value when it comes in; I’m going to use critical thinking skills to assess whether it’s accurate or not. Number two, what is it missing? What are the things that are not showing up here? And number three, how can I add to those things? So I use it like a calculator: alright, I’m going to use this as a first step, so that now I can go faster, better, stronger.
So an example from the LinkedIn post that Greg mentioned: for the factual claim that OpenAI’s actions were the direct cause of plaintiffs’ injuries, I said, now provide factual examples of how a Large Language Model’s training on text will cause an author of that training text to lose money. That was my prompt, and it gave four examples. I’ll only go through one of those with you. The example it output said that OpenAI used an author’s copyrighted work as training data for its large language model without obtaining permission from the author, and as a result, the author lost potential revenue from licensing their work to other companies for similar uses. That’s a really, really good answer, and it essentially hurts OpenAI’s case, right? So now I, as a brief writer, could say, yeah, what OpenAI said, right? I can essentially mimic that. Or maybe, since it gave four examples, I can use that as a jumping-off point: oh, now that made me think of example five and example six, right? It’s getting me to maybe the same place I would have ended up, maybe farther, but we won’t tell anybody. But really, the idea is that this is a way to spur thinking and to create new ideas, where writer’s block or a blank page might have kept us from making those new ideas without it.
Greg Lambert 32:51
Damien, that really gives me two things I want to think about on this. One is, you know, never underestimate the Bar Association coming in and shutting this down. And two, you mentioned earlier issues of copyright and things like that. So let me jump into the copyright part of it, and I know that this is an area of expertise for you as well. You yourself are known for challenging the rules for copyright. In fact, you have a TED talk where you talk about generating, you know, some billions of musical melodies, writing them to a hard drive, and thus copyrighting them by doing so. What are agencies like the US Copyright Office going to do? What kind of rules are they going to establish? Are they going to be able to look into the black box of all of these Large Language Models and determine whether you’re taking copyrighted material and using it without getting a license? What are some of the issues that we’re going to see, and I think we’re going to see them relatively quickly?
Damien Riehl 34:14
So, two aspects, actually a bunch of aspects, from what you just asked; I’ll take them one in turn. There’s the generative aspect of it, that is, is machine-created output copyrightable? This has come up in the Thaler case, where Thaler had AI-generated art and tried to register it with the Copyright Office, saying, I’m not the author, the robot is the author, therefore grant me registration. The Copyright Office says no, machines cannot receive copyright, because IP is a constitutional concept: to promote the progress of science and useful arts, you shall have a monopoly for limited times, essentially is what it says. So that is an incentive for humans to make new things. Machines don’t need incentives. For example, with my All the Music project, I hit a button, and it spit out melodies at a rate of 300,000 melodies per second. We’re now up to about 417 billion melodies written to disk. Under the Berne Convention, once written to disk, they are copyrighted automatically. So the question is, now that I have 417 billion melodies written to disk, which ostensibly includes every melody that’s ever been and every melody that ever can be (mathematically, I’ve exhausted the entire dataset), is that now copyrightable, so that I could keep these from being used nefariously? What I did then was put everything in the public domain under Creative Commons Zero. And the idea is to be able to say, argument number one, maybe each of these melodies that the computer cranked out at 300,000 melodies per second is not copyrightable; maybe it’s unoriginal, therefore uncopyrightable. That’s the primary argument. In the same way, maybe the output of GPT is similarly uncopyrightable, because machines do not need incentives to create things. And really, if you think about what copyright is, copyright is the right to exclude somebody else from doing a thing.
So really, if I were to copyright my 417 billion melodies, I could say, now nobody else can use those. I get the life of the author plus 70 years, or, in the case of an institutional author, 120 years. So now those are locked up for 120 years, or GPT’s creation is locked up for 120 years. That’s just crazy talk, right? So anyway, all that is to say, on the output side, is the output of LLMs, the output of a machine, copyrightable? I think the only sane way to come out is no, because otherwise humans won’t be able to build anything, because the robots will cover the entire waterfront. So that’s the output side. Now on the input side, if GPT ingests the entirety of the web, or the entirety of Wikipedia, which it has, each of those things is copyrightable. If you make a website, that is copyrighted, right? Books from Google Scholar are copyrighted. So the argument that the coders were making in the OpenAI case is, hey, I made code, my code is copyrightable, you ingested that code, therefore you’re violating my copyright. But that’s actually not what they’re arguing in the OpenAI case. They’re not arguing copyright infringement, because there’s a bad case for them. And that bad case is Authors Guild v. Google in the Second Circuit, concerning Google Books, where Google essentially scanned every book in existence. The Authors Guild sued them, saying, hey, these are copyrighted, you can’t do it. And the court said, yeah, it was technically a copy, but it’s fair use, because the use of that book was transformative: what Google was doing was creating an index of every book ever written, and then letting a user run a query and get a snippet of that book back. And the Google Books court said, yes, even though you ingested every book, the snippet you provided is a transformative use, therefore fair use, therefore not infringing.
So in Google Books, they sometimes provide three or four pages of full text of the book. If that is fair use, then how about GitHub, where they’re not reproducing three or four pages? If you know how Large Language Models work, they’re extracting the idea of a website, they’re extracting the idea of scraping, they’re extracting the code for each of those ideas, and then they’re combining it in vector space and outputting a brand-new thing that hasn’t been done before. It’s merely taking the information in the world and transforming it into a new generative thing. So if Google Books is transformative, this other thing is certainly transformative. That’s why they’re not hanging their hat on copyright infringement. Instead, what they’re saying is, oh, we have a license, the MIT license or whatever license they use on GitHub, and you breached the license by doing that. So essentially, they want copyright-like protection, but they’re using a license to try to enforce that copyright-like protection, because Google Books is a bad case for them. But then, of course, the response to that is: okay, say I have a book with a seal on it, saying that by breaking the seal you contractually agree that you will not, like Google Books, scan my thing. Of course, you can’t have a license that keeps you from doing something you’re otherwise legally able to do. That’s just ridiculous. So essentially, that’s why I think the OpenAI case is probably going to lose, or at least should lose if the court is doing it right. Because all the machine is doing is what humans do all the time. I read a bunch of books, and I assimilate the ideas from those books into a new expression of those ideas in my works. That’s all the machine is doing in Large Language Models.
They’re ingesting copyrighted things, figuring out the ideas and how language works, and then creating new things just like humans would.
Marlene Gebauer 39:52
You know, we’re talking a lot about lawsuits and a lot of discussion as to, you know, what is right and what is wrong. So, of course, we’re going to be talking about regulation, because, you know, that’s not far down the road. Any ideas on how regulatory agencies are going to try to limit or expand or control these tools?
Damien Riehl 40:17
So I would say a tool is a tool. A hammer is a hammer: I can use it to build houses, and I can also use it to hit somebody over the head, right? One use is useful, and the other is criminal. So really, GPT and Large Language Models like BERT are tools, and how you use them is the real question. Is a hammer illegal? Well, it depends on how you use it, right? Is GPT the unauthorized practice of law? It depends on how you use it, right? So I would say, if you’re a regulatory agency, think about another profession that was threatened by technology. Think back to the early 1980s and what CPAs did for a living. What CPAs did was keep ledgers and books, add numbers together, and figure out, you know, what a profit and loss statement would be. All of a sudden, a thing came along called the spreadsheet, and all of these accountants are thinking, gosh, where are our jobs going to go, right? All we do all day is add numbers together, and all of a sudden, in a millisecond, this thing does it. Imagine if the regulatory agency for accountants had said, hey, we need to kill this thing, because otherwise we won’t have any more jobs, right?
Marlene Gebauer 41:21
Hey, let’s make something better than Excel. If they did that.
Damien Riehl 41:26
Maybe Lotus 1-2-3 becomes fun. That’s right, Lotus 1-2-3. Exactly. I’m old enough to remember that too. But really, the opposite happened, because accountants realized that once clients figured out it wasn’t going to take a week to get their numbers back, but a day, they’d say, hey, accountant, how about you run these numbers with this scenario? Cool. How about this scenario? How about that scenario? How can I do taxes on this? And accountants realized it’s actually more work. We have way more accountants today than we ever have, largely because of the spreadsheet. So if you’re a regulatory agency: is ChatGPT going to eat jobs? Or is it going to make us more efficient and better, creating more work to fill the access to justice gap that everyone’s talking about? No one can afford lawyers because we’re too slow and too expensive. Can GPT make us faster, better, stronger, to provide self-help, or to provide, you know, low-cost limited services, with smart lawyers using GPT as an exoskeleton, like Iron Man? Like Iron Man? Exactly. It’s an exoskeleton to make you faster, better, stronger. So really, I would say to any regulatory agency thinking about banning such things: think of it like a hammer, or like an Excel spreadsheet. It’s a tool, and it might actually give our society more and better things than we have.
Greg Lambert 42:44
Alright, I’m gonna follow up on that and be a Debbie Downer, I guess.
Marlene Gebauer 42:50
Don’t be a Debbie Downer, because he was just talking about how, well, it’s the way you look at it.
Greg Lambert 42:53
Well, yes, we have more accountants than ever. But there was the same expectation that a tool like that would help those at the bottom who need the help, and I don’t think that happened in the accounting industry. I think what happened was that it gave them more opportunity to take much more of the pie from their existing clients and potential, you know, larger clients. And that’s not necessarily a question, but kind of a statement. I would say that, yeah, in a perfect world, a tool like this would address access to justice issues. In a perfect world, allowing non-lawyer-owned law firms in Arizona would mean that people would come in to serve the people who need help getting access to justice. And what it’s actually done is bring high-end work into the state instead of the middle and low end of the legal pay scale. So I kind of worry: if we’re looking at this as an access to justice issue, I would say history is not in our favor of this being a way to fill that gap. Maybe I’m wrong, hopefully.
Damien Riehl 44:26
As an analogy, though, let’s think through our accounting example. Of course, accountants often do taxes. So think about the 1990s, when a thing called TurboTax came around. I’ve not used an accountant in my entire professional career; I’ve used TurboTax. I spend 150 bucks and I’m out the door. That is access to justice for accounting: that is self-help, where I don’t have to hire an accountant anymore. So the tools are able to serve an unserved market, I guess. You can imagine a similar thing happening on the legal side. Maybe we have self-help tools that are better, stronger, faster, much like TurboTax is better, stronger, faster than me just looking at the government website and trying to figure out how to fill out my taxes. That’s really hard, much like the court system is really hard today. What if a GPT-like system were to go through trusted sources? Like, by the way, at Fastcase and Docket Alarm, we have 700 million judicial opinions and lawyer-filed documents. What if OpenAI had access to those 700 million legal documents to figure out even better how the law works? And then what if you set that free on access to justice initiatives, so that I, as a pro se litigant clogging up the courts right now, could draft a pretty good brief based on real case law, based on real arguments that are statistically most likely to win for this judge, for this claim, or for this criminal matter? This starts to move it away from things that are going to eat my job as a lawyer, and instead maybe these are ways to serve the 80% who haven’t been served, in a way that will not clog up the judicial system, because the judges will actually have good arguments, rather than the pro se filings we have today.
Greg Lambert 46:06
Damien, I know we could probably continue this conversation, and we may have to have you come back in a little while and just kind of follow up on where we are, because it seems like we’ve gone leaps and bounds in just the little bit of 2023 so far. But I’m going to have you look into the future. So pull out your crystal ball. Over the next two to five years, where do you see tools like Large Language Models, whether it’s Google BERT, or OpenAI resources, or other products that I’m sure are right around the corner? How do you see them affecting legal?
Damien Riehl 46:52
I would say that with generative models, there are three ways to think about what they’re doing, and those three ways are largely analogous to what has been in the past. Way number one is where I, as a lawyer, say a bunch of stuff, but I don’t provide a citation for that stuff. We’ll call that a bullshitter, right? That’s number one. That’s OpenAI today: it says a bunch of stuff, and it’s not giving citations for the stuff. So maybe you believe a bullshitter, maybe you don’t, but you look at them warily, right? So that’s option number one. Option number two we’ll call a searcher. This is like when a partner at a law firm says, you know, I know there’s law for this proposition, it’s out there somewhere, go ahead and find the law for this proposition. So you essentially write the brief, write the proposition, and then search really hard to find some support for it, a citation that might be able to support it. We’ll call that a searcher, right? You could imagine Large Language Models doing that. A Large Language Model like GPT spits out a legal brief, but no citations. Now I have to be a searcher: for each one of those sentences, try to find a statute that supports it, or try to find a case that supports it. That’s option number two. Option number three is a little more interesting; we’ll call it a researcher. That’s where I, as a researcher, find the five or seven cases that are really on point and the statutes that are really on point. Then I extract from those cases the propositions that matter and the things that matter, and I craft my brief around those cases and those statutes. That’s not something that GPT can do, right? I mean, maybe it kind of can, but this is something that humans have done. We’ll call that a researcher.
So maybe that’s the right way to go about it. Maybe we’ll go with option two: have it spew out things and try to call it on its BS, either proving or disproving the thing. Or option three: Fastcase acquired a company called Judicata. Judicata assigned a unique identifier to every proposition (this thing is true), and also provided a unique identifier for every citation, that is, every case and every statute. And so they created a product to say, hey, when you said this thing is true, why did you cite this case, where your side lost? Why didn’t you cite the 12 other cases that said this thing is true, where your side won? So once you slap a unique identifier on the thing, you’re getting close to the researcher. So maybe an AI-like result is to be able to say, I’m going to run a query for breach of contract in the Southern District of New York, and then all the propositions that are statistically most common get bubbled up. So now you have your most common propositions. And then you, as a researcher, say, I like argument one, not argument two, but I like three, four, and five, not six, but I’ll pick seven. So I have a pick list of all my propositions. And then for each of those, I have a pick list of each citation, that is, each statute that has that proposition and each case that has that proposition. That has provenance: you know where it came from, because you actually have hard cases. So it’s between option two, the searcher, where you’re trying to deal with whether it’s bullshit or not bullshit, and option three, where you have ground truth from the get-go. To answer your crystal ball question, I think number three is probably going to win, because it’s the way we do research today, and you don’t have to deal with whether it’s BS or not. Yes.
Greg Lambert 50:14
Well, well said.
Marlene Gebauer 50:17
I liked that answer.
Greg Lambert 50:18
Well, Damien Riehl, from Fastcase and SALI, like I said, we could continue this forever, so we’ll just have to bring you back on. But thank you very much for taking the time to talk with us.
Marlene Gebauer 50:32
Yes, thank you.
Damien Riehl 50:33
I’m so thrilled to be here. I listen to every episode, and I was so happy you asked me. Thank you.
Marlene Gebauer 50:37
And of course, thanks to all of you, our audience for taking the time to listen to The Geek in Review podcast. If you enjoy the show, share it with a colleague. We’d love to hear from you. So reach out to us on social media. I can be found at @gebauerm on Twitter,
Greg Lambert 50:50
And I can be reached @glambert on Twitter. Damien, what about your Twitter? And
Marlene Gebauer 50:54
where can you be reached on Twitter?
Damien Riehl 50:57
Damien Riehl, first name last name, D-A-M-I-E-N R-I-E-H-L. Or you can just search Damien copyright and I come right up.
Marlene Gebauer 51:03
Or if you guys are old school, you can always leave us a voicemail on The Geek in Review Hotline at 713-487-7821. And as always, the music you hear is from Jerry David DeCicca. Thank you so much, Jerry.
Greg Lambert 51:16
All right. Thanks, Marlene. And, Damien. Thank you very much.
Marlene Gebauer 51:19
Thank you all. Thank you. Bye