The Geek in Review Ep. 114: Pablo Arredondo on CaseText’s New WeSearch Tool and How the Neural Net Is Making Its Way Into Legal Information

By Greg Lambert & Marlene Gebauer on April 22, 2021

We hope that you like your Geek in Review with a little extra geekiness this week, because we dive in with CaseText’s Chief Product Officer Pablo Arredondo on their innovative search tool, WeSearch. This unique method of indexing texts into what Arredondo calls a “sublimely complex, 768-dimensional vector space” creates a truly beautiful and useful way of searching not just the words in the documents, but the concepts and meanings of those documents. Unlike the Artificial Intelligence tools many of us in the legal industry currently use, there’s no need to spend weeks or months manually tagging documents to train the system. The neural net techniques developed by the likes of Jacob Devlin, Google researcher and BERT author, allow the system to train itself, and the folks at CaseText have turned it loose to learn American case law.

While this new method of research opens many potential uses (and we brainstorm a few in the interview), it also raises some issues that aren’t unique to the legal industry but are common in it: acceptance of cloud-based utilities, what can and cannot be accessed by the neural net tool, and, perhaps the biggest issue we discuss, the black box. Traditionally, when vendors provide search tools with AI and Natural Language Processing (NLP), the “Black Box” is an Intellectual Property issue. While the methodology of how the system works is known by the vendor, just like the formula for Coke, it isn’t something they are willing to share. With this tool, the neural net and vectors work in ways that can be explained on a basic level, but after the system is trained, it begins functioning in a way that can’t be explained, even by the vendor. This will be an issue that law librarians and academics may need to dive into in the not-so-distant future.

The WeSearch tool is available to test out. Let us know what you think.

Listen on mobile platforms: Apple Podcasts | Overcast | Spotify

Information Inspirations

We’d all like to know what “The Future of the Law Firm Office” is going to be after we begin entering a post-pandemic workplace. Texas Lawbook’s Brooks Igo is hosting an upcoming webinar on May 11th which tackles that very topic.

Jae Um gives us ten questions we need to ask ourselves on how resilient our law firms are as we come out of COVID. Resiliency was a key factor in 2008, and it will be in 2021 as well.

Law firms might be different than corporations, but our clients have a Customer Experience (CX) with us whether we think about it or not. In a new podcast launched by Accenture called Built for Change, the inaugural episode discusses the importance of CX and how some companies have successfully pivoted how their customers interact with them and made that experience better.

Law firms have an issue with the “NONs”… only this time it is non-equity partners.

Listen, Subscribe, Comment

Please take the time to rate and review us on Apple Podcasts. Contact us anytime by tweeting us at @gebauerm or @glambert. Or, you can call The Geek in Review hotline at 713-487-7270 and leave us a message. You can email us at geekinreviewpodcast@gmail.com. As always, the great music you hear on the podcast is from Jerry David DeCicca.

Transcript

Marlene Gebauer 0:22

Welcome to The Geek in Review, the podcast focused on innovative and creative ideas in the legal industry. I’m Marlene Gebauer.

Greg Lambert 0:30

And I’m Greg Lambert. Well, Marlene, it’s been a busy week for us already. So as you remember, on Monday, we got to talk with Professor Rafael Figueiredo’s Legal Innovation course at the University of Houston’s Law Center. And we got to talk on the topic of knowledge management. And

Marlene Gebauer 0:50

Yes, we did.

Greg Lambert 0:51

Yeah. Although I think we talked a lot more than just KM. And, you know, God bless the students, we also went a little bit over our time. So

Marlene Gebauer 1:03

They had lots of good questions. They really were engaged. I was thrilled.

Greg Lambert 1:08

And I was happy that they didn’t just shut off their cameras and leave when the class was over. So thank you.

Marlene Gebauer 1:15

We had enough! We’re gone. Yes, thank you all. It was a great experience. I really enjoyed it. Well, this week, we go back to a guest we had on our second episode, Pablo Arredondo from CaseText. We talked a bit about CaseText’s very interesting and creative neural net searching tool, WeSearch, a few weeks ago on one of our information inspirations. Well, it was inspiring enough for us to ask Pablo to talk more about it with us. And we’re going to put a warning label on this episode as “extra geeky.”

Greg Lambert 1:50

Yeah, this type of searching is really cool. And it was fascinating talking with Pablo about how it works. And the interesting thing is explaining what we know about the tool and what we don’t know about the tool.

Marlene Gebauer 2:03

Yeah, I know

Greg Lambert 2:05

So stick around for that. But now let’s get to this week’s information inspirations.

Greg Lambert 2:14

Alright, Marlene, as of today, I am officially two weeks out from my second Moderna shot, and I am now officially fully vaccinated. Yay.

Marlene Gebauer 2:25

Very good. Congratulations.

Greg Lambert 2:27

Thank you. Well, you know, that’s good. But the problem is that so many of us, especially in the law firm environment, are getting our vaccines that now there’s this big push to start reopening the offices and to get back to the workplace. So how do we do that? And how do we do it safely? And, you know, quite frankly, how’s that even look anymore?

Marlene Gebauer 2:52

Well, I mean, I can tell you, I’m working on this now. And it looks different from place to place.

Greg Lambert 2:58

Absolutely. Well, Brooks Igo at the Texas Lawbook has an upcoming webinar on May 11, featuring an expert from Gensler, who will take up the topic of The Future of the Law Firm Office. This free webinar discusses professional services and law firms and the reexamining of the way in which we work. They’re supposed to discuss the importance of law firm culture, the physical space, and the transformation to a more highly mobile workforce in this post-pandemic world. It looks absolutely great and completely well-timed.

Marlene Gebauer 3:37

Yes, I agree. That seems like a very timely webinar, and I think will be quite useful for many people who are struggling to figure this out right now.

Marlene Gebauer 3:46

Well, Greg, the AmLaw100 is out and everyone is in a frenzy as usual.

Greg Lambert 3:52

It must be April.

Marlene Gebauer 3:53

Yeah. Revenue rose by 6.6%, revenue per lawyer grew 4.8%, and profits per partner grew 13.4% thanks to pandemic-related cuts. But, you know, what’s the roadmap for continued success? Well, it seems to be resiliency. Our friend Jae Um offers an in-depth analysis of key factors that identify what a resilient firm is and how it behaves. The resilient firms bounced back much more quickly than the pack after the Great Recession and kept on that trajectory. Jae notes that resiliency is not one size fits all, but resilient firms deploy and reliably execute a coherent, unified strategy of differentiation that considers their current market position and talent composition. And she provides 10 questions for firms that want to achieve resilience to ask themselves over the next decade. And here they are. Key Clients: Who are our most important clients? Do those clients view our relationship in the same light? Demand Generation: What are the mandates and assignments we want to win in the next 12 months? How are we investing in order to ensure we are considered for the type of work we want to do? Commercial Awareness: Are we currently dependent on revenues from clients, practices, or service lines that are not sustainable? How prepared are we for the next unexpected crisis? Pricing Excellence: Do our current pricing levels accurately reflect our value proposition? What do we need to do to strengthen our pricing position? Practice and Talent: Which practice areas will be the most important to our success over the next five to ten years? Do we have the right talent in place to defend or strengthen our competitive position in our priority markets and practices? Now, Jae says in current market conditions, the vast majority of firms will need to invest time and effort to establish fact-based, data-informed viewpoints for at least three out of these five themes. Subsequently, the firms making head-start investments in data analytics and competitive intelligence will head into this recovery with a clear edge.

Greg Lambert 6:08

Well, my next inspiration is a new podcast I stumbled upon this week from Accenture, called Built for Change. So this is a brand new podcast. And the first episode focuses on something that I think law firms could use, and that is a better customer experience, or CX. So while the episode focused on companies that were producing a product, it really wasn’t about the product itself, but rather how those companies understood their clients’ needs and adjusted their business processes to better fit those needs. So one example was a company that quickly shifted from providing contact lenses through the mail to actually providing women’s birth control via the mail, because of the difficulties that so many of their existing customers were having during the pandemic. So they were a perfect company to make the switch over to this new model because they understood their clients and worked to fill a void in the market. And while that may sound pretty far off from what we provide at law firms, I think creative minds can see the link and the parallels there. And it’s a very engaging podcast. And quite frankly, if no one else enjoys it, I think you will enjoy it, Marlene.

Marlene Gebauer 7:27

Well, I will check it out. But you know, I like what you’re saying that they were really nimble about sort of just moving fast, sort of recognizing the opportunity and being able to sort of seize that quickly.

Greg Lambert 7:39

Yeah, never let a crisis go to waste.

Marlene Gebauer 7:41

That’s right. So is it good or bad that the non-equity partner tier continued to grow in 2020? It grew 6%. Surprising, right?

Greg Lambert 7:51

Yeah. Maybe.

Marlene Gebauer 7:52

I mean, since firms were cleaning house and pushing out unproductive attorneys in practice groups. Historically, having a large non-equity base was not a good thing. The perception was that it would promote mediocrity, because, you know, money is, of course, the only motivator. Right, Greg? You know, grow or go?

Greg Lambert 8:11

Yep.

Marlene Gebauer 8:12

I mean, if you didn’t make equity partner, buh-bye! Billable hours have been on a steady decline for this group. And you know, if these types of groups aren’t managed properly, that can become a serious liability for a firm. But you know, the role has changed over time. And what motivates practitioners has also changed. In-house teams often handle much of the work associates do now and are looking to outside counsel for more specialized work, which non-equity partners can handle. There’s an appetite for non-partner-track roles with less pressure to bill and generate business. And cutting the tier is also risky because firms can lose valuable expertise this way. So McKinsey is an example. They’re looking at this closely. So if attorneys don’t want to go for equity partner, they can continue to grow their expertise and disseminate it through the organization. And I think in order to succeed in these sorts of models, yeah, we really have to remove the stigma of being non-equity. Now, some firms like Sandberg Phoenix & von Gontard are trying to do that. Apparently, they have purposefully gotten away from the idea that you become something after a number of years. But I think this is going to be very, very hard to instill across the industry. And that wraps up this week’s information inspirations.

Marlene Gebauer 9:38

For many of us who consider ourselves to be legal information professionals, we have a method of searching we think helps us get to the right information quickly and efficiently. But we’re always on the lookout for better methods. It doesn’t mean our go-to methods aren’t still valuable, but it gives us another tool in the search methodology toolbox. Our friend Pablo Arredondo from CaseText comes in to talk about what will most likely be that new tool. And that is the power of neural networks and vectors on legal research.

Marlene Gebauer 10:10

We welcome back Pablo Arredondo, co-founder and Chief Product Officer of CaseText. Pablo was actually our second guest on the podcast a couple of years ago. A lot has been happening since then, including our recording platform getting so much better. So we are delighted to have you back. Pablo, do you remember all those difficulties we had?

Pablo Arredondo 10:31

I don’t remember that. I just remember the good stuff.

Marlene Gebauer 10:33

He’s blocked it out. Well, while some people may not know, we have a lot of AI in our lives, but they’re all different types. So can you give us a brief rundown of the main ones that we see in practice support platforms for legal?

Pablo Arredondo 10:49

Absolutely, yeah. I think a lot of the excitement and energy around AI over the last, you know, 10 years has been on things that aren’t really directly related to the practice of law but are nonetheless quite exciting. So self-driving cars would be one example, facial recognition, certainly the ability to best the world champion at Go, which is a game that was so complex they didn’t think a computer would ever do it. And you know, all of this was well and good. But I always would feel a little bit sad, because these amazing gains that were happening, and these amazing applications that were happening, really weren’t showing up in law nearly as much. And the reason for that is that there’s a subset of AI called natural language processing, which, for various reasons, it was a lot harder to get traction with. And you know, the main reason being that language is an extremely complex system, and that actually getting enough training data can be more difficult with language, especially before some of the more recent breakthroughs obviated the need for it. And so, really, when it comes to lawyers, I think natural language processing is the area to focus on. And then, more recently, the advent of neural nets, or the re-advent of neural nets, I might say, because this is an idea and an approach that has had a roller coaster ride going back decades, but which has really come to the fore in recent years. The use and application of neural nets to natural language processing is really sort of the epicenter, as far as I can tell, of AI’s application to law, and certainly one of the epicenters of progress in legal technology.

Greg Lambert 12:30

Has it been such a long path to law because the lawyers have made it complicated? Or is it just the nature of this industry?

Marlene Gebauer 12:40

Or was it money? Like, they figured there was more money in other areas?

Pablo Arredondo 12:45

So this is one of these ones you can’t blame on the lawyers. And actually, it certainly wasn’t the money. I mean, the financial incentives to be able to solve language are massive, right? There are so many different ways that humans use language, leverage language, everything from translation to voice activation, etc.

Marlene Gebauer 13:05

Like, but I mean, I mean, like, you know, health like, like health care, like there’s always focus on health care or something like that, because there was a more perceived benefit to be had, I guess.

Pablo Arredondo 13:16

I don’t think it was just that. I mean, I think there were certainly people that focused there. But I really think part of it was just that language had a lot of complexity and a lot of issues that made it particularly challenging. And so while I wouldn’t call any of it low-hanging fruit, language was sort of the highest of the hanging fruit in a lot of ways, and in some ways still remains so. It’s certainly not a solved problem.

Marlene Gebauer 13:37

So it was hard.

Pablo Arredondo 13:40

Very, very, very hard.

Greg Lambert 13:43

It was a long trip around the block to get to “it’s hard.” Well, Pablo, I talked to you a couple of weeks ago about your new platform, WeSearch, and it uses the neural network that you were just mentioning to read the documents. You know, we’re going to get into the details here. But can you just give us like a 10,000-foot overview of how WeSearch works?

Pablo Arredondo 14:12

Sure. So, like we mentioned, there were these fundamental breakthroughs a couple years ago that actually happened out of Google. Within the neural net family, there’s something called transformer-based neural nets. BERT is the acronym for the big breakthrough, which was a shout-out to an earlier technology called ELMo. And yes, Marlene, there is actually a BigBird out there. I can send you an article on that as well. Excellent.

Greg Lambert 14:36

She wants an Ernie.

Marlene Gebauer 14:39

I want an Ernie too. I’ll take a Big Bird.

Pablo Arredondo 14:41

They’re having fun. Yeah. And there’s a guy many listeners may not have ever heard of, Jacob Devlin, but I think everyone in legaltech owes him a lot of gratitude, because he and his colleagues broke through one of the major walls that was limiting how well we could capture language and capture the nuances of language using these neural net techniques. At CaseText, you know, we have a lot of evolutionary pressure on us to be aware of these breakthroughs, because it helps us sort of punch above our weight in terms of competing with Westlaw and Lexis without, you know, as large an army of editors. And so we were well positioned to see this and to immediately adopt it. And what it does is basically, instead of your documents, whether it’s case law, transcripts, whatever it is that you want to use as your documents, being stored as a flat, rather dismal keyword index, which is just essentially a spreadsheet with words as rows and documents as columns, you’re now placing the language in your documents into this sublimely complex, 768-dimensional vector space. And, I mean, forgive me, I’m obviously biased in some ways, but it’s truly beautiful to think of, you know, going to this new substrate, if you will, for law, or for language generally. And in that realm, you’re able to match things even if the words don’t have any overlap. And you’re able to do that without having to, say, manually try to, you know, hard code as many synonyms as you could think of, or do some of the earlier things that we did, which, though they sort of got to concept matching, were very crude and very limited. And you had asked for 10,000 feet, and I realize I’m right here in the, you know, one-foot realm talking about this stuff. But what WeSearch basically does is let you take the power of these breakthroughs and easily point them at whatever you want.
And so the slogan I’ve been going with is to make it as easy to spin up a neural net as it is to grab a yellow legal pad out of your desk. And that’s really all WeSearch is: putting into lawyers’ hands this qualitative leap forward in how language is stored and, therefore, how language is searched.
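For readers who want to see the contrast Pablo draws between a keyword index and a vector space, here is a minimal Python sketch. It is an illustration only, not CaseText's implementation: the two documents are made up, and the three-number vectors are hand-picked stand-ins for the 768-dimensional vectors a trained model would produce.

```python
import math

# Two toy "documents" (hypothetical text, not real case law).
docs = {
    "doc_a": "the officer frightened the witness",
    "doc_b": "the contract was signed in texas",
}

def build_keyword_index(documents):
    """Map each word to the documents containing it: the
    'words as rows, documents as columns' spreadsheet."""
    index = {}
    for name, text in documents.items():
        for word in text.split():
            index.setdefault(word, set()).add(name)
    return index

def keyword_search(index, query):
    """Return documents that share at least one literal word with the query."""
    hits = set()
    for word in query.split():
        hits |= index.get(word, set())
    return hits

def cosine(u, v):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# ASSUMPTION: hand-picked 3-dimensional vectors standing in for learned ones.
toy_vectors = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 0.2, 0.9],
    "query": [0.8, 0.2, 0.1],  # stands for the query "police scared her"
}

index = build_keyword_index(docs)
keyword_hits = keyword_search(index, "police scared her")
best = max(docs, key=lambda name: cosine(toy_vectors["query"], toy_vectors[name]))

print(keyword_hits)  # empty: the query shares no literal word with either document
print(best)          # the document whose vector points closest to the query's
```

The keyword lookup finds nothing because no word overlaps, while the vector comparison still ranks the "officer frightened the witness" document first. In a real system the vectors come from the trained model rather than being typed in by hand.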

Marlene Gebauer 16:21

So, Pablo, you mentioned that WeSearch vectorizes the law. So you know, while it’s cool to identify the hundreds of vectors when analyzing a document, it isn’t really useful unless there’s something to point those vectors in a relevant direction. So how does that work?

Pablo Arredondo 17:11

Absolutely. And this is one of the coolest things that I’ve come across in my, you know, career.

Greg Lambert 17:16

So we’re about to totally geek out, I can tell you,

Pablo Arredondo 17:18

yes, you got

Marlene Gebauer 17:20

Everybody, red alert for geek. Right now, just be prepared.

Pablo Arredondo 17:26

You know, and let me set up by kind of contrasting it with what a lot of folks who’ve been in this area for 10 years have been used to when people would come peddling and talking about their AI stuff, right? They would come and say, hey, look at this, watch this, all you need to do is spend 30 days tagging which documents are responsive and which ones aren’t. And lo and behold, the AI will learn that there’s a correlation between the phrase “Athena Industries” and being responsive, and there you go, you’re off to the races. And that was sort of how you would point things in the right direction: very manual, very intensive, and very limited. Okay, so contrast that with the breakthroughs that have been going on with the neural nets. You start with a randomized set of vectors. You can think of it as basically a bunch of arrows pointing in random directions, only there are 768 dimensions that they can point in. And what you do is you just make it play a simple game, and the game is fill in the blank. So if I say to you guys, “The woman went to the store to buy a blank of milk,” we know language. We’ve learned English. We’ve had, you know, millions of years of evolution to develop whatever neurological structures, and decades of actually practicing and learning language. We might say a carton of milk or a gallon of milk, right? Different candidates. But we’re not going to say an aardvark of milk, we’re not going to say a roller coaster of milk, right? The first time the neural net plays the game, it does guess roller coaster, right, or aardvark. It completely gets it wrong. But it’s got that grit, it’s got that stick-to-itiveness, because we tell it it has to. And it just keeps playing the game again and again and again. And when it stumbles on a right answer, it doesn’t forget, which is to say, it adjusts its vector space. The vectors in that multi-dimensional space are pointed slightly differently, so as to benefit from, hey, I got that right.
And if you make it play enough, and you have to use special computer chips to really be able to do this, it can take weeks sometimes to do the training on the back end. But again, there’s no human labeling here, right? You literally just say, play this with the common law. We’re just going to take the entire common law, split it into sentences, and you just have to keep playing this game. And lo and behold, in order to get good at that game, it starts to understand not just how words are related, but how words are used in a given sentence, right? Because to play that game, it’s not enough to know what the definition of the word “will” is. You have to know that there are multiple definitions of “will,” and which one is the likely one depends on the words that surround it in the sentence.
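The fill-in-the-blank game can be sketched in a few lines of Python. This is a deliberately simplified, count-based stand-in and not a neural net: a real model such as BERT nudges vector weights by gradient descent over far more text and far richer context. The four sentences and every function name are assumptions made for illustration. What the sketch keeps from Pablo's description is that the answers come from the text itself, so no human tagging is involved.

```python
import random
from collections import Counter, defaultdict

# Tiny hypothetical corpus; the real training text was case law, not this.
corpus = [
    "the woman went to the store to buy a gallon of milk",
    "the man went to the store to buy a carton of milk",
    "the testator signed the will before two witnesses",
    "the court will decide the motion tomorrow",
]

def masked_examples(sentences):
    """Blank out each word in turn; the hidden word is the correct answer."""
    for sentence in sentences:
        words = sentence.split()
        for i, answer in enumerate(words):
            left = words[i - 1] if i > 0 else "<s>"
            right = words[i + 1] if i + 1 < len(words) else "</s>"
            yield (left, right), answer

def train(sentences):
    """Remember which words filled each (left neighbor, right neighbor) blank."""
    table = defaultdict(Counter)
    for context, answer in masked_examples(sentences):
        table[context][answer] += 1
    return table

def guess(table, context, vocabulary, rng):
    """Use the most frequent filler seen for this context, else guess blindly."""
    if context in table:
        return table[context].most_common(1)[0][0]
    return rng.choice(vocabulary)

vocabulary = sorted({word for sentence in corpus for word in sentence.split()})
rng = random.Random(0)

# Before any play, "buy a ___ of milk" gets an arbitrary word from the vocabulary.
untrained_guess = guess({}, ("a", "of"), vocabulary, rng)

table = train(corpus)
trained_guess = guess(table, ("a", "of"), vocabulary, rng)

# The same word "will" is the right answer in two very different contexts.
will_contexts = {context for context, fillers in table.items() if "will" in fillers}

print(untrained_guess, trained_guess, will_contexts)
```

After playing, the sketch fills "buy a ___ of milk" with a word it actually saw in that slot, and it records "will" as the answer both after "the ... before" and between "court ... decide", which is the context-dependence Pablo describes.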

Greg Lambert 19:58

Not to get too deep here right at the beginning, but how does it know when it got it right?

Pablo Arredondo 20:05

Oh, right. So there’s just a correct answer, because we started with real sentences where we switched the word “gallon” with the word “mask.” So there’s an answer sheet at the end, right? I mean,

Greg Lambert 20:16

Like, unlike the AI where we have to train it over 30 days, where

Marlene Gebauer 20:21

where humans are looking at it and saying yes, right, wrong. How does it know?

Pablo Arredondo 20:25

Exactly. No. With this one, there’s a very clear right answer. There are no inter-coder disagreements, like where one attorney says, “I don’t think that’s really relevant,” and another one says, “I do,” right? This is almost mathematical. And that’s what lets you do it in a massive way. You know, with the Google one, I think, you know, all of Google Books and some huge portion of, like, the internet. In our case, we trained it on the entire common law. It’s only because you don’t have to have humans involved that you can do that sort of really massive training. It’s truly wonderful. And with the results, when you use it, anyone who’s done a lot of search has this almost eerie sense of, like, how did it know that those two things are the same, right? There’s no overlap in the words.

Greg Lambert 21:10

How does this compare to what we may think of as similar tools out there that handle concept searching in research and e-discovery?

Pablo Arredondo 21:21

So, not surprisingly, given just how profound a limitation keyword search is, there have been a lot of efforts to try to overcome that in various ways. And some of them are, you know, not incorrectly put under the umbrella of concept matching. So one would be to say, well, you know, let’s take a thesaurus and basically stick it on top of the thing, right? And so at least there, if they use one word, you know, “run” or “sprint,” we can kind of make it work that way. But of course, that’s limited, because with language, there are a lot of ways that things can be the same concept even if the individual words underneath them do not line up, right? So: “This scares me a bit.” And: “I find this unsettling.” Right? As humans, we’re kind of like, yeah, they’re getting at the same thing, we kind of get that, right? But “scare” and “unsettled” might not be in the thesaurus. So that was one approach. Another approach was to sort of look at statistical co-occurrence, right? And this was where you’d say, hey, all the documents that you’ve called relevant do happen to have the word “Athena” in them, right? So if they also have the word “patent,” then if somebody searches “patent,” maybe they also kind of want “Athena,” right? They didn’t say “Athena,” but there are words that appear alongside “Athena” a lot. And so we’ll call that a match. And then there were things like latent semantic analysis, things that we actually used in the very early days of CARA, where, if I give you all the New York Times articles ever, you can give a machine all the New York Times articles and say, split these articles into 30 piles. And it can look at the statistical distribution of the words and sort of come up with one pile that’s like touchdown, football, cheerleader, coach, a bunch of sports words, right? And then one pile that’s a bunch of, you know, political words.
And so to some extent, you can kind of pull concepts out by using that. So these were all, you know, good things to do. But I think the reason everyone is so excited, and the reason why these newer techniques are kind of wiping the floor with all these earlier ones, is that they’re both utilizing a much more complex fabric, if you will, to put things onto, and they’re using techniques that can scale up in a way that earlier ones really couldn’t.
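Two of the earlier approaches Pablo mentions, the thesaurus layered on top of keyword search and statistical co-occurrence, are simple enough to sketch. The Python below is an illustration under stated assumptions: the thesaurus entries, the three tagged documents, and the function names are all hypothetical, and it does not attempt latent semantic analysis.

```python
from collections import Counter
from itertools import combinations

# ASSUMPTION: a tiny hand-built thesaurus; it only knows what someone typed in.
THESAURUS = {"run": {"sprint"}, "sprint": {"run"}}

def expand_with_thesaurus(query_words, thesaurus):
    """Add listed synonyms to the query; unlisted words pass through unchanged."""
    expanded = set(query_words)
    for word in query_words:
        expanded |= thesaurus.get(word, set())
    return expanded

def cooccurrence_counts(documents):
    """Count how often each pair of words appears in the same document."""
    counts = Counter()
    for text in documents:
        for pair in combinations(sorted(set(text.split())), 2):
            counts[pair] += 1
    return counts

def related_terms(word, counts, min_count=2):
    """Words that share a document with `word` at least `min_count` times."""
    related = set()
    for (a, b), n in counts.items():
        if n >= min_count:
            if a == word:
                related.add(b)
            elif b == word:
                related.add(a)
    return related

# Hypothetical documents a reviewer has already tagged as relevant.
tagged_relevant = [
    "athena patent infringement claim",
    "athena patent license dispute",
    "lease renewal notice",
]
counts = cooccurrence_counts(tagged_relevant)

print(expand_with_thesaurus({"run"}, THESAURUS))     # picks up "sprint"
print(expand_with_thesaurus({"scares"}, THESAURUS))  # nothing added: no entry
print(related_terms("patent", counts))               # "athena" rides along
```

The thesaurus helps with "run" and "sprint" but has nothing to offer for "scares" versus "unsettling", and the co-occurrence step suggests "athena" for a "patent" search only because the two words kept showing up together in the tagged set.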

Marlene Gebauer 23:34

So we’ve heard a lot about, you know, garbage in, garbage out when neural networks learn. And since the, you know, algorithm is proprietary, we can’t figure out where it makes mistakes or where the content creates an environment where bias is reflected. So first, tell us a little about debiasing research. This is a new area of research, apparently. And tell us what efforts CaseText has taken in this regard.

Pablo Arredondo 24:05

Okay, so this is a very important and broad topic. And without question, the neural net that we trained on the common law reflects bias of various forms that it learned from just doing this game with all of the cases. And I found if you search “Pablo went to the car,” you get different results than if you search “Greg went to the car.” And the police are in the Pablo one. You know, things like that can happen. Yes, I know. And it gets in some ways worse, because with our traditional engines, I could sit with our engineers and say, how much weight are you giving to the cite counts versus how much you’re giving to the document library? You could really open the hood and see what’s happening. With the neural net, it’s well beyond our ability to comprehend or visualize exactly how it’s doing what it does. So that’s where we’re starting. And the best I can do is to bring us back to our earliest days as a species, or some of our earlier days as a species. The shaman would come out, you’d say, I have a rash, he would take a berry, he would rub it on your rash, and the rash would go away. And you had no idea what was going on there. You had no idea what the molecular mechanisms were. All you knew is that it did make the rash go away. And in some ways, our relationship with these neural nets right now is something similar to that, right? It’s doing things that are very valuable, it’s helping us find things that we’re looking for, but the precise mechanism we don’t really understand, right? But I think, obviously, people aren’t fully satisfied with that. That’s not the ideal world, right? You want to know what’s going on. And so there’s been a lot of effort to try to peek inside and understand which factors are being leveraged, you know, sort of how it is arriving at the conclusion that it is. There’s some work in terms of what training data you use, and can you use that to help assess bias, right?
There were filters that were supposed to be able to detect adult images, but they started saying anything that has anybody who’s black is an adult image, because of the way they just fed all of it in; there was a statistically higher likelihood of that, right? So obviously, that’s a problem where, you know, garbage in, garbage out: just using a large set of data, it’s going to reflect whatever biases are inherent in that data. Right now, the truth is, what we’re doing at CaseText in terms of our efforts is staying apprised of the efforts of others. And to the extent that we can bring those in, we want to. It’s a very deep research issue. And it’s not one that we can do concurrently with all the stuff we’re doing, you know, in terms of actually building products for lawyers.

Marlene Gebauer 26:38

But there’s, I mean, there’s actually, but the point is, is like there’s experts in this area, and you know, you’re watching what they’re doing and saying,

Pablo Arredondo 26:47

Absolutely, yeah. I think anytime we can gain insights into it, we want to do it. But the truth is, right now, you’re searching with a black box. And it’s not a black box the way the recipe for Coke is a black box, where somebody could go upstairs and open up the safe and have it. Like, we couldn’t open up the safe if we wanted to. Right now, that’s the price of using these techniques.

Greg Lambert 27:10

Well, I’m a little scared now. And that’s one of the things you hear, especially in the academic world right now, because they have the time to sit and think about this, and they have the broad latitude to research what’s been going on. And I think a lot of us felt like the black box was more like the recipe for Coke, that the companies knew what was going on. They just didn’t want to share it because of the intellectual property behind that. But I can tell you that if you don’t know what’s going on, then, you know, I think that shines a different light on how far behind we are and how willing we are to let the technology kind of lead us on this, which can be good, can be great. You know, but at the same time, there could be some unintended consequences that you don’t know until they happen. So it just kind of caught me off guard there, not knowing what was in the black box.

Pablo Arredondo 28:23

Yeah, and you know, prior to the neural net stuff, it was like that, it was a recipe for Coke, right? Like, there really was a weight that we were giving based on how long a document is. And, you know, that would be multiplied by a weight for how many times it’s been cited, etc. So it should make law librarians bristle to hear somebody say, Sorry, it’s a black box, and there’s really nothing anyone can do about it. And then you just have to weigh that against the benefits and against the fact that, you know, from the client’s perspective, a more efficient and more effective attorney is something that’s palpable to them; they see that benefit. And whether or not they understand the underlying mechanisms of the tech, you know, is a different question. But it’s certainly something that we should all be paying attention to and thinking about, and hopefully we’ll get better.

Greg Lambert 29:09

And speaking of benefits, this tool, the neural net, is a cloud-based system. Am I right in that?

Pablo Arredondo 29:19

Yes.

Greg Lambert 29:20

Okay. And I know that you’ve had some interesting views on cloud adoption. Can you give us a little bit of background on that?

Pablo Arredondo 29:30

Well, I mean, look, I don’t want to purport to be an expert on info security and, you know, tell folks their business on this. I have noticed that there have been some very massive security breaches through on-prem systems. And I think that much more knowledgeable people on this than myself are starting to raise the question of whether or not the cloud is in some ways more secure than being off of it. And, you know, given the nature of these kinds of tools, because they use special processors and special chips, you know, GPUs, sometimes TPUs, the cost of going on prem is getting increasingly high. And I think that trend will continue, so that, you know, the loss of value if you’re not willing to work with the cloud could grow as these techniques that are very computationally intensive continue to develop. But, you know, this is a biased guy on this; obviously, we have solutions that do rely on the cloud. Some of our AmLaw firms are very forward on this and, you know, very open to working with the cloud, obviously with a lot of assurances that we go through in their security protocols. Others seem to just hear the word and think, you know, that we’re talking about putting it all on Twitter.

Greg Lambert 30:47

Yeah, I can see that as well. What do you see? I mean, are you seeing that there’s kind of this blending of on prem and cloud? And do you see eventually that pretty much everything is going to be cloud based?

Pablo Arredondo 31:01

Yeah, I mean, I think the majority of it will be. There might be certain aspects where something on prem, you know, makes some sense in an isolated way. You know, virtual private clouds; I think within cloud there’s a lot of different, you know, sort of levels. You know, Amazon has, I believe, a HIPAA-compliant cloud that’s designed just to be HIPAA compliant. So I think the word cloud is going to come to mean different things. But yeah, I would think that that’s where things will be headed. Like, you know, the massive SolarWinds hack, right, was an example of one being on prem. But I don’t know if this is allowed on the podcast, but you guys are much better situated to talk about where things are going relative to the cloud than I am. So I don’t know if this happens on this podcast, but could I turn it around? Greg, Marlene, what are your views on the cloud, both in the short term and the long term?

Marlene Gebauer 31:51

No comment!

Greg Lambert 31:53

Well, I can comment,

Marlene Gebauer 31:55

I’m a fan.

Greg Lambert 31:56

And I can tell you that maybe three, four years ago, if you said, Hey, I want to look at this thing, it’s cloud based, the immediate answer was no. And so we’ve just morphed over time, and I would say it’s really interesting, because all of the patting of the IT folks on the back for being able to get people to work remotely, basically over a weekend, was because we’ve spent 10 years pushing things out to the cloud, instead of having a gigantic internal network. You know, I remember when I worked at the University of Oklahoma, and I had to dial into the mainframe to do an update or to do a backup at, you know, midnight or two o’clock in the morning. And, you know, now I’m able to sit basically at the foot of my bed and do everything that I could do in my office, with the exception of walking across the hallway and getting a can of soda out of the fridge; I gotta walk downstairs now to do that. But, you know, my telephone, my computer, my network, my file system, iManage, all of that is, you know, in the cloud, and it’s that gradual shifting that’s allowed us to do that. And, you know, it was the same thing you heard years ago, that the most insecure thing in the building is the people that are in the building, because that’s going to be where you’re going to fail: someone is doing something, either unintentionally or intentionally, that they shouldn’t be doing. And that server in that room in the hallway that, you know, you’ve got security on, all somebody has to do is accidentally not shut the door all the way, and it’s exposed. And so yeah, it’s a different world now, in 2021, than it was in 2011.

Marlene Gebauer 34:19

Yeah, I mean, the one thing I’ll say is, the one last kind of holdout is where there’s sensitive information or client information, and there’s still, I think, you know, some hesitancy there because of our ethical obligations. But that said, I mean, I think that Greg is right that it’s being considered, as opposed to absolutely not. So, you know, the conversation is happening and, you know, I know firms are looking at, okay, maybe we can do prem to cloud or something like that, as opposed to strictly on prem. So, you know, it’s opening up. And I think, fortunately or unfortunately, I mean, the COVID, you know, crisis allowed that. And, you know, folks are kind of seeing that, all right, there’s an ability to do this; it’s just, you know, we have to be comfortable that our data is safe.

Pablo Arredondo 35:20

Absolutely. You know, we wouldn’t want to live in a world where law firms are not very careful, and don’t spend a lot of time thinking and looking at all the possibilities, when it comes to their clients’ information.

Marlene Gebauer 35:31

Alright, so I’m gonna go back to vector searching, because I think this is really interesting. What are the pros and cons of vector searching? And, you know, here’s the thing, and I ask this from past history: what’s the best approach to win over those who think they can craft a better search using keyword or Boolean? And, you know, honestly, I remember the CaseText bake-offs from some of the conferences, and I really feel like, you know, we have to do a bake-off of the Boolean expert versus the neural network to see who wins.

Pablo Arredondo 36:05

Yeah, anyone who would like to do that with us, go ahead, or you could do it on our tool, because, you know, we offer both types of search. But it’s good to be clear: there are times where one tool might be more appropriate than the other. There’s an old New Yorker cartoon I love, which is a guy standing in front of his microwave, and he says, No, I don’t want to play chess, I just want you to reheat the lasagna, right? And there are some times with a certain query where you’re like, I don’t need you to get fancy, I don’t need you to show me that you also know that this other thing is related. I’m basically trying to fetch documents that have that certain keyword, right? And so, you know, there are certainly times where it’s overkill to use a neural net, and where a neural net might volunteer things where, though, hey, look how snazzy you are for finding it, I don’t want to see that document right now. And in fact, one of the things that we’ve done with Parallel Search on case law, and actually with WeSearch now, is let you do a hybrid, where you can type in a sentence and say, I want to do concept matching overall, but this particular word has to be in the answer, right? And so that way, if there is something you want to lock in on, you can do that. But for everything else, you’re allowed to use the more robust matching, which is often very important, because you’re not going to, you know, remember every single way something might be articulated. So it’s not something where I think we should throw away Boolean searching at all, or, you know, brief analysis, like CARA and stuff like that. They each have a place. But what we’ve found is people tend to really want to start with the neural net, and then go to these other ones if need be. That’s been our experience.
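
The hybrid idea described above (concept matching overall, with one word that has to appear in the result) can be illustrated with a minimal sketch. This is not CaseText’s implementation; the function names, the toy three-dimensional vectors (standing in for vectors a real model would produce), and the sample sentences are all hypothetical and chosen only for illustration.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(query_vec, docs, required_word=None, top_k=3):
    """Rank docs by vector similarity, optionally requiring a literal word.

    docs is a list of (text, vector) pairs. The vectors are assumed to come
    from some embedding model; here they are hand-written toy values.
    """
    results = []
    for text, vec in docs:
        if required_word is not None:
            words = re.findall(r"[a-z0-9']+", text.lower())
            if required_word.lower() not in words:
                continue  # keyword lock: skip documents without the word
        results.append((cosine(query_vec, vec), text))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results[:top_k]


# Hypothetical toy data: two sentences about the same concept, one unrelated.
DOCS = [
    ("The factory was closed after the inspection.", [0.9, 0.1, 0.0]),
    ("The manufacturing plant shut down following the audit.", [0.85, 0.2, 0.05]),
    ("The court granted summary judgment.", [0.0, 0.1, 0.95]),
]
QUERY = [0.88, 0.15, 0.02]  # stands in for an embedded query about a factory closing

concept_only = hybrid_search(QUERY, DOCS)
locked = hybrid_search(QUERY, DOCS, required_word="plant")
```

In this sketch, `concept_only` ranks both factory-related sentences above the unrelated one even though they share few words, while `locked` keeps only the sentence that literally contains "plant".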

Greg Lambert 37:38

Well, I want to go to the other extreme, and think of all the crazy stuff we could be doing with this, because after I talked with you a couple of weeks ago and got to play around on it, I think even while I had you on the phone, I was like, Oh, my God, you could do this. What about this thing over here? When I got to talking with people inside, they’re like, Oh, you know, this big problem that we’ve had for years, we should throw this at it and that will fix it. So what are some of the ideas that you think this type of tool could be applied to?

Pablo Arredondo 38:17

Absolutely. And one of the really fun parts about kind of running around town with this as our little song and dance is how many cool ideas that we did not think of keep coming up, right? So, you know, when we first started, I was like, Look, you’ve got your obvious one: search a brief bank, right? That’s a no-brainer, right? Briefs are similar to cases in terms of their, you know, informational texture. Transcripts, right? There were some sort of obvious ones that came to mind. We had one client who is national counsel for a large automobile company, and does basically discovery for this company across, like, the whole country, going back years and years and years. And one of the things they want to do is make sure that their discovery responses are consistent, right? They don’t want to have said something different way earlier. But, you know, each litigant doesn’t write their interrogatory or their request for admission in the same way. And so one of the things they found really powerful to do is to just take a set of interrogatories, plug it in, and let the neural net say, Hey, I know that here they said manufacturing plant instead of factory, but don’t worry, we get it, it’s the same question. Right? And so that’s one use case. Just today, somebody said they’re going to be using it on their, they call them PTIs, the pardon-the-interruptions. Back in my day, it was request for information, that is what we called those, but apparently they’re getting more polite about interrupting people. You know, sort of the ability, when somebody says, Has anyone been before Judge So-and-so, right, to find all the people talking about that judge, separately from how they might have exactly said it. We’ve got some partners that are like, Don’t tell anyone, I’ve got a C drive where I put everything, because I can’t stand the current document system. I want to dump my C drive onto it. Hey, you know what I mean, we provide, you decide, right?

Greg Lambert 40:01

So I think I know that partner.

Pablo Arredondo 40:04

Yeah, I think everybody knows. Yeah, I think transcripts, and actually contracts, right? So my whole thing is, I’ve been almost proud of how myopic I am about litigators. I tell, you know, transactional attorneys, Go find somebody else to help you, I want to help the litigators. With this tool, for the first time, transactional lawyers are like, This is really helpful, because we can find clauses that, you know, mean the same thing, even if they’re differently worded. Let’s see, one firm has been putting their intranet on it, right? So all of their internal bios and things like that.

Greg Lambert 40:38

That’s a good idea.

Pablo Arredondo 40:39

One very, very smart person thought about ways that you might be able to actually create customized models to send to the client, and that person knows who she is. Which I thought was really, really amazing; it opens up a whole other world of potential use cases. Yeah, just to name a few. And like I said, our demos tend to be trying to kind of seed it. And then, if we did a good job, we can just shut up and let people tell us, You didn’t think of the right thing, this is what we should use it for.

Greg Lambert 41:07

Yeah, I can tell you the idea that was tossed around my place was throwing it at our time entry, and doing concept searching on time entry was a big winner on that. So yeah, I think, you know, it’s really interesting, because I think that having this neural net system allows the computers to be computers. Instead of trying to pretend to think like a human, it allows them to do what they do best. And then the results are brought back in a way that helps us. So, you know, it’s kind of like the airplane: when we thought we were gonna fly, we thought we would look like birds, you know, the flappy kind of wing thing. And then, you know, eventually the technology and dynamics changed. And I think we’re doing the same thing here. So I will tell you this: if people haven’t looked at this or tested it out, it is really cool. Pablo, I think they can go on and do some limited searching. Is that right?

Pablo Arredondo 42:15

Yeah, absolutely. Wesearch.ai, or I think lp.wesearch.ai if you want to be blasted with our fancy marketing page first. But listeners of this show, I mean, this podcast, don’t need that, you know, they’re not succumbing to such things. They can go to wesearch.ai, create an account, and have at it. And just let us know if you want to talk further about it. But yes, absolutely, you can try it; we encourage you to try it. I am absolutely certain that CaseText won’t be the only company bringing, you know, various forms of this. I think, without question, this is going to be how all of legal language becomes encoded and encapsulated, one vector at a time. And it’s just a real privilege to get to talk to you guys about it. And thank you so much for having me.

Marlene Gebauer 43:02

It’s been our pleasure.

Greg Lambert 43:04

Yeah. Thanks, Pablo. Thanks again to Pablo Arredondo for joining us today. He gives us quite a bit to think about. But before we go, we want to remind listeners to take the time to subscribe on Apple Podcasts or Spotify, or wherever you listen to your podcasts, and take the time to rate and review us as well. If you have some comments about today’s show, or suggestions for a future show, you can reach out to us on Twitter at @gebauerm or at @glambert. You can call The Geek in Review hotline at 713-487-7270 or you can email us at geekinreviewpodcast.com

Marlene Gebauer 43:46

And as always, the music you hear is from Jerry David DeCicca. Thanks, Jerry.

Greg Lambert 43:50

Yeah, thanks, Jerry. All right, Marlene, I will talk with you later.

Marlene Gebauer 43:54

Okay. Bye bye.