Listen on mobile platforms: Apple Podcasts | Spotify
Twitter: @gebauerm, or @glambert
Music: Jerry David DeCicca
Marlene Gebauer 0:08
Welcome to The Geek in Review. The podcast focused on innovative and creative ideas in the legal industry. I’m Marlene Gebauer.
Greg Lambert 0:14
And I’m Greg Lambert. So Marlene, yesterday I had the pleasure of going out to San Antonio for some work stuff. And I think all conferences that are in San Antonio, instead of being in July, when it’s 100 degrees, should be at this time in February, because it was absolutely gorgeous. It was perfect. My wife and I had a good time on the Riverwalk. So I’m gonna reach out to all the associations and say, you know, forget the summertime things. Let’s go to San Antonio in February. That’s the best time to be there.
Marlene Gebauer 0:49
Yeah, well, I had a nice long weekend and finally got out on the bike again. So, you know, I did about 20 miles and that felt really good. And then went over to Topgolf and played some golf yesterday. Well, we played at Topgolf. I wouldn’t say that’s really golf, but it’s fun.
Greg Lambert 1:07
Yeah, yeah. Well, it’s as expensive as playing golf. In that.
Marlene Gebauer 1:11
It’s true. Without the cool clothes.
Greg Lambert 1:15
Yeah. Oh, now you can wear the cool clothes. Okay, that’s just your history. So everyone may notice that our voices may sound a little different, usually mine. That’s because we usually record in the afternoons, and this episode we’re recording in the morning because our guest this week joins us from the Netherlands.
Marlene Gebauer 1:34
Yes. So we’d like to welcome Jan Scholtes, chief data scientist at IPRO and full professor of text mining at Maastricht University in the Netherlands. Professor Scholtes, welcome to The Geek in Review.
Johannes Scholtes 1:46
Thank you. Very happy to be here.
Marlene Gebauer 1:48
Yeah. Can you give us a little bit of background on your work as a chief data scientist there at IPRO? What does that mean these days to be a chief data scientist?
Johannes Scholtes 1:58
Very good question. Well, I’m mostly interested in applying interesting new algorithms, whether we call it artificial intelligence or data science, to typical ediscovery and information governance problems, especially where the bottlenecks are. And then see not only how we can use these algorithms to solve these problems and get the bottlenecks out of the way, but also make sure that we apply the technology in a responsible way. And we make sure it’s legally defensible. That’s, of course, very, very important in the field of ediscovery. And that it’s transparent, and that we can explain it to our users, and it’s reproducible, and it’s stable, and it doesn’t show any kind of strange behavior.
Marlene Gebauer 2:52
Johannes Scholtes 2:55
Yeah, no, that’s a great example. I’m sure we’re gonna talk about that later today.
Greg Lambert 3:01
Yeah, I’ve already thought of some more questions.
Johannes Scholtes 3:03
Yeah. So it’s really the combination of technology and the business process of ediscovery and information governance, and then making sure that the technology is implemented in the proper way and used in the proper way. And I kind of try to oversee that. So I help people select the right algorithms and make sure that whatever scientific research we do is done properly. That’s both in the scientific aspects of doing it right, but also in school aspects of doing it right. And then the other major dimension that I’m really worried about all the time is legal defensibility. Because I learned the hard way in the past: if it’s not legally defensible, you should be very, very careful using technology in legal processes.
Greg Lambert 3:51
Like many of our guests, one job just isn’t enough for you. So you also work at the university there as the extraordinary chair in text mining, which may be one of the best titles I’ve ever seen, especially for a professor. So tell us a little bit about your role there and what your responsibilities and focus are.
Johannes Scholtes 4:11
We have different types of chairs in the Netherlands: ordinary chairs, extraordinary chairs, normal chairs. I’ve been doing this now for almost 15 years, one five. When I started 15 years ago, I really wanted to do it, because in the Netherlands there were no people that had the skill set that we needed at our company, ZyLab, which I worked with for the majority of my life. At ZyLab, we decided to sponsor a chair to make sure that we could train and educate students with the skill sets that we needed. So that’s when I started doing this one day a week, starting with a course in text mining, so finding interesting patterns in text; legal analytics, that’s what it’s called in the legal industry. So trying to find semantic roles, both words but also relationships. Very quickly I also started doing the information retrieval part of the curriculum, search engine building, which is something we did ourselves at ZyLab. And then I had a little bit more time. After ZyLab merged with IPRO, I was no longer operationally responsible. So now I’m also involved in the advanced natural language processing.
Greg Lambert 5:41
Who are the typical students that you teach? Are they what we would consider undergrads? Or are they law students? What’s the typical student?
Johannes Scholtes 5:51
They are master’s students. I teach at the Department of Advanced Computer Sciences, in engineering and sciences. And so these are mathematicians and computer scientists, which is also my background. So we do a lot of programming, we do a lot of math. And then the students have to do internships or graduation projects, and sometimes they do applied types of projects, which could be medical or legal or somewhere in industry, or they develop new algorithms, totally new algorithms. Some of our students, PhD students, ended up at DeepMind, other students end up at the government, many of them start their own business. Most of them actually already have a job by the time they get into the master’s. I only teach in the master’s. And when companies have paid for their studies, they stick around with these companies. Maastricht, geographically, is in the southern part of the Netherlands, very close to Belgium, France, Germany, and then the Ruhrgebiet in Germany. So there’s a lot of industry, manufacturing, and other kinds of interesting companies and projects. But most of what we do is medical and legal, and then, of course, the whole body of law enforcement and regulatory oversight. That’s where text mining is most applied.
Marlene Gebauer 7:14
How is your research related to, you mentioned information governance, so how is your research related to that? You know, how is that applied in that context?
Johannes Scholtes 7:24
Well, when I started many years ago, I’ve always believed that, you know, instead of waiting with eDiscovery until the house is on fire, we should go more upstream, and we should clean up our stuff. I wrote a post, I think it was in 2012, which was named The Dark Side of Big Data, warning people about the increase of data and all the stuff we keep. And in the end, you know, when you run into an eDiscovery, that’s all used against you. So a very simple principle of ediscovery is: if you have a gigabyte of data, you get a gigabyte bill; if you have a terabyte, you get a terabyte bill. It doesn’t matter what the data is. If you are allowed to, and you have data that you no longer need for knowledge management purposes, you should get rid of it. And in the past, you know, we automatically got rid of it because the basements of our offices were full or the Iron Mountain bills were too high. You know, we applied retention policies. And that’s all gone, because every 18 months the amount of storage we have for the same amount of money doubles. So we have kind of a new law, a new version of Moore’s law: we also see that our legal bill doubles every 18 months. You get more and more data, and it’s more complex, because it’s harder to find what really happened, and there’s so much noise you don’t know where to start. So for a long time, I was also on the board of the Association of Information Management. And I’ve also done a lot with EDRM and the information governance there. And we really tried to convince people not to just, you know, be reactive, but be more proactive. And that has always been an uphill battle. Because, you know, why would you want to solve the problem of your successor? Not a lot of companies were interested in that 10 or 15 years ago, but that has completely changed these days.
And everybody understands that if you don’t have your information governance in order, you’re going to pay for it in ediscovery. Let alone that there’s a lot of risk in these datasets: there’s HR risks, there’s intellectual property risks, there’s confidentiality violation risks, privacy risks, GDPR risks. And you don’t want to know how much medical data is floating around in corporates, or resumes, or stuff that they should have gotten rid of a long, long while ago. And so I finally see this tendency that people are more and more interested in solving the problem more upstream. And it’s the same for one now. And the thing is, with eDiscovery, you have to do it right, right away; you know, there’s no room for errors. But in information governance, you know, the requirements on quality are not as high. And if something goes wrong, you can fix it. You’re also able to use more interesting AI techniques in the information governance part than you can use in ediscovery. In ediscovery, you have to be very careful that everything you use is not going to be used against you, and that, you know, the technology is always understood by the other party. You don’t want to have a discovery on the discovery. So there’s a little bit more room for flexibility in information governance. So I think the two go together, and smart companies clean up the house.
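The billing arithmetic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the fixed 18-month doubling period, the starting volume, and the per-gigabyte rate are all hypothetical placeholders, not figures from the conversation or from any vendor.

```python
def growth_factor(months, doubling_period_months=18.0):
    """Multiplier on data volume after `months`, assuming steady doubling."""
    return 2.0 ** (months / doubling_period_months)


def review_bill(volume_gb, rate_per_gb):
    """Bill that scales linearly with volume, whatever the data contains."""
    return volume_gb * rate_per_gb


# Hypothetical starting point: 100 GB today at an invented rate of 50 per GB.
start_gb = 100.0
rate = 50.0
bill_today = review_bill(start_gb, rate)
bill_in_three_years = review_bill(start_gb * growth_factor(36), rate)
```

Under these assumptions, data left untouched for three years goes through two doublings, so the volume-based bill quadruples. That is the arithmetic behind cleaning up before a matter starts.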
Yeah, many firms now have, you know, an information governance policy, but, you know, it is still a struggle, because that age-old, you know, I want to keep it because I might need it, sort of theory is still with us. But, you know, it certainly has come a long way since you started talking about it in 2012. So,
It’s really interesting, what you’re saying now. I talked to, you may know him, Jason Baron, the former director of litigation support for the White House at the National Archives. And he told me once that of all the data they got at the National Archives, which was probably like 1% of what these departments had and thought was really, really important to keep, I think they threw away another 95%, because they thought it was completely irrelevant. So yes, we do believe that more data is important than there really is.
Greg Lambert 11:54
Well, maybe we’ll call it Scholtes law, that data risk doubles every 10 minutes.
Johannes Scholtes 12:02
Right. That would make me famous. Yeah.
Marlene Gebauer 12:06
So Jan, you’ve written about how AI can be used to improve the due diligence process for M&A deals. So, you know, there’s a lot of AI tools out there for M&A due diligence right now. And I would say that they can work well if you invest the time and the effort in them. And at least in law firms, there’s sometimes a struggle with that. Because, you know, we mentioned chat AI before, and people are kind of expecting it to work like that, like, you know, you just say something, and it does what you want. And that isn’t always the case with at least some of the tools that are on the market. So how do you see AI tools sort of improving the due diligence process for attorneys?
Johannes Scholtes 12:55
Yeah, yeah. Well, M&A due diligence, next to ediscovery, is the most successful market for AI tools in the legal industry.
Marlene Gebauer 13:06
Large amounts of data that you have to crunch, right?
Johannes Scholtes 13:09
There’s no way around AI. You know, some people say, well, this is not AI, it’s more like information retrieval and search. I believe it’s more like a sorting process. So you sort the data along the dimensions that you’re interested in. And then by sorting it and analyzing it and enriching it, you can more quickly find answers to the typical questions you have. Now if you go back to M&A and due diligence, there’s two types of due diligence. There’s the vendor due diligence, where you want to sell your company. And then there’s the due diligence where you want to buy one. And they’re very different in nature. In vendor due diligence, corporates and law firms have access to all the data, also to the original data, sometimes the emails where it originates from, memos and drafts, and that can really help you to get an idea of what’s going on. Also, you have the data yourself, physically. So you can search it and enrich it and do, you know, all the great stuff you can do with it, and find patterns for change of control. You know, a typical list of due diligence risks is pretty limited. And I think 80 or 90% of those risks you can very quickly find: change of control, liability, jurisdictions, different versions of different templates. The part that’s harder is when you want to buy a company. The other party is often very reluctant to give you the data electronically. So you have to work in one of these M&A environments where you can only page up and page down. You don’t have access to the text. They don’t want you to text mine it. So you’re limited in what you can do. There’s different parties in the market that have different solutions. But what we see is kind of a tendency from corporates, and also from law firms, to say, well, we just want to have a copy of the data room, and then we upload it in tools, typically tools similar to ediscovery tools, and then we’re going to look for certain patterns and certain risks.
Now, you’re looking for very context-sensitive patterns. What I’ve seen is that many of the parties that use these tools still use keyword search or regular expressions or basic search. But if you want to have, like, a change of control in the context of more than 49% change of ownership, with certain amounts attached to it, then you have these really complex relations that you’re looking for, and then these tools are pretty limited. But they get better and better. And especially with the development of deep learning, and the transformers that are also used in ChatGPT, they’re much better at finding contextual relationships. So that’s getting better and better and better. On the other hand, I strongly believe in the combination of machines and humans, especially for these legal but also for medical applications. Like I said, you can probably find 80% of all the problems in the due diligence by using technology. But then judging whether it’s really a problem, that’s where, in many cases, you need a human to determine the context. So it’s this combination of human and machine, where the machine does the heavy lifting and a human decides whether something is relevant or not, whether it’s a false positive or a false negative. That’s what really works. I also believe that’s the way forward in M&A, because I really don’t believe in, you know, juniors with binders and yellow markers going through data room binders. Yeah, you’re laughing, but you don’t want to know how many law firms, especially in Europe, you know, are still doing that. And, you know, maybe it’s great for your billable hours, but there are also law firms that went bust because they missed something.
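The limits of keyword and regular-expression search mentioned above can be made concrete with a small sketch. Everything here is an assumption for illustration: the sample clauses are invented, and the two patterns and the `flag_clause` helper are hypothetical, not taken from any due diligence product.

```python
import re

# Hypothetical sample clauses, invented for illustration only.
CLAUSES = [
    "A Change of Control occurs if more than 49% of the shares are transferred.",
    "Either party may terminate upon a change of control of the other party.",
    "The fee increases by 5% each year.",
    "If a third party acquires a majority of the voting rights, the licence ends.",
]

KEYWORD = re.compile(r"change\s+of\s+control", re.IGNORECASE)
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def flag_clause(text, threshold=49.0):
    """Return (keyword_hit, percentages in the text at or above threshold)."""
    hit = bool(KEYWORD.search(text))
    pcts = [float(p) for p in PERCENT.findall(text) if float(p) >= threshold]
    return hit, pcts


results = [flag_clause(clause) for clause in CLAUSES]
```

The last clause describes a change of control in substance but uses neither the phrase nor a percentage, so the patterns miss it. That kind of miss is where context-aware models, with a human judging the hits, are expected to help.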
Greg Lambert 16:51
With the change in structure, because you talked about how a lot of times we’re still using keywords to try and identify the sections that are relevant. Yep. So if you have people that are well established and doing things the traditional way, how do you guide them through to say, here’s the difference in how you apply this type of learning to the project? Rather than looking for those key phrases, you look more at the semantics of what it’s saying, and so the context of it. Yeah. How do you approach people like that, to kind of get them out of the old routine and into the new? Yeah,
Johannes Scholtes 17:34
well, the best way is to show them by examples that it works. So what we typically do is we have a data set where we know what’s in the data set, and then we kind of challenge them. Okay, why don’t you review this data set the old-fashioned way, and let’s see what you come up with and how long it takes, and then we do it with technology. And then typically, we find two to three times more relevant information in a fraction of the time. And then it is up to them. You know, they have to make a decision: do you want to deliver a low-quality service at a very high price, which at the end of the day is probably not going to be very competitive? Or are you going to offer a high-quality service, for which you can probably also charge a higher price? Because a lawyer empowered with technology can do more work than a lawyer without technology, you can charge more per hour for a technology-empowered lawyer than for a lawyer in the basement with a binder and a marker. And the amount of errors is astonishing. Computers may not be perfect, but they’re consistent in their errors and in their performance. Typical computer programs are around 70-80% precision and recall, and the new ones using deep learning are around 90% precision and recall. Now good humans, really good humans, are at 85%. Average humans are, you know, maybe 70. But those average humans on Monday morning or Friday afternoon can be at 30% quality. And you don’t know, because we humans are inconsistent in how we are being inconsistent. So if you want to fix errors, it’s very hard. And with technology, there’s a great book, you’ve probably read it, it’s from Paul Daugherty, Human + Machine, and they show a lot of examples where the computer alone is at 80%, and then the human is also at 80%. But if they work together, you get more than 95-97%. And that’s what we see in medical applications, radiology, but also in legal applications.
And, you know, humans are completely not suited to analyze the details of 20 binders of email with a marker. Computers are really good at that stuff. But humans are better at stuff that computers are not very good at. You know, computers are very rational; they miss context, they miss common sense, they cannot really reflect. And by the combination, you get the best solution. And the other thing is, and this is very important if you use AI in legal applications, the users need to understand and feel that they control the AI. So it’s like a car: you know, if you brake, you want to go slower, not go faster. And if you hit the gas, you want to go faster. So you should show these lawyers how the AI can be steered and how they can control it. And one of the most successful workshops we had, we started doing that like seven or eight years ago. We took the Enron data set; in the Enron set, there’s a lot of travel itineraries. So we let the users search for the travel itineraries, and we knew there were, like, you know, 80 travel itineraries in there, in half a million emails. And typically people would either find 5,000 emails, including a lot of weekend offers from Las Vegas, or they would only find like 20 that were relevant. And we would let them play around with Boolean search, fuzzy search and everything, and, you know, quorum search, and nested Boolean with brackets. And then we showed them how to do it with assisted review, with machine learning. And then within 20 minutes, they pulled up the 60 documents, and they only had to look at like 120 or 200 documents in order to reach, you know, 95% recall at a minimum effort. And then they noticed that if they were getting, like, offers for weekends in Vegas, and they told the computer, this is not what I’m looking for, then in the next iteration those were gone.
And if there was another one that they had missed for a while, they would see that it would actually pop up in one of the final training cycles. So you’re probably familiar with that yourself, when you’re using this machine learning. And then you get the feeling that you can control things. And if you can explain it, and it’s transparent, and the system behaves similarly in different use cases, that’s when you can create trust. And that’s the most important thing, trust, so that people will start using it.
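The workshop described above (label a few emails, let the system re-rank, watch the Vegas offers drop out) can be sketched as a small relevance-feedback loop. This is a toy illustration, not the method used in the workshop or in any product: the eight-document corpus is invented, the term-weight scoring is a deliberately simple stand-in for a real classifier, and `assisted_review` and its parameters are hypothetical names.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def score(doc, weights):
    # Sum of learned weights over the distinct terms in the document.
    return sum(weights[t] for t in set(tokenize(doc)))


def assisted_review(docs, is_relevant, seed_ids, batch_size=2, rounds=3):
    """Label a batch, update term weights, re-rank the rest, repeat."""
    labeled = {}
    weights = Counter()
    queue = list(seed_ids)
    recall_per_round = []
    total_relevant = sum(1 for i in range(len(docs)) if is_relevant(i))
    for _ in range(rounds):
        for i in queue:
            label = is_relevant(i)  # stands in for the reviewer's decision
            labeled[i] = label
            for t in set(tokenize(docs[i])):
                weights[t] += 1 if label else -1
        found = sum(1 for v in labeled.values() if v)
        recall_per_round.append(found / total_relevant)
        unlabeled = [i for i in range(len(docs)) if i not in labeled]
        unlabeled.sort(key=lambda i: score(docs[i], weights), reverse=True)
        queue = unlabeled[:batch_size]
    return labeled, recall_per_round


# Invented mini-corpus: four itineraries, two Vegas offers, two unrelated.
DOCS = [
    "flight itinerary houston to london departure monday",
    "weekend offer las vegas hotel flight deal",
    "itinerary for your flight to new york departure friday",
    "quarterly gas trading report attached",
    "hotel and flight itinerary confirmation departure tuesday",
    "las vegas weekend offer cheap hotel",
    "meeting notes on pipeline contract",
    "your travel itinerary departure from denver",
]
RELEVANT = {0, 2, 4, 7}

labeled, recall = assisted_review(DOCS, lambda i: i in RELEVANT, seed_ids=[0, 1])
```

In this toy run, marking one Vegas offer as not relevant pushes the second one to the bottom of the ranking, so it is never shown to the reviewer, while recall climbs each round. That is the brake-and-gas behaviour the speaker says builds trust.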
Greg Lambert 22:03
Yeah. And we would be remiss if we didn’t have somebody in from Europe and not at least talk a little bit about GDPR. GDPR is a constant issue, both there in Europe and for anyone that’s that’s doing business, either in Europe or with Europeans. In what ways are you seeing AI help companies remain GDPR compliant? And I’m wondering, also, on the flip side of that, are there any risks that may be involved that we’re not thinking about when it comes to using AI?
Johannes Scholtes 22:36
Well, there’s different aspects of the GDPR. First of all, part of it is that you need to have a reason why you store certain data. If you have that reason, and you explain what you’re doing, and you can explain how you’re doing it, you can do pretty much everything you want. It’s only if you have no reason that you cannot build datasets that contain personal data or, worst case, medical data of individuals. And there’s different gradations. And it’s not only the GDPR; I think it’s also the California privacy laws, it’s Massachusetts, Washington state. We see these rules popping up everywhere. There are even privacy rules in China; maybe they use them for different reasons. But all these rules help us to control data. Now the problem is that these GDPR rules are often contradicting United States ediscovery requirements, and then the only way to solve that is by using technology. One of the things you can do using text mining technology is to identify whether a word is an individual’s name, or a company name, or a location, you know, a phone number, or an address, or whatever, a bank account number. And then you can do two things: you can either redact them, or you can pseudonymize them. And that’s what we see a lot in eDiscovery between Germany and France and the United States. Whenever there’s like a patent dispute or another reason for large-scale ediscovery in Europe, the data is not redacted or anonymized, but it’s pseudonymized. For each of the names, we automatically assign, you know, a number, and then the data is transferred to the United States according to the requirements of the Federal Rules of Civil Procedure and whatever the parties negotiate. And then if they find that they really need to know the name of an individual because somebody did something, or they need to have the name of the custodian, then it can be obtained from the pseudonymisation list. And that’s hard work. Reviewing a million documents is hard.
But redacting a million documents is impossible, and human beings are simply not suited for that work. That’s great work that you can do by using computers. And that’s exactly one of the killer applications of text mining in the legal space in Europe: redaction and pseudonymization of personal data.
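The pseudonymization step described above, where each identifier is swapped for a number and a separate list allows later re-identification, can be sketched as follows. This is a minimal sketch under stated assumptions: real systems use trained named-entity recognition for person names, whereas this example relies on two simple regular expressions and a caller-supplied list of names. The patterns, token format, function names, and sample text are all hypothetical.

```python
import re

EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
PHONE = r"\+?\d[\d\s-]{7,}\d"


def build_pattern(person_names):
    """One combined pattern, so each span of text is replaced only once."""
    parts = [f"(?P<EMAIL>{EMAIL})", f"(?P<PHONE>{PHONE})"]
    if person_names:
        longest_first = sorted(person_names, key=len, reverse=True)
        names = "|".join(re.escape(n) for n in longest_first)
        parts.insert(0, rf"(?P<PERSON>\b(?:{names})\b)")
    return re.compile("|".join(parts))


def pseudonymize(text, pattern, mapping):
    """Replace identifiers with stable tokens; `mapping` is the lookup list."""
    def _sub(match):
        value = match.group(0)
        if value not in mapping:
            mapping[value] = f"{match.lastgroup}_{len(mapping) + 1:04d}"
        return mapping[value]
    return pattern.sub(_sub, text)


def reidentify(text, mapping):
    """Restore original values, for whoever is entitled to hold the list."""
    for value, token in mapping.items():
        text = text.replace(token, value)
    return text


# Invented sample text and name list.
original = ("Jane Doe (jane.doe@example.com) called +31 43 123 4567. "
            "Jane Doe approved it.")
mapping = {}
pattern = build_pattern(["Jane Doe"])
pseudo = pseudonymize(original, pattern, mapping)
restored = reidentify(pseudo, mapping)
```

The same person receives the same token throughout, so reviewers can still follow who said what. The mapping stays behind as the pseudonymisation list and is consulted only when a name is actually needed.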
Marlene Gebauer 25:04
So I wanted to pose a question based on a couple of things that you said earlier, you did mention that you’re a strong believer in humans working with new technology, that the technology isn’t going to replace lawyers, you know, and then you also mentioned that, you know, in order to be successful, they have to feel like they’re controlling it. And that resonated with me as well. Because, you know, I look at things like the chat AI type of technology, where it really does, you know, the problems with it aside, I mean, if you give it a control group of documents, it’ll write a base memo, it’ll, it’ll do very, you know, do sort of the basic things to sort of get you started. And while of course, you do need oversight for that. I guess there’s also the argument that you might need fewer people to do that, because their work is generated to an extent. And that I think, leads to maybe some of the the nervousness, I guess, that we’re seeing, in addition to the problems, you know, sort of the made up things, stuff like that. But how would you respond to these concerns that that increased use of AI in the legal field, you know, could lead to job loss for lawyers and other legal professionals that were the ones who were actually doing some of that base level work before?
Johannes Scholtes 26:28
Yeah, that’s a very good question. I believe that the industry with the most burnouts is the legal industry. So I think there will always be more than enough work. And you’ll also see that there’s a whole new generation of lawyers now, young people coming from universities who study law because they are interested in providing strategic legal advice to their clients. They didn’t go to Harvard or, you know, Stanford Law School in order to basically sit with the binder and a marker, so to say. And so I’ve actually been invited by law firms to help them with their recruitment. So they asked me to come to their law firm, show how the firm was using technology, and, you know, talk a little bit about AI, and how we control it, and how we measure the quality, and how it’s incorporated into the legal processes. And that’s what they use to show, you know, recruits that, okay, if you work with us, you’re going to do really interesting work. We have outsourced all the boring work to technology. And I don’t know how it is in the US job market, but here in Europe, it’s very hard for law firms to find really good lawyers. And they need this technology, and not only for the recruitment. At the end of the day, if they are not going to jump on this technology bandwagon, the Big Four will. There’s a whole group of people ready to start competing with law firms, and the only reason why they don’t yet is because the ABA or the European bar associations have forbidden them, but they’re ready to go. Now in the Netherlands, KPMG has more lawyers than the largest law firms. And they’re ready, and they embrace technology. So if you provide a service that’s too slow, too expensive, and doesn’t have the quality it could have, at the end of the day, that’s not a competitive service.
And I think that most law firms understand that, but they should also understand that they should embrace maybe different methods for billing. Like I said, for the lawyer empowered by technology, you could charge more per hour than for the lawyer who is not so empowered. And often they’re afraid to do that. Or you can also have a subscription model, or you can have a fixed price. And the really interesting thing that I noticed is that law firms that do class actions all use technology from the beginning to the end, because they have to have efficiency. When we talk about it powers. At the end of the day, you have to have a competitive service.
Greg Lambert 29:13
Jan, when you were talking about charging more for the attorney plus technology, that’s not a novel concept for law firms. Maybe some of our listeners are going to cringe when I say this, but lawyers using Westlaw cost more, you know, at least per hour, because we would charge back for using basically that technology. It’s not a foreign concept to the legal industry, charging for this, and it may be one of the ways for some of the firms that, I think, know that they need to get into it but are not sure how they can afford it. Maybe this is one of those options. I don’t know if it’ll work, but like I said, it’s not new.
Johannes Scholtes 30:07
Well, the way I address this is that I typically show them, you know, there’s a difference between revenue and profit. Revenue is vanity, profit is sanity. It’s an old saying, and a lot of firms really go for the revenue. But if you have, like, 100 people doing legal review, and you pay them like $50 an hour and you charge $100 an hour, you can do the math. And you see that if you use machine learning and data scientists and AI and more expensive lawyers, at the end of the day, you make more money, and you have less risk. Because there’s a huge risk: humans are completely not suited for a lot of the tasks that are currently done manually in ediscovery, let alone in information governance. And the risks are enormous. So there’s liability risks, and there’s governance risk, regulatory risk. Everybody knows the examples of the law firms that went bust because of these liability risks. So I always ask law firms: okay, why are you afraid of technology? Is it because you don’t understand it? Because I can explain it to you. If you don’t trust it, I understand that, and I can create trust for you. But if you don’t want to use technology, there’s two reasons, actually, which are very hard to address. One is that if you are a criminal defense lawyer, and most of your clients are guilty, you are absolutely not interested in factuality. They come up with some kind of story like, okay, the bank was robbed, but my client just happened to be there, and actually he wasn’t part of the robbing gang, and yeah, he was arrested. You know those types of stories. Criminal defense lawyers, they don’t need AI most of the time, because they know AI will find the evidence against their client. The other group is lawyers that are more interested in billable hours than in providing efficiency and quality. You know, if that’s your business model, you get away with it, and your customers accept it.
There’s very little I can do against it. I can only show that there’s a way that provides more job security in the future. Yeah. That’s to embrace technology and work together with it.
Marlene Gebauer 32:14
I mean, the thing that we find is that, you know, we get new technologies, but that also develops new types of work, and new ways of working. And so that’s been historical. So, you know, that’ll happen here, too.
Johannes Scholtes 32:30
Yeah, absolutely. Absolutely. We will become prompt writers now.
Marlene Gebauer 32:37
brand new career.
Greg Lambert 32:38
Yeah. Our guest last week was talking about how he’s, he’s hiring prompt engineers. Like, really?
Marlene Gebauer 32:47
How different is it than learning Boolean search? I mean, you know, or learning or learning computer domains. It’s the same sort of thing.
Johannes Scholtes 32:53
Well, I was in Reims in France last week, and I visited the Carnegie Library there, which was donated by Carnegie after the First World War; Reims was completely destroyed in the First World War. And there was a reading room to the right, and to the left there were these old antique wooden card drawers with all the index cards. And it was called the index room. Yeah, where did the librarians go? So I absolutely agree with you. Yeah. But it’s important to create trust, and defensibility, and transparency. And I don’t know if you already interviewed Maura Grossman, but this is also what Maura says: you should not accept black boxes. The worst thing we can do in the industry is to provide the industry with black boxes. That will never create trust. So if a vendor has this proprietary technology and they’re secretive about it, walk away from it.
Marlene Gebauer 33:58
I mean, black boxes and the ethical obligations just don’t mix.
Johannes Scholtes 34:03
Well, and under GDPR, there’s Article 22, which gives every citizen the right to an explanation if an automated system decides something about them. And very quickly, we’re moving towards the Artificial Intelligence Act, where there are even stronger rights. So the right to explanation is a very important right that citizens have. So if you cannot explain your technology, you’re in trouble, because the AI Act is going to have a much bigger impact even than the GDPR has on doing business in Europe.
Greg Lambert 34:40
Well, yeah, one of the things that’s getting a lot of press this week is that Allen & Overy in the UK, I believe, has launched Harvey AI, which is supposedly their version of an AI tool similar to ChatGPT. That, from the press accounts, is supposed to be open to some of their clients to test. We were joking before we jumped on that when you go to the website there, you get a logo, a little tagline, and then a link that says, you know, click here to get on the waiting list, and not much else. Here we call that, you know, fantastic marketing, because even though that’s all you see, there must be, you know, dozens of articles talking about how innovative A&O is on this. So I know we don’t have specifics of what they’re doing. But let’s go ahead and just talk about what we think they might be doing, or what law firms see as an opportunity here to, I guess, impress their clients and allow them to kind of get that stickiness with the relationship.
Johannes Scholtes 35:51
Yeah. And this is also, I think, the way to go. Microsoft has created Copilot in GitHub, which you probably know. It’s like, you know, a program that assists you in writing computer code. And they found that developers are like 40% faster writing code. You know, what Allen & Overy did, and I have to be careful what I say, because I also went to the website from Harvey AI, and as I’m not a law firm, I could not put myself on the waiting list. So I don’t know; I’ve not seen the demo. I have to rely on the press release they sent out, and I don’t know what it is. I heard rumors it’s GPT-4. But then nobody knows what GPT-4 is about anyway. Is it a trillion or 100 trillion parameters? We don’t know. We don’t know what it’s trained on. But I made a couple of interesting observations, and that also relates to what Bing did with ChatGPT. I think these are really, really super interesting times. I’ve been in natural language processing for thirty years, and it never worked. You know, I even backtracked from NLP, in 1985, to information retrieval, because at least that worked. Then in 2017, Google, because all of this comes from Google, that’s the really interesting part, and Microsoft kind of hijacked this, Google came with the famous paper “Attention Is All You Need,” introducing the transformers. And that was a radical improvement in quality. For machine translation, we’ve all noticed how good Google Translate became, but also in understanding the search queries that are sent to the search engine, Google search also became much better. And what happened now is that there’s a special form of these transformers that are decoder-only models. Normally transformers are encoders and decoders; the encoder kind of guides what the decoder does. But they figured out that if you have a decoder-only model, that also works great in certain cases. And that’s what the GPT models are: generative pre-trained transformers.
But these, as they say at MIT, they’re just parrots; they repeat whatever they’ve seen. And the only thing they can do is predict whatever word is coming after a couple of other words. But the really great achievement, there are three parts that were achieved with this. First of all, we can now generate language with computers that can no longer be distinguished from that of human beings; that’s something we never were able to do. We’re also able to stay within ethical boundaries, more or less, unless you fool with it or hack it. And we found a way, with reinforcement learning, the same method also used in the AlphaGo experiments, so that we can actually have the computer talk with humans, and then make sure it stays on track and the conversation is more engaging. Now, there are two parts missing in order to be successful. The first part is that the computer doesn’t know what it’s talking about. So it does not understand the meaning of the text. So we need to have some kind of guidance to drive that. And the other part that is missing: the models are now trained with data from the deep dark sides of the internet. So we need to do vertical training; we need to do training with, I don’t know, case law, contracts. For me, this is what Allen & Overy did with all of their legal data, because that’s their power in the future. It’s all the contracts, all the documents that they can use to train these models. And we call that vertical training. So if we do more vertical training, and we can control what’s being generated by understanding the meaning of text, so we move more from just, you know, statistical patterns to natural language understanding and natural language inference, then these models will become much better, and then the future, indeed, is to create copilots. But the copilots can never be trusted 100%. So you always need to have there, again, a human to validate what they do.
Now, one of the things that I observe both in the Bing search engine and in this is that the Bing search engine with ChatGPT doesn’t have any vertical models, and it starts doing really, really weird things after a couple of different prompts, so they limited that. You’ve also read that great story in The New York Times; everybody’s falling for the Eliza effect. And there are some other great articles that have been written; of course, Wired always has the best articles. But if you look at the demo from Microsoft, you see that the queries are very specific, very detailed: I want to have this holiday, and then this, and then this, and then this. Now, that’s not how people normally search. We search with one keyword or two keywords; we expect to get the right answers based on one or maybe two keywords. The average number of keywords with Google was 1.2. The reason why these queries are so very specific is that otherwise it doesn’t work. Now, also, if you look at the examples from Harvey, okay, here we go, this is one of them, and I copied it this morning: “I’m at a German law firm and I go to an Indian bank about the European Union market abuse regime. Suggest a skeleton in a five-slide presentation. And for each slide, include three bullet points on its content.” It’s highly detailed. And then for these models, it’s possible to split these up into a number of prompts and use that to generate data. But if you have less specific and less detailed questions or prompts, you will see that humans will probably understand what you’re talking about, and they know your culture or your background, they know who the client is, and so they get much faster to generating the proper documents. And these examples are all, I think, a little bit artificial.
Well, it’s great, it’s very good we’re doing experiments. I understood that also silicone and Quinn Emanuel are using this. If I were a law firm, I would absolutely also look at this, because this is what you want: you want to have a copilot for drafting contracts.
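[Illustration, not part of the recording: the description above of models that can only “predict whatever word is coming after a couple of other words” can be sketched with a toy word-pair counter. This is a minimal sketch under loud assumptions: it is not how GPT models are built (they use decoder-only transformers, not counts), the function names are invented here, and the tiny contract-style corpus is made up to stand in for “vertical” training data.]

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next(model, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def generate(model, start, max_words=8):
    """Repeatedly append the most likely next word, starting from `start`."""
    output = [start.lower()]
    for _ in range(max_words - 1):
        next_word = predict_next(model, output[-1])
        if next_word is None:
            break
        output.append(next_word)
    return " ".join(output)


# Hypothetical, made-up "vertical" corpus of contract-like sentences.
corpus = (
    "the buyer shall pay the seller . "
    "the buyer shall pay the purchase price . "
    "the buyer shall inspect the goods . "
    "the seller shall deliver the goods ."
)
model = train_bigram_model(corpus)

print(predict_next(model, "shall"))     # most common word seen after "shall"
print(predict_next(model, "contract"))  # None: the word never appeared in training
print(generate(model, "buyer"))
```

[The sketch only repeats patterns it has counted and has no notion of what a buyer or a seller is, which is the “parrot” point; it can also only talk about whatever domain its training text covered.]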
Marlene Gebauer 42:25
Yes. So speaking of the future, we’ve reached the part in the podcast, Jan, where we ask the crystal ball question. And so we ask you, you know, what do you see as the significant challenges or changes on the horizon for the legal industry over the next two to five years?
Johannes Scholtes 42:44
Yeah, well, I think everybody in the legal industry woke up in the last two months. So that’s really helpful. The part where we have to be very careful as an industry is explainability. Explainable AI: we should address that. Otherwise, we will never have trust. I also think that right now lawyers shouldn’t be too worried. But the moment there are vertically trained models, and they will be there, I’m very confident; there are now probably 50 or 100 startups building copilots for specific legal applications, maybe antitrust, maybe HR, maybe contracts or whatever. And once those models are there, that will be a major change in the industry. And that’s going to happen. I also think that what Microsoft did with integrating Bing and ChatGPT is a very, very interesting experiment. But there’s one part missing. What’s missing is that Bing is just finding web pages, and it’s forwarding those web pages, unfiltered and unstructured, into ChatGPT. And that doesn’t work. We need to understand the meaning of these words, we need to understand the meaning of the relations between these words, the semantics, and not just the word sequences, and then you’ll see that these models will become much better. And all of that’s going to happen both in and outside the legal industry, also in the medical industry. Also, I would be really worried if I were a content writer or a communication expert, because that industry will probably change first. If you are in telemarketing, if you are in any other creative profession, there will be a lot of changes. But then there’s also a lot of legal risk. There’s a lot of uncertainty now: copyright risks, liability risks. For me as an AI researcher, these are the most interesting times of my life. And like I said, it finally works. You know, now we have to make sure that we can create the trust that’s needed, so society will embrace what we do.
Greg Lambert 44:55
All right, well, Jan Scholtes, we want to thank you very much for taking the time to talk with us too. This has been fun. And, man, we’re right at the beginning of this.
Marlene Gebauer 45:03
That’s right. And of course, thanks to all of you, our audience for taking the time to listen to The Geek in Review podcast. If you enjoyed the show, share it with a colleague. We’d love to hear from you. So reach out to us on social media. I can be found at @gebauerm on Twitter,
Greg Lambert 45:19
And I can be reached @glambert on Twitter. Jan, what’s the best way for people to reach out to you?
Johannes Scholtes 45:25
On LinkedIn, by searching for my name, Jan Scholtes, or Johannes Scholtes, and you’ll find me. I’m more than willing to engage with you.
Marlene Gebauer 45:36
Perfect. And we’ll of course have all the connectors on the page when we post. But if, you know, you’re old school and you don’t want to do any of that, you can leave us a voicemail on our Geek in Review Hotline at 713-487-7821. And as always, the music you hear is from Jerry David DeCicca. Thank you, Jerry.
Greg Lambert 45:53
Thank you Jerry. Alright Marlene, I’ll talk to you later.
Marlene Gebauer 45:55
Okay, bye bye.
Transcribed by https://otter.ai