This week, we are joined by Joshua Broyde, PhD and Principal Solutions Architect at AI21 Labs. Broyde discusses AI21 Labs’ work in developing foundation models and AI systems for enterprise use, with a focus on their latest model, Jamba-Instruct.

Josh explains the concept of foundation models and how they differ from traditional AI models. He highlights AI21 Labs’ work with financial institutions on use cases like term sheet generation and financial document Q&A. The conversation explores the challenges and benefits of training models on company-specific data versus using retrieval augmented generation (RAG) techniques.

The interview delves into the development of Jamba-Instruct, a hybrid model combining Mamba and Transformer architectures to achieve both speed and accuracy. Broyde discusses the model’s performance, industry reaction, and potential applications.

Safety and security considerations for AI models are addressed, with Broyde explaining AI21 Labs’ approach to implementing guardrails and secure deployment options for regulated industries. The discussion also covers the balance between model quality and cost, and the trend towards matching specific models to appropriate tasks.

Josh also shares his thoughts on future developments in the field, including the potential for agent-based approaches and increased focus on cost optimization in AI workflows.

Listen on mobile platforms: Apple Podcasts | Spotify | YouTube

Contact Us: 

Twitter: @gebauerm or @glambert
Email: geekinreviewpodcast@gmail.com
Music: Jerry David DeCicca

Transcript

Marlene Gebauer 0:07
Welcome to The Geek in Review, the podcast focused on innovative and creative ideas in the legal industry. I’m Marlene Gebauer.

Greg Lambert 0:13
And I’m Greg Lambert, and this week we’re excited, because we’re talking with...

Marlene Gebauer 0:18
We’re always excited, really. We’re particularly excited.

Greg Lambert 0:21
Exactly. So we’re talking this week with Joshua Broyde, who is a PhD and Principal Solutions Architect at AI21 Labs. So Josh, we’re really excited to have you on The Geek in Review. Before we jump in with the questions that we have: AI21 Labs is a company that builds foundation models and AI systems at enterprise scale. Do you mind just giving us a high-level view of what that means and why your customers seek you out for their innovation needs?

Josh Broyde 0:54
Yeah, sure. First of all, it’s great to be here. Thanks for having me. Customers seek us out for a number of reasons. The major one is that there’s a small group of companies that are actually building foundation models from the ground up, and we are one of those companies. Our latest and greatest model is called Jamba-Instruct, which has a number of advantages that we can explore a little bit later. But I would say we’ve also seen a lot of interest in a different family of models that we provide, called task-specific models, which are good at doing specific things like answering questions or summarizing text. There’s been a lot of interest in this type of model within the industry more broadly, and we’re taking it in an interesting direction. In general, customers are very interested in high-performance but low-cost models. That’s where we are moving, and we are seeing interest.

Greg Lambert 1:50
It sounds like the legal industry: we want high performance and low cost. Just for clarification, do you mind explaining what a foundation model is?

Josh Broyde 2:03
Sure. A foundation model can be defined as really any model that can be used for multiple tasks; I would say that’s the broadest definition. The most interest we’ve seen today is in large language models that can do lots of modeling around text. This is very much in contrast with the older technology, and when I say old, I mean five years ago: a model that, let’s say, just does regression analysis, predicting an output variable over a table that has only so many columns and maybe only so many rows. So the idea of a foundation model is that it can do multiple things based on flexible types of input. There’s a lot of interest nowadays in large language models, but of course there are foundation models for analyzing images or video or other modalities as well.

Greg Lambert 2:57
Thanks for clarifying that.

Marlene Gebauer 2:58
So Josh, I want to talk about a couple of the use cases that you have on your website. They might not be legal industry specific, but they’re very legal industry adjacent. You know, I’ve had some exposure to term sheet generators, and you have that as a potential use case. You also have financial document Q&A, and these both look really interesting. So can you talk about these and some of the other use cases that you work on with your customers?

Josh Broyde 3:26
Sure. So to describe those a little more in depth: there’s a lot of interest in finance. I know you’re a little more focused on your industry, and I think there’s a lot of parallel interest. In the context of term sheet generation, the idea behind that use case, which we actually worked on with a large bank, is: given a corpus of data, generate a term sheet. A term sheet is simply a non-binding loan agreement that is very heavily grounded in the original data set. This is essentially meant to be a time saver, because rather than having a person pull all the critical criteria from the original corpus, that can now be done using foundation models. So trivial things like the loan amount, the collateral, and other key terms needed for generating a term sheet are there. Importantly, though, when we built this it was ideally also meant to capture what’s missing. So it actually uses an architecture that combines our foundation models with task-specific models for answering questions. To pick a particular example, if collateral is missing, that might be a red flag for a term sheet, and I would say that type of reasoning is something we see as very important for our customers. Similarly, the financial document Q&A, which, by the way, from what I have seen is a cross-industry problem (everyone is interested in “given my documents, answer questions”), also has to be very heavily grounded in the data. For example, if I upload a 10-K for a company for a particular year and I ask a question like “What was revenue?”, the answer has to be very heavily grounded in the document in front of it. To take this into the legal domain, if people are interested in things like uploading decisions by judges and answering questions about them, a similar problem has to be faced, and the models have to be very heavily grounded in the data in front of them.

Marlene Gebauer 5:28
I had a follow-up question. So, for a term sheet generator, how would the foundation model work differently than, say, an expert model that has, you know, conditional logic? How would that be different?

Josh Broyde 5:40
So an expert model, which I would say frequently used to have tree-based logic (is X true? if X is true, is Y true?), can be very powerful, but it has to be extremely tailored to the system at hand. For example, I mentioned before: is collateral present? An expert model could essentially be tuned to detect whether collateral is mentioned. You probably, by the way, have to have a model specifically geared toward “is collateral mentioned in this document, and if not, flag it as a problem.” That is a very specific system. And when you say, well, now I want to move beyond collateral, now I want to ask whether some other criterion, like closing conditions, is present, now I have to train a whole new model just for my new criterion. With a foundation model, and this is what’s taken the industry by storm, because the model is foundational it already has a lot of knowledge, and to some extent certain kinds of reasoning. I can now just plug in: is collateral missing? Are closing conditions missing? And I essentially have the same model solving different problems. Expert systems can still be powerful, but they’re hard to scale, because I have to build up every little component on its own.
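
To make the contrast concrete, here is a minimal Python sketch of the “same model, many criteria” pattern Josh describes; the criteria list and the `call_llm` helper are illustrative placeholders, not AI21 Labs APIs.

```python
# One general-purpose model checking several term-sheet criteria.
CRITERIA = ["loan amount", "collateral", "closing conditions", "maturity date"]

def call_llm(prompt: str) -> str:
    """Placeholder for a real foundation-model call; stubbed so the sketch runs."""
    return "missing (stubbed response)"

def review_term_sheet(document_text: str) -> dict[str, str]:
    findings = {}
    for criterion in CRITERIA:
        prompt = (
            "You are reviewing a draft term sheet.\n\n"
            f"Document:\n{document_text}\n\n"
            f"Is the {criterion} explicitly stated? Answer 'present' or 'missing' "
            "and quote the supporting text if present."
        )
        findings[criterion] = call_llm(prompt)  # one model, many criteria
    return findings
```

Adding a new criterion is just another string in the list, which is the scaling advantage over training a dedicated expert component for each check.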

Greg Lambert 7:03
I have one question. You talked about working with a bank on its corpus of documents in order to build this. One of the things I think we’ve heard over and over again in the legal industry, and here’s the situation I tend to get: “Boy, it’d be really great to take all of our documents, dump them in, and just create our own large language model,” to which we immediately say, “No, that’s not how it works, and you don’t have enough.” When you look at a corpus of documents, what’s the threshold you look at to make it actually worthwhile?

Josh Broyde 7:37
It’s a fair question. There are two daunting factors here. One is the amount of data needed, which can be fairly large. In large language model speak, we use a term called tokens; a token is maybe half a word or two thirds of a word. Building a brand new model from scratch is extremely expensive and will require many billions or even trillions of tokens. Most customers who are interested in this are not building from scratch. They’re interested in taking a previous model and essentially adding their documents to it. That’s referred to as continued pre-training, where I take an existing model (and by the way, we’re actually working with customers on this) and then essentially incorporate the data into that model. That also can be very expensive; you’re talking maybe hundreds of thousands of dollars, and you’re talking about many billions of words, so you need a lot of data to do that. Now, to turn to smaller tasks: let’s say someone says, “I want to classify a document. I don’t want to do fancy analysis, I just want to classify something.” That might take, you know, a few thousand examples, so that’s fairly cheap. But when someone says we need to train a model on our data, that can be many hundreds of thousands of dollars depending on the size of the data.

Marlene Gebauer 8:58
I don’t know if you can answer this, but, more generally, what composes your client base? Is it more in-house, you know, corporations, or is it more firms? What’s the breakdown on that?

Josh Broyde 9:13
Yeah, I think I can approach that generally. We are seeing a lot of interest from the financial industry and, I would say, also the healthcare industry. There’s quite a bit of interest in using foundation models, but also in the security concerns. I would say a common pattern is that regulated industries come with their own sets of problems. We’ve seen a lot of interest from banks who recognize, for example, the power of foundation models but are also wary of the risks, and they’re interested in using our models.

Greg Lambert 9:45
Yeah, but I imagine there’s probably a benefit from that, because these highly regulated industries, whether it’s banking or finance or healthcare or others, typically have to make sure they have their arms wrapped around the data they possess and understand what they have. In a way, I imagine that benefits what you’re doing when you try to leverage that information and create, whether it’s chatbots or anything else you may build, something that has really good data to run on. Are you finding that?

Josh Broyde 10:29
Absolutely. I would say that most of the time, though, customers have to preprocess their data, which is itself a challenge. Ensuring that the data is high quality and diverse is itself a challenge, and not necessarily a trivial one, when training a new model. That’s a key thing we’ve seen. I’m going to connect a little bit back to law, if that’s okay. A few years ago, actually, when I was moving from academia into industry, I built a little app that tried to predict, given the text of a filing to the Supreme Court, whether the Court would grant cert. It was using basic models at the time, and it kind of worked; I would say not something a law firm would want to buy. But nowadays this type of analysis is something I’m seeing in the industry generally, and I think there’s a lot of potential within law for doing it. Greg, going back to your question, a key challenge is the quality of the data, but also how models interact with diverse data types. For example, a broad question might be “what cases are precedent for XYZ?” That requires searching, but also a certain amount of reasoning, where there’s an expectation that a paralegal, for example, or a lawyer will think about the responses and incorporate them. I would say this is a key issue that a lot of industries are facing: when building models, they want to make sure their data is incorporated. An alternative that we tend to encourage, and that I would say the industry is strongly moving toward, is not building a model on your own, but what’s referred to in the industry as RAG: retrieval-augmented generation. Total aside: I never loved the term RAG, although it’s widely adopted. It sounds like something not good, but I didn’t invent the term. RAG essentially means that I use a model out of the box, but when I ask questions or do whatever I’m doing, I inject my data into it. So for example, if I ask a legal question, rather than posing to the model “what are legal precedents for the following?”, in which case the model is using its own reasoning and its own training data to answer, I can instead say, “given the following data, what are precedents for a particular legal ruling? Please only ground yourself in this data; don’t move outside of this data.” It’s an open field that’s constantly evolving: how do I beg, plead, cajole, and threaten the model to make sure it’s heavily grounded? But that’s the idea of RAG. Generally speaking, it’s cheaper than training your own model, at least for smaller use cases.

Greg Lambert 13:24
Yeah, and we see that a lot with the primary legal research tools we have in the industry. Lexis, Westlaw, and others are heavily relying on retrieval-augmented generation technologies to ground the results in their content. I think it’s still one of those areas we’re struggling with, because it’s not perfect, and sometimes the marketing speak can get ahead of the reality of it. And I think we’ve seen that: there was a Stanford paper published that took issue with the way some of the major legal research platforms described how RAG was grounding into their content. So why don’t we expand on that a little while we’re talking about it? How do you use something like RAG, or even, I know there’s a term called agentic RAG, which may be a completely different topic, but how do you use it, and what are the good things about RAG, but also some of the shortcomings people should think about?

Josh Broyde 14:40
Yeah, so I would say, as an industry, we’re still in the infancy of excellent RAG. The current approach, and a lot of our customers are doing this, is essentially that you take your raw data and put it in a vector database. To load a vector database, I use a model that creates embeddings; an embedding is just a numerical representation of my data. I then have this set of numbers per document in my vector database. When I have a question, I have a process for also creating the numerical representation, the embedding, of the question, and then I look for similar data that I already have in the vector database, get the resulting documents, and then go to town with my large language model to figure out what I need to figure out. That’s fairly simple, and it works in simple cases. But there are a number of challenges that we, and the industry broadly, are thinking about. A key challenge is that I’ve essentially created a static database. It has to be updated, and importantly, I have to ingest my data into my vector database, which might not be convenient. Just to give a trivial example, some customers use what’s called a graph database. A graph database essentially has entities, or nodes, connected to other entities via edges, the way a network would work. How to do RAG with a graph database is a bit of an unsolved problem. Similarly, in my opinion, if I just have raw tabular data in a SQL database, I need to generate queries to get that data first. So the key issue here is that just stuffing everything into my vector database, or your vector database of choice, doesn’t always solve the problem. Adding another layer of complexity, some customers have data that’s locked in a proprietary application, and they don’t want to copy all that data out, or it might be very difficult to. So what you alluded to, agentic RAG, is actually something we’re investing quite a bit in for the future and thinking very heavily about: how to have a retrieval-augmented generation system that can easily and seamlessly go to multiple sources, figure out what needs to be retrieved, and then go retrieve it. I personally see this as one of the next frontiers: doing this very, very well, rather than just having fairly static vector databases and RAG applications.
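
As a rough illustration of the retrieve-then-generate loop described above, here is a toy sketch that embeds documents, ranks them against a question, and builds a grounded prompt; the hash-based embedding and in-memory “vector database” are stand-ins for a real embedding model and vector store, not any specific vendor’s stack.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: hash words into a fixed-size bag-of-words vector."""
    vec = np.zeros(256)
    for word in text.lower().split():
        vec[hash(word) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

documents = [
    "The loan amount is $5 million, secured by equipment collateral.",
    "Closing conditions include delivery of audited financial statements.",
]
index = [(doc, embed(doc)) for doc in documents]  # the "vector database"

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda pair: float(q @ pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return (f"Answer using ONLY the context below.\n\nContext:\n{context}\n\n"
            f"Question: {question}")

print(build_prompt("What is the collateral?"))  # send this to your LLM of choice
```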

Marlene Gebauer 17:10
Yeah, you’re in good company. Andrew Ng says the same thing.

Josh Broyde 17:15
He took it from me.

Greg Lambert 17:18
Back in March of this year, you launched Jamba-Instruct, Jamba being this hybrid model that blends together what’s known as Mamba and then transformers. I think a lot of people understand the transformers part but may not understand the Mamba part. So can you talk to us about the significance of creating something like Jamba, the Jamba-Instruct model you’ve built there at AI21 Labs, and the motivation behind creating it?

Josh Broyde 17:47
Absolutely. So the motivation essentially came from when the Mamba architecture was first published. It was noted that one of its advantages was that it was extremely fast; especially for larger input, it’s essentially linear. That means that as my input gets bigger and bigger, the time it takes to process it obviously also increases, but not nearly by as much as with transformers. So the key advantage is its speed, which got a lot of attention, because transformers, just to back up a little bit, while they’ve achieved excellent performance in the industry, are essentially quadratic. As I double my input for training or for inference, I have to roughly quadruple my compute power or time. So they’re not really well suited for extremely large inputs if you want to handle them cheaply. Mamba represents one direction for doing this cheaper and faster. The problem, of course, is that Mamba models were not achieving the accuracy that transformers were. Jamba is a blend; Jamba actually stands for Joint Attention and Mamba, where we essentially try our best to capture the best of both worlds: the performance of transformers on one hand, and the speed of Mamba on the other. In fact, in benchmarks we’ve seen that Jamba performs as well as or better than similarly sized models while also being significantly faster, because it’s using this Mamba architecture. One of the reasons we open-sourced the base model for Jamba is to get the industry thinking a little more deeply about how we can build models that take the next step beyond just transformers.
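
A back-of-the-envelope illustration of the scaling point, assuming attention costs grow roughly with the square of sequence length while a Mamba-style state-space layer grows roughly linearly; constants and hardware effects are ignored, so only the ratios are meaningful.

```python
# Not a benchmark: just the n^2 vs n growth in sequence length n.
for n in (1_000, 10_000, 100_000):
    attention_ops = n ** 2   # quadratic in context length
    ssm_ops = n              # linear in context length
    print(f"n={n:>7,}  attention ~ {attention_ops:>15,}  ssm ~ {ssm_ops:>9,}  "
          f"ratio ~ {attention_ops // ssm_ops:,}x")
```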

Marlene Gebauer 19:40
So how does Jamba-Instruct perform compared to the more traditional models?

Greg Lambert 19:47
Well, while you’re answering that, I also want to hear what the industry reaction has been. Have you had a lot of folks starting to contribute?

Josh Broyde 19:58
We’ve seen quite a bit of interest in the base model, and people contributing to it. We’ve also seen interest from customers who have tried transformer-based models and would like to see optimized performance, and since we’re known for using the Mamba-based approach, we’ve seen interest in that. We’ve also seen other model providers now starting to experiment more with Mamba models as well. So I think that goal of having people think about this architecture a little bit more, that broad movement, has somewhat been achieved.

Greg Lambert 20:32
Yeah, just to back up a little, I actually had a meeting this morning where we were talking about how I can input a query, then go fix myself a cup of coffee, chat with somebody in the break room, come back, and by the time I’m back I might have the answer. So you’re saying that with Jamba you get this blend: some of the benefits of the transformers and the quality of the work, but also speeding that up. I’m also just curious, energy-wise, does it reduce the amount of energy it takes to do it as well?

Josh Broyde 21:13
Yes. Energy generally correlates very highly with cost and efficiency, because every second you’re not spending on the query is energy saved, or energy you can use for some other query. So the short answer is yes.

Marlene Gebauer 21:27
You’ve mentioned that safety is a consideration for clients, and there are safety features built into Jamba-Instruct. So how do you approach applying those types of guardrails while also allowing for performance?

Josh Broyde 21:44
So I’m going to speak a little bit for myself here, because it’s not just about the model, it’s about a whole architecture. When we talk about safety, safety is somewhat built into the model, where models are, let’s say, instruction-tuned or aligned not to respond to queries like how to build a bomb. That comes from the model itself. But my general recommendation to customers is twofold. The first is you have to decide what safety means to you, very clearly articulate those criteria, and then implement them. I’ll give an example. Giving investment advice is not an inherently unsafe activity, but you might not want to do it in a chatbot that faces the whole world if it’s outside the scope of the job. So one of the key things is, when you have criteria that you want your large language model to avoid, you have to outline those criteria and think very carefully about them; I encourage all customers to do this. To pick another example, most companies that are using large language models, at least for externally facing applications, don’t want to engage in discussion about their competitors. They want essentially all mention of competitors treated as not safe. That has to be done either by fine-tuning the model, or, my recommendation, by using an approach built on large language models themselves: when the output is created by the original large language model, you have a separate prompt that looks for your criteria, and you check whether the output is safe before you pass it to the user. That’s generally what’s going on if you use any tool that, you know, might generate a response and then go back and delete it: there’s a separate large language model that’s specifically geared toward safety looking for problems. I’ll add one more thing about safety, which I think of very tightly in conjunction with hallucinations. Hallucinations are essentially when a large language model makes something up; it can make something up that is completely false or unsafe. I would say carefully building citations, so you have a separate application, which can be a large language model, that when the output comes in essentially tells you where it’s coming from in the original data, is a key thing that most customers are thinking about.
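
Here is a minimal sketch of the second-pass guardrail pattern Josh outlines, where a separate safety-focused prompt screens the draft answer before it reaches the user; the policy list and `call_llm` helper are example placeholders, not AI21 Labs functions.

```python
# Example policy; replace with the criteria your organization actually defines.
POLICY = [
    "gives investment advice",
    "mentions a competitor by name",
    "reveals confidential information",
]

def call_llm(prompt: str) -> str:
    """Placeholder for a safety-focused model call; stubbed so the sketch runs."""
    return "NO"

def screen(draft_answer: str) -> str:
    checks = "\n".join(f"- {rule}" for rule in POLICY)
    verdict = call_llm(
        "Answer YES or NO: does the text below do any of the following?\n"
        f"{checks}\n\nText:\n{draft_answer}"
    )
    if verdict.strip().upper().startswith("YES"):
        return "I'm sorry, I can't help with that request."
    return draft_answer  # only safe output reaches the user
```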

Greg Lambert 24:14
Let me spin it a little off of safety specifically and talk more about security, because you’re dealing with highly regulated industries that may have very confidential information they can’t let out into the wild. How do you work with these industries to make their head of security comfortable using a product like this?

Josh Broyde 24:43
There are a couple of ways that we do this, and yes, absolutely, especially in regulated industries, customers are thinking about this. We support different modes of deploying and using our models to allow this in the most secure fashion. What banks tend to be interested in, for example, is that we can deploy our model directly in your virtual private cloud, so the model is deployed in your compute environment. When you use it, no one else sees it; it’s just your model and your compute. We can work with you on how to fine-tune or optimize the model, but once it’s in your environment, it’s essentially completely within your control. We also have our models deployed on all the big clouds now, where customers can use those models in a serverless manner: they can securely pass their prompt in, get the completion back, and then move on in their application. To pick one example, on Amazon Web Services our newest model, Jamba-Instruct, is within Amazon Bedrock. Bedrock is Amazon’s flagship generative AI service. Customers who use Bedrock pass the data to it and get the answer back; we don’t even see the prompt or the completion. So that’s also very secure.
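
For the serverless path, here is a hedged sketch of calling a Bedrock-hosted model from Python with boto3’s Converse API; the model ID, region, and prompt are assumptions to verify in your own account, and your IAM role needs permission to invoke the model.

```python
import boto3

# Assumes AWS credentials are configured and the model is enabled in Bedrock.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="ai21.jamba-instruct-v1:0",  # assumed identifier; confirm in the Bedrock console
    messages=[{"role": "user",
               "content": [{"text": "Summarize the key terms of this agreement: ..."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```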

Marlene Gebauer 26:07
I’m wondering, Josh, where clients are falling on quality versus cost.

Josh Broyde 26:15
It’s a great question, and one that maybe I don’t have a great answer to, but I’ll tell you my take. Some customers are still, and you’re seeing this in the industry, in this exploratory phase where they’re not thinking about cost, but I think that was mostly a last-year thing, and now customers are starting to worry about the bill. For tasks that require a lot of deep thought and are expected to be very expensive for a person to solve, I think customers are potentially okay with a model being more expensive, because it’s recognized that this can aid a person who is very expensive, and you have to balance the cost of the model against having the work done purely by human review. Where I think customers are getting more and more sensitive is this notion that for every query we’re just going to use the biggest, most expensive model. That hugely increases costs, and it hugely increases latency. So we are seeing customers, and I would say the industry more broadly, thinking about matching queries and problems to the right model. For example, there’s been some work recently around routers: given an input, figure out which model is best suited to solve it. So I would say we’re moving a little bit away from this era of, you know, bringing a cannon to a knife fight, toward bringing a Bowie knife to a knife fight. That’s a key interest we’ve seen with our task-specific models, like summarization and contextual answers, where customers are interested in them because they do a specific task very well and, in most cases, potentially significantly cheaper. A lot of customers are interested in this because, just to summarize what I said, there’s an understanding that in almost every workload there’s something, like summarizing, that a smaller, cheaper model can do much faster than the biggest model you can think of.
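
A toy sketch of the routing idea, matching a labeled task to a model tier; the task labels and model names are placeholders, and production routers typically use a classifier or an LLM to label the incoming request first.

```python
# Map task types to model tiers; default to the most capable model when unsure.
ROUTES = {
    "summarize": "small-task-model",
    "extract":   "small-task-model",
    "qa":        "mid-size-model",
    "draft":     "large-general-model",
    "reason":    "large-general-model",
}

def route(task_type: str) -> str:
    return ROUTES.get(task_type, "large-general-model")

print(route("summarize"))  # -> small-task-model
print(route("reason"))     # -> large-general-model
```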

Marlene Gebauer 28:15
Is there any type of guidance out there on that, in terms of, like, this model type for this type of task, that model type for that type of task, things like that?

Josh Broyde 28:26
I would say it’s a little bit of a moving target. But to me, the clearest example is summarization of text, where people have been thinking about this for a long time; there are even models specialized just for summarization. To me, the big ones are anything that involves summarization or question answering: there you should be thinking about a smaller model, or about other types of natural language processing like entity extraction, or, let’s say, grammar parsing, which maybe isn’t even suited to a large language model. But I think it generally correlates, and this is my experience, with intelligence. If you ask yourself, “is this a task that only someone who is very highly educated, very much an expert in this field, can do?”, you’re probably leaning toward a more complex model. If you’re thinking about a task that someone can do with little training, or that a teenager could do, that may be something that fits a smaller model. I’ll add one more component, though, which adds an extra dimension: it also depends on the risk to the customer. A question that has to be asked is, given that all AI is probabilistic, what’s my cost for getting it wrong? For the same question, in two different cases, if I absolutely need to get it right, then I’m more likely to use a bigger, more powerful model; if I’m okay with smaller errors, I can go for a smaller model. And I’ll just add one more component, even though I said that last time, which is that customers are thinking about chains. This is where the industry is going, and this is how we see customers use all of our models: you don’t just have one output, you have an output that is then checked against another large language model, or another instantiation of the large language model. All of this can be done to essentially ensure accuracy and correctness, at the cost, of course, of latency and money.

Marlene Gebauer 30:29
Yeah, well, important for the legal industry, for sure.

Greg Lambert 30:32
I’m just curious. You’ve now had four or five months since Jamba-Instruct came out. Have there been any surprises, any results that caught you off guard, hopefully in a positive way?

Josh Broyde 30:49
That’s an interesting question. I would say that we’ve seen a lot of interest in the field, and I’m continuously surprised at how fast everything is moving. You know, no large language model provider can say, “we released this and we’ll be cool for the next two years,” or even two months. I would say I’m continuously surprised by how much is coming out. What I have seen, especially compared to our older models, is how well and how flexibly Jamba-Instruct performs even compared to task-specific models. When we benchmark it, for example, Jamba now does pretty well at answering questions. It raises an interesting question, and I think the jury is still out in the industry, of how horizontal models (meaning task-based) and vertical models (per industry) will fit together with large language models. Are we going to be building large language models per industry, or per task? This is less of a surprise and more of a question I’ve been thinking about since Jamba-Instruct came out. My own feeling is that Jamba-Instruct will be very useful for low-latency responses that do a number of tasks very well, like summarizing, answering questions, measuring compliance irregularities, and other things like that.

Greg Lambert 32:12
Well, Josh, that brings us to our crystal ball question. Where do you see products like Jamba-Instruct and AI21 Labs expanding in the near future, and what kind of challenges do you see over the next couple of years that you’ll probably need to address?

Josh Broyde 32:30
Yeah, this is a great question. I’ll tell you one use case that I’ve thought about a lot over the past year, which I just mentioned: compliance. We have a number of customers, and I’ve seen this in previous roles, with problems that boil down to: we have document A, which, let’s say, might be a pharmaceutical manufacturing protocol, and document B, which might be a regulatory document for how that product is to be manufactured. Please tell me if the first document is compliant with the second. This is a hard problem. In some basic testing, I’ve actually seen Jamba-Instruct do very well here, for a number of reasons. First of all, these can be very large documents, dozens or scores or even hundreds of pages long, and getting the answer correct is tough. I anticipate models like Jamba-Instruct being very useful for this, because they can quickly ingest large contexts. I would say, though, that where the industry is going is making this better. As of today, most large language models are tested on what are called needle-in-the-haystack tests, where the model has to look for some basic fact or the answer to a question in the text. But when I tell you I have document A and document B, each 100 pages long, find all the compliance problems, you’d better find, well and good, all the compliance problems. You can’t find only five of them, or ten of them, if there are 97. This is something that’s very hard even for people to do. That’s one major use case I have seen. But going forward I actually see much more around agent-based approaches, where agents are able to do complex tasks: calling large language models, figuring out where to go get data, and then the low-level execution of that. I would say this puts me in the camp, though I haven’t fully made up my mind, of a fairly strong agent-based future that I think could use models like Jamba-Instruct. I’ll just add one more crystal ball prediction, which is that I think there’s going to be more cost optimization, where you provide a workflow, even a complex, dynamic workflow, and a battery of different models (task-specific models, industry-specific models, general-purpose large language models), and you essentially measure not just the accuracy of mixing and matching, but also the cost of mixing and matching different models for different components. How do we optimize that cost? It’s not necessarily a trivial optimization problem. I think that’s one direction the industry will go.
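
One way to picture the compliance workflow Josh describes is to split the regulatory document into individual requirements and check the protocol against each one; the splitting rule and `call_llm` helper below are placeholders, not a description of how Jamba-Instruct is actually used.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a long-context model call; stubbed so the sketch runs."""
    return "COMPLIANT (stubbed)"

def split_requirements(regulatory_doc: str) -> list[str]:
    # Naive paragraph split; real regulatory documents need a smarter parser.
    return [p.strip() for p in regulatory_doc.split("\n\n") if p.strip()]

def compliance_report(protocol: str, regulatory_doc: str) -> list[tuple[str, str]]:
    report = []
    for requirement in split_requirements(regulatory_doc):
        prompt = (
            "Requirement:\n" + requirement + "\n\n"
            "Manufacturing protocol:\n" + protocol + "\n\n"
            "Does the protocol satisfy this requirement? Answer COMPLIANT, "
            "NON-COMPLIANT, or NOT ADDRESSED, and explain briefly."
        )
        report.append((requirement, call_llm(prompt)))
    return report
```

Checking requirement by requirement is one way to avoid asking a single needle-in-the-haystack question to find all 97 problems at once.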

Greg Lambert 35:16
Very interesting. Well, Josh Broyde, PhD, Principal Solutions Architect at AI21 Labs, I want to thank you for taking the time to talk with us today. I enjoyed it.

Josh Broyde 35:29
Yeah, me too. Great to meet you all. Thanks for having me.

Marlene Gebauer 35:32
Josh, where can our listeners find out more about AI21 Labs or reach out to you for more information?

Josh Broyde 35:38
So, we are on Twitter. I personally am on Twitter, although I mostly just follow, I guess, although I don’t know what to call them if they’re not tweets. What are they, Xes? I don’t know. We’re also posting regularly on LinkedIn. I would say those are good places to reach out. I can be reached either directly on LinkedIn or via email, I guess the older way.

Marlene Gebauer 36:07
We’ll have contact links in the show notes. So thank you, Josh. And of course, thanks to all of you, our listeners, for taking the time to listen to The Geek in Review podcast. If you enjoy the show, share it with a colleague. We’d love to hear from you, so reach out to us on LinkedIn. And as always, the music you hear is from Jerry David DeCicca. Thank you, Jerry.

Greg Lambert 36:28
Thanks, Jerry. All right. Talk to you later, Marlene.

Marlene Gebauer 36:31
Okay, bye, bye.