Will LLMs scale and revolutionize the legal space? — Transcript

Thanks for having me. Where are you at the moment? I’m in Berlin. You’re in Berlin.

Torben is actually one of the up-and-coming tech stars in Berlin. How many years have we known each other now? Three years.

Two years. Two, three years. Yeah. He has guided the fate of LegalOS.

He has always impressed me with his technical savvy: his technical skills, how he drives toward the next new thing, and how he uses technology. So we should chat about this. And there’s a second gentleman here called Jasper.

And you might have seen him already. Was he already in our class? He was in the afternoon class. So he was in the morning class.

So Jasper is a known entity. It’s awesome. Jasper, good seeing you. Finished any new Lego projects?

No, actually, my son has started dismantling my Lego, so it’s a bit of a painful process right now. That’s very nice. With that, I’ll quickly hand it over to you.

Tell us a little bit about yourself and LegalOS. Yeah, let’s start with LegalOS. We are building the infrastructure for law.

And it’s been an interesting journey. The company has been around for four or five years, and I joined roughly three and a half years ago. We build heavily on structured data to represent legal knowledge.

We build a tool for legal experts to encode their knowledge so that it’s machine readable and machine executable. It’s sort of its own programming language, but it’s a no-code tool: a visual diagramming tool that also happens to produce a representation you can use to reason about the law, to ingest, and to produce documents based on user input.
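The talk doesn’t describe LegalOS’s actual representation, so here is a minimal, hypothetical sketch of the general idea: a diagram stored as data (question nodes with branches, outcome nodes) that a small interpreter can execute against a user’s answers. The node types, answer keys, and template texts are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Question:
    ask: str                     # key to look up in the user's answers
    branches: Dict[object, str]  # answer value -> id of the next node

@dataclass
class Outcome:
    text: str                    # e.g. which document or clause to produce

Node = Union[Question, Outcome]

def run(graph: Dict[str, Node], answers: Dict[str, object], start: str = "start") -> str:
    """Walk the diagram from `start`, following the branch picked by each answer."""
    node = graph[start]
    while isinstance(node, Question):
        node = graph[node.branches[answers[node.ask]]]
    return node.text

# Hypothetical NDA decision diagram, as a no-code editor might export it.
nda_graph = {
    "start": Question("shares_personal_data", {True: "dpa", False: "direction"}),
    "dpa": Outcome("Use the NDA template plus a data processing agreement."),
    "direction": Question("both_parties_disclose", {True: "mutual", False: "one_way"}),
    "mutual": Outcome("Use the mutual NDA template."),
    "one_way": Outcome("Use the one-way NDA template."),
}

print(run(nda_graph, {"shares_personal_data": False, "both_parties_disclose": True}))
```

Keeping the diagram as plain data is what makes it both editable visually and executable: the interpreter never changes, only the graph does.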

And in the past several months, we’ve been preparing a kind of pivot. We’re still dealing with legal knowledge, but we’re now using LLMs much, much more than before to make that legal knowledge accessible.

That’s an interesting journey for us. May I double down here? I know a little bit about your business case. LegalOS tries to build a structured, no-code language for the legal system. Why is this a thing?

Because the way lawyers talk is actually no different from the way programmers talk. They describe the situation. They describe if-then clauses that say what to do in certain situations, and they codify that in what we commonly know as human language.

But if any of you have read a legal document, the difference between Python code and a legal document is not that huge. Neither seems to make sense if you don’t understand the language, right? If you don’t speak Python, you fall asleep reading Python code.

If you don’t speak legalese, you fall asleep reading a legal document. So the effect is the same. It is.

There is a codified language. So they are translating legal speak into a more automated approach. Have we seen this before? In the industry, have we seen times when people tried to make no-code out of language, or build a graphical interface for a computer language?

Well, if you look back, computer languages like Python, or what have you, are also such an approach. They are a way to make the computer understand what we want it to do and reason about things that drive us mad. We want to automate processes that we don’t like to do. Yes.

And that’s essentially the whole… You’re absolutely right. The whole history of computer languages is: this was complex, and we tried to make it easier and easier and easier.

Graphical interfaces, Python: Python is a very easy language to learn, and it’s way easier than assembly code. Why? Because we tried to make it easier.

Can we do the same thing with legal text or legal speak? LegalOS says yes. So they started to develop this. Now LLMs come in.

And why is this such a big thing? What’s the reason for you guys? Well, it certainly makes this knowledge much more accessible to us. First of all, legal knowledge is not always generic.

It’s not always the law that you can just encode and have available. A lot of legal processes and legal information, for example in companies or in case law, are highly specific to the actual case, so you still need someone to encode that knowledge. And only for a part of the legal domain, I would say, do you have enough repeatable cases that it’s worth automating.

Now, legal requests don’t always fall into the same boxes. If you work in a fast-moving company, new legal questions arise with every new customer. The customer you’re onboarding has a specific set of rules in place for their compliance department, and you want to comply with that.

You’ve convinced them, but your legal department won’t map every single rule in place just so you can automate it. And this is where LLMs come into play. LLMs allow a level of abstraction where they can see the generic rules that have been put in place and automatically reason or give advice about them.

Let’s say, without anyone knowing. Without anyone… It’s a higher level of abstraction for the automations.

If you use it right. If you use it the wrong way, I would say it hallucinates something crazy. We should talk about how you use it, but let me abstract from this first.

So essentially what you’re saying is: cool, we have this legal text, which follows a certain structure. Let me put an LLM on top of it as an interface to actually interact with this legal text,

so that I, the human, do not have to understand legal text or legalese but can actually use it. So the LLM is an interface, an API, a translator. Later in class we will discuss what transformers do and why transformers make such amazing translators.

We will talk about the technology behind it today. But for now, let’s park this: it is an interface.

Jasper, you are the venture capitalist, and you see many different startups coming around now. Can you map this discussion of the LLM as a good interface? Is this something special to legal text, or do we see it more broadly?

Well, we definitely see it more broadly, because it eases, as Torben described, and I would love to elaborate a little bit on that, it eases the UX.

It gives me access to the kind of structured automations that LegalOS is doing, because language is a different form of expressing what I want, and I can use the LLM to translate that into, let’s call it, machine code or machine learning code.

Because the graphical interfaces you mentioned, and you both said it, are abstraction layers. They give easier access, but easier access to repetitive tasks, to things that are more common. That’s why we moved from a sheet of paper to a spreadsheet, for example: for quite stupid tasks, but very controllable, very clear ones. Then you use the script, and the script does exactly the same thing, but in a very controlled form.

Now, when it comes to legal, and this is the segue to what Torben said: the great idea, and we were investors in LegalOS with my previous fund, was to have a very simple way of coding legal language, or basically legal rules. Because when you study law, you learn a lot of rules. You guys learn case law, so that’s a bit more, let’s say, easier to read. In Germany, we have everything written down.

So you have stories; we have the rules written down. But in the end, it’s rules, right? So LegalOS had a tool to codify those rules, and then a system can follow them in a very, very secure, auditable way, right? So you had control over it. Because as a VC, we have typical contracts: we invest in companies, we buy and sell shares.

Why do you have a new contract every time? Why is it done by a lawyer, although it’s always the same? That was the core idea of LegalOS. Now, the tough part is to create those rules and map them to the situation of the company, because somebody would express it like this: we don’t want the founder to be able to sell their shares.

We want the shares locked up for the next three years, right? So how do you translate that into this kind of no-code? And that’s my connection here.
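As a hedged illustration of what encoding such a wish could mean, the three-year lock-up can be written as an executable rule. The three-year period comes from the example in the conversation; the function names, the choice that selling on the anniversary itself is allowed, and the 29 February handling are assumptions for this sketch, not how LegalOS models it.

```python
from datetime import date

LOCK_UP_YEARS = 3  # from the example: shares locked up for the next three years

def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def founder_may_sell(closing: date, sale: date, lock_up_years: int = LOCK_UP_YEARS) -> bool:
    """Rule: the founder may not sell shares until `lock_up_years` after closing.

    Assumption: the lock-up ends on the anniversary itself.
    """
    return sale >= add_years(closing, lock_up_years)

print(founder_may_sell(date(2023, 5, 1), date(2025, 5, 1)))  # inside the lock-up
print(founder_may_sell(date(2023, 5, 1), date(2026, 5, 1)))  # lock-up has ended
```

Once the rule exists as code, every new financing round reuses it instead of having a lawyer redraft the same clause.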

It was a bit long. And that’s why this suddenly unlocks a new kind of interface. Before, you would go to a legal engineer at LegalOS and say, please put that into code. Now they can link the model to it, or you say, hey, we have a contract already, can you automate that with your nice low-code approach? And they say yes, because we can link words to code.

And that’s the amazing new thing that is happening and unlocking so much value, because you don’t need the experts anymore. This is very well described. In class, we discussed the three waves of AI. The first wave was exactly this: people set up rules and asked the computer to operate on those rules.

And everybody got excited. But that was 1956. So now we have legal text, and we try to make rules and an interface for that legal text.

It’s a finite space. Legal speak is a finite space. But it’s still very complex to make all those rules, to create a language for law. And we humans have to make the rules.

Like in the 1950s, people made some rules for a checkers game, or something like that. What’s the name? Samuel. Samuel.

Arthur. Whatever. Arthur Samuel’s first checkers program.

I should have given more. Now, Torben, the amazing part: what is the LLM now offering you, so that you do not have to create those human rules? And how do you use that?

Well, basically, human languages offer a variety of ways to express the same thing. There’s usually an infinite number of ways, and there are a lot of ways to give commands to a system.

And understanding those, all of them, before LLMs was a real pain. So I successfully quit my PhD in 2015 and co-founded a natural language processing company. Just to give you an idea of how the development was. Back then, the state of the art in natural language processing was counting words, basically.

A glorified way of counting words: you know, TF-IDF and latent semantic analysis and so on. You kind of had an idea of what the user might be talking about, but you couldn’t map it to something with any degree of certainty.
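For readers who haven’t met TF-IDF: it weights a word by how often it appears in one document, discounted by how many documents contain it, so a word like the drops to zero. A minimal from-scratch sketch; whitespace tokenization and the unsmoothed log(N/df) are simplifying assumptions (libraries such as scikit-learn use smoothed variants). Latent semantic analysis would then factor the resulting term-document matrix, which is not shown here.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document: (count / doc length) * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur at least once?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights

docs = [
    "the nda covers confidential information",
    "the gdpr covers personal data",
    "the nda expires after two years",
]
w = tf_idf(docs)
print(sorted(w[1].items(), key=lambda kv: -kv[1])[:3])
```

This is exactly the glorified word counting mentioned above: it tells you gdpr is distinctive for the second document, but nothing about what the user actually means.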

And back when we started, word embeddings had just become a thing. So suddenly, for individual words at least, you had an idea of what those words meant and whether they were close to what you expected from the user. But usually you had an expectation anyway of what the user might be expressing, and you were validating it. Mapping a really open world to a closed set of tasks that you can accomplish as a system was incredibly hard.
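The embedding idea can be sketched with cosine similarity. The three-dimensional vectors below are made up for illustration; real word embeddings such as word2vec or GloVe have hundreds of learned dimensions. The point is the validation step described above: map the user’s word to the closest of the words the system already expects.

```python
import math

# Toy vectors invented for this example, not real embeddings.
EMBEDDINGS = {
    "contract":  [0.9, 0.1, 0.0],
    "agreement": [0.8, 0.2, 0.1],
    "holiday":   [0.0, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def closest(word, expected_words):
    """Map a user's word onto the closest word the system expects."""
    return max(expected_words, key=lambda c: cosine(EMBEDDINGS[word], EMBEDDINGS[c]))

print(closest("agreement", ["contract", "holiday"]))  # contract
```

This works word by word, which is why it still needed a closed list of expected inputs; it doesn’t understand a whole open-ended request.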

LLMs changed that and made it way more accessible, because you don’t need to set up your own world. You don’t need to set up your own model.

You use one of those foundation models pretty much out of the box, and they can solve a lot of your tasks already, even without fine-tuning. Back then, you had to collect your own data. You had to train your own specialized models to do one stupid thing. The whole idea of very narrow AI, absolutely.

I remember everybody was super excited to have a word cloud where the size of each word correlated with the number of times the word appeared. And they were so useless, right? Like, what is it?

It’s a colorful thing, but I still don’t know what it says. It’s something about X, and that’s it. Real understanding of text only comes now, with LLMs. Meaning: we said legalese is hard to learn.

And now we’re using an LLM to speak legalese for us. There was this case, I think it was Bengali, at Google: they trained a model on a certain area, started to ask it questions in Bengali, and the model, without ever having been trained on Bengali, started to pick up on it and answer, because it had learned the structure and used the structure. So an LLM, which is based on language structure, should be able to cope with the structure of legal language, which is nothing more than a codified way of expressing things. Just as Copilot, the LLM, is able to work with code,

because code is also a language. Or at my company, Marpai, we now use it to access the database through language: instead of writing SQL, we ask an LLM, it translates the question into SQL, the query goes to the database, and we get the answer. Now, the tricky part: you started with this, but it was probably a little too quick, so let me come back to it.
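The text-to-SQL flow described here (question in, SQL out, query run, answer back) can be sketched with Python’s built-in `sqlite3`. The `question_to_sql` function is a hypothetical stand-in for the LLM call, and the `claims` table is invented; the SELECT-only check is a deliberately minimal guardrail, not a security boundary.

```python
import sqlite3

def question_to_sql(question):
    """Hypothetical stand-in for the LLM call. A real system would send the
    question together with the table schema to a model and get SQL back."""
    canned = {"how many claims are open?": "SELECT COUNT(*) FROM claims WHERE status = 'open'"}
    return canned[question.strip().lower()]

def answer(conn, question):
    sql = question_to_sql(question)
    # Minimal guardrail: only run statements that start with SELECT.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError(f"refusing to run non-SELECT SQL: {sql}")
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO claims (status) VALUES (?)", [("open",), ("closed",), ("open",)])
print(answer(conn, "How many claims are open?"))  # [(2,)]
```

In a real deployment one would more likely give the model a read-only database connection as well, rather than rely on inspecting the generated SQL.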

So, back to this. Okay, we have this amazing technology that starts talking legalese. So cool. But now the question is, how do we turn this into money?

Which is essentially: how do you get paid? What is the core idea of the service you’re currently building with an LLM? What’s the value pitch here? We help fast-moving companies and scale-ups, for example, automate a lot of their incoming legal requests and answer them automatically. For example, you have a sales team: you’re a business that has just closed a funding round with Jasper, now you’ve hired 50 salespeople, and they sell. Every one of those customers is complicated: they want their own NDA, they want their own data privacy policies accepted, and you have to figure out whether that’s okay or not.

And then you have a legal department, which is not easy to hire for and is completely overwhelmed with all of these requests. Normally they would like to focus on actually growing, on automating their work, on making the business more sustainable, on reducing risks, on high-impact things. But instead they have to work on all the minor requests, which are usually the same. Probably 80% of those requests are the same.

Hey, are we GDPR compliant? Hey, what happens if I sign an NDA? And so on. And we want to get this off their hands.

This is how we help them, and we do it with minimal onboarding time. So you take the customer’s data containing all the legal texts, and then you use an LLM to look into the text and answer questions about it? Yes, that’s the beginning of it.

Yes, that will be the first thing we can help with. Can I ask a practical question here? Yes. Thank you, Mr. Professor.

So Torben, and this is actually for the students, because we just described all this a little bit. What I imagine is: these models scrape the whole internet. They have the whole internet in them.

They have the knowledge, but it’s unsupervised, so they have a clue what the content is, but not 100%, right? If I ask a legal question, I want a 100% answer. So say I’m building a legal tech startup right now based on an LLM, I use the OpenAI API, and I ask legal questions.

So that’s my first question: how good is the quality of those answers? And the second question would be: what’s different for you? What’s your moat, given that?

What’s different about the work that you have already done? Maybe you can share something there. Yeah. I mean, OpenAI claims its model passed the New York bar exam at around the 90th percentile.

I feel like that’s really unethical advertising, because, and Lutz probably talked about this, you don’t test your model on its training data. I’m pretty sure it has seen all of the questions for the New York bar exam. So it just learned them by heart. And if it didn’t produce a good enough score, they trained on it again.

This is how I passed my naturalization interview. I just learned those 100 questions. I don’t know anything about US history. So yeah, exactly.

I would question whether the model can actually apply that knowledge as well as it can reproduce it, and abstract from the individual cases. If the person in a case is called George, that might actually have an impact on the model. So if you renamed all the people in the cases discussed in the New York bar exam, you would probably already get a substantially different score. That has to do with the model.
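The renaming test proposed here is easy to sketch: rename the people in each question and its expected answer consistently, then compare scores. The toy model below only memorizes exact questions, which is the failure mode being suspected; the names, the question, and both helpers are hypothetical.

```python
import re

def rename_people(text, mapping):
    """Replace each name in `mapping` as a whole word, consistently across the text."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

def accuracy(model, items):
    """Share of (question, expected_answer) pairs the model answers exactly."""
    return sum(model(q) == a for q, a in items) / len(items)

# Hypothetical test item and a model that has only memorized its training questions.
items = [("George hires Anna. Who is the employer?", "George")]
memorized = dict(items)

def memorizing_model(question):
    return memorized.get(question, "unsure")

names = {"George": "Priya", "Anna": "Lena"}
renamed = [(rename_people(q, names), rename_people(a, names)) for q, a in items]
print(accuracy(memorizing_model, items), accuracy(memorizing_model, renamed))  # 1.0 0.0
```

A model that actually applies the law should score about the same on both sets; a large gap points to memorization.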

So a lot of what we do has to do with providing guardrails for large language models: making them controllable, adjustable, and easy to use. And then we can make them auditable by the legal experts, so that they can tell if the model produces wrong content, because this can still happen, right? The data could be cut up in a bad way and the model has a hiccup. It can still happen.

We cannot provide 100% accuracy. That would be a total lie. But we can make it easier to control it. We basically tame large language models.

But actually, we take a bit of its power away, a bit of it, and we exchange that for accuracy. Now, isn’t that… I mean, just from my… I don’t want to change your business model, because you make money scaling up legal departments by allowing them to answer many different questions.

But taming the model, preventing it from going off, constraining it to the data and only the data, and not the… I mean, we joked in class that senior partners at a consulting firm will come up with whatever sentence sounds sweet, nice, and logical, which doesn’t mean it’s grounded in data. I love that. I was at BCG, so I can 100% support that.

Yes, yes. This is true, right? A model, right? It says something logical.

And I posted on Twitter how easily I can make a model hallucinate just by priming it correctly, and then the model will claim that this course at Cornell is the best thing ever, despite the fact that it’s not, right? So, easy to do. But if you think about guardrails, isn’t that a whole platform task in itself? How do you deal with it?

How do you set guardrails for your model? How do you prevent it from acting like the partner from a big consulting firm and not sticking to the legal text it was given? Of course, I can’t reveal all of the tricks we have in play, but a lot of it has to do with what kind of data you present it with.

So, we have internal legal documentation that we use. We process it heavily so that it’s readily available, which gives you faster and better answers because the model doesn’t have to look through the whole material live. And it also has to do with what you allow the model to do. It doesn’t always follow directions; we’ve seen that especially with GPT-3 and GPT-3.5.

So, the real innovation, or the real benefit, of GPT-4 is that it’s much better at following instructions. And the more you steer it and the more you restrict it, the better. It’s still possible to break out of it, but that is just malice. That’s a malicious request, I would say, if you do prompt injection attacks on these models.
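One common way to restrict a model in the sense described here is to put the relevant passages in the prompt, tell the model to answer only from them, and check the answer’s citations before showing it. This is a generic sketch under those assumptions, not LegalOS’s actual prompt; the refusal wording and the bracketed citation format are invented for illustration, and no model is called.

```python
import re

REFUSAL = "I can't answer that from the documents provided."

def build_grounded_prompt(question, passages):
    """Restrict the model to numbered passages and require citations like [1]."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer ONLY from the numbered passages below. "
        "Cite the passage numbers you used, like [1]. "
        f"If the passages do not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"Passages:\n{numbered}\n\nQuestion: {question}"
    )

def passes_guardrail(answer, n_passages):
    """Accept the refusal, or an answer whose citations all point at real passages."""
    if answer.strip() == REFUSAL:
        return True
    cited = [int(c) for c in re.findall(r"\[(\d+)\]", answer)]
    return bool(cited) and all(1 <= c <= n_passages for c in cited)

passages = ["Notice is 30 days.", "Governing law is German law."]
print(build_grounded_prompt("What is the notice period?", passages))
print(passes_guardrail("Thirty days [1].", len(passages)))
```

The post-check matters as much as the prompt: an uncited or mis-cited answer is rejected rather than shown to the user.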

But if someone wants more vacation time or something like that, I don’t think a court would consider that a relevant answer. You’re a technologist. Would you say that… Is this only about correct prompting to control the quality, or is this more of a platform play?

It’s also a platform play. The secret sauce is also in the pre-processing of the data, to avoid these mishaps. Mishaps usually happen because of missing data. Mishaps also happen if you present wrong or irrelevant data.

And so, on our platform, you can just upload a thousand documents, each with a thousand pages, right? And the information in them is very sparse. The trick is to find the needle in the haystack anyway.

And doing that in an online search is really difficult. So we have pre-processing in place to keep the large language model interface from going off track. And that’s amazing. That is something of huge value, not only for that business model as such; you are probably also thinking of scalable solutions for the whole industry, I guess.
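The pre-processing isn’t spelled out in the conversation, so this is only a generic sketch of one standard approach: split long documents into overlapping chunks and pass only the best-matching chunks to the model. Plain keyword overlap stands in for the embedding or BM25 ranking a production system would more likely use; the chunk size and overlap are arbitrary assumptions.

```python
def chunk(text, size=50, overlap=10):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + size]) for i in starts]

def top_chunks(query, documents, k=3):
    """Return the k chunks that share the most words with the query."""
    query_terms = set(query.lower().split())
    chunks = [c for doc in documents for c in chunk(doc)]
    # sorted() is stable even with reverse=True, so ties keep document order.
    ranked = sorted(chunks, key=lambda c: len(query_terms & set(c.lower().split())), reverse=True)
    return ranked[:k]

# A long, sparse document with the one relevant sentence at the very end.
doc = " ".join(["filler"] * 200) + " notice period is thirty days"
print(top_chunks("What is the notice period", [doc], k=1))
```

Only the winning chunks go into the prompt, which keeps the model from being distracted by the thousands of irrelevant pages around the needle.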

That would be nice. But so far, we haven’t seen any company doing that, but maybe that’s just because it’s very early. I don’t think it’s an infrastructure… I mean, it’s always an infrastructure challenge, but it’s also a quality challenge to scale this.

I mean, eight years ago I invested in a company that was doing accounting automation. And over time, we figured out that just to ensure the quality of the model, and that was very much supervised learning, we had to start our own tax advisory.

So the big question, and I think you elaborate on that quite a lot, Lutz, is really how to maintain this feedback loop and make sure the output quality is high, especially in legal. There was a question in the class. John, tell us. Thanks for being here, Torben.

So it sounds like your ideal customer is someone who has certain pre-knowledge of the law. Like, they have a legal department; they know what questions to ask. If I was just starting out with a company and didn’t have that, and didn’t necessarily even know which legal questions to come up with to start a company, would you be able to help that company, or is that not for you guys?

Maybe come back in a couple of months, right? No, you wouldn’t be our… I mean, we have a lot of people on our waitlist already, and you wouldn’t be in our ideal customer profile, I guess. But sure, the legal rooms that we present to our customers come pre-set up with a lot of knowledge about general law and general regulations like the GDPR and so on.

But it all becomes much more powerful if you already have regulations for your company in place. And… Of course. The main thing is that your business model is actually to answer questions about existing documents.

Obviously, once you’re able to train a model to write your documents, you actually have the same ability we have now at Marpai Health, where we can speak SQL based on human input. You will soon have the ability to speak legal based on human input. But at the moment, the first use case, the one being paid for, is: you have a corpus of legal text, you ask a question, and you avoid stupid work for people who are just looking up the right text passage to check what’s in the GDPR.

Like in my… Like whatever law, GDPR law, which I’ve written 15 times and I just need to look it up again. Right? That stupid work, we can cut out now.

There were two more questions. Bill, go for it. Yeah. So, within law, there are so many idiosyncrasies, not just in the law itself but in context, and similarly with spoken languages like German, English, etc.

Can it also handle this? Say I’m an American doing business in Germany. My prompts are in English, but the documents, or perhaps even the law itself, are in a different language. Can it capture all of this while keeping the specificity of the language, so that I can have high confidence in what is being brought out? Yes, it offers a lot of flexibility, but we don’t allow it to interpret the law.

So, if there’s documentation on the processes, it can abstract from that and apply it to your case. That I believe it can do. Actually, we released a new model yesterday. It was a long night, and I was a bit surprised myself by how well it does that.

It also speaks foreign languages. I just allowed that for it with a disclaimer. And all that comes for free in an LLM. That’s the cool thing.

Previously, you had to find an API that translated all those documents from a foreign language. Now it sort of comes for free. I actually like Bill’s question, and I would like to challenge this a bit. Yes, at the moment you don’t, but let’s use LangChain and put something on top of it.

So now you use the corpus model you just published to figure out an answer. But what if you use another LLM that creates the text, and then you challenge it, like with LangChain? You could actually test it. Case law in the US or structured law in Europe: an LLM generates new text for, say, protect my X from Y, whatever it is. You create a text, and then you use your model to challenge it against the law files you have.

Is it challengeable or not? Then ask again, redo the text. Is it challengeable or not? And you have a GAN-like setup, where one model creates legal text because it speaks legalese, and one model checks it against the laws and regulations.

Is it protected? Yes or no. Now you have two LLMs working against each other, a GAN setup. And over time, that should create your training set to improve your ability.
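The proposed drafter-versus-checker setup can be sketched as a loop. Strictly speaking it is GAN-like only in spirit: nothing here is trained by gradients, the two sides just alternate until the checker has no objections. Both `fake_generate` and `fake_check` are hypothetical stand-ins for LLM calls, and the required-clause checklist is invented.

```python
REQUIRED_CLAUSES = ["term", "governing law"]  # hypothetical checklist used by the checker

def draft_and_check(task, generate, check, max_rounds=3):
    """Alternate a drafting model and a checking model until no objections remain."""
    draft, objections, history = "", [], []
    for _ in range(max_rounds):
        draft = generate(task, draft, objections)
        objections = check(draft)
        history.append((draft, objections))
        if not objections:
            break
    return draft, history

def fake_generate(task, previous_draft, objections):
    # Hypothetical drafter: start from the task, then add one sentence per objection.
    draft = previous_draft or f"{task}."
    return draft + "".join(f" This agreement states its {o}." for o in objections)

def fake_check(draft):
    # Hypothetical checker: list the required clauses the draft does not mention yet.
    return [c for c in REQUIRED_CLAUSES if c not in draft.lower()]

final, history = draft_and_check("Protect my X from Y", fake_generate, fake_check)
print(len(history), final)
```

The `history` list of drafts and objections is the kind of record that could later serve as the training set mentioned in the conversation.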

I know you don’t do this at the moment, but to get more investor money, maybe that’s something you put into your pitch. I actually love that idea. What I find fascinating about this new wave of LLMs is that they can reflect on themselves. You ask them to do a task, they do it so-so, and then you ask them, hey, did you meet all the criteria I gave you?

It finds the flaws in its own thinking. And if you just query them in a slightly different way, think step by step, for example, they usually find better solutions. And if you have one review a document it created itself, it can improve on it. This is really interesting, especially for OpenAI.

I wonder if they’re using this process internally: asking it a lot of questions and then taking the final answer of the improvement process to fine-tune the model. I’m pretty sure they already do that. Today, however, I read a disappointing article about that. It seems this whole reflection business doesn’t make the models significantly better.

The quality the models produce still follows sort of Moore’s law, even with this ability for self-reflection. But yeah, we are also starting to use something like that. What I believe is really powerful on top of that is combining multiple specialized models and integrating them using an architecture like the one you mentioned. LangChain has this ReAct architecture integrated, where you have chain-of-thought prompting and you say, hey, you’re an AI agent and you have the following tools available. Then the AI model itself can choose the appropriate tool to do a job, like a calculator, web search, things like that.

And it always comes up with a thought first: hey, I need to do this, a sort of plan. It then selects an action, like a web search, and gets an observation from that action. Then it formulates a new thought. We’ve seen, what are they called?

ChaosGPT, something like that. We’ve seen a lot of these models. BabyAGI, CAMEL, Keros; there are loads of them. Exactly.
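The ReAct loop described above (thought, action, observation, new thought) fits in a few lines when written without any framework. This is not LangChain’s API: `scripted_model` replaces the LLM with fixed steps so the control flow is visible, and the tools, the policy data, and the dates are hypothetical.

```python
from datetime import date, timedelta

NOTICE_DAYS = {"nda": "30"}  # hypothetical internal policy data

def lookup(key):
    return NOTICE_DAYS[key]

def add_days(arg):
    start, days = arg.split(",")
    return (date.fromisoformat(start) + timedelta(days=int(days))).isoformat()

TOOLS = {"lookup": lookup, "add_days": add_days}

def scripted_model(question, scratchpad):
    """Stands in for the LLM. Returns (thought, action, action_input)."""
    if len(scratchpad) == 0:
        return ("I need the NDA notice period.", "lookup", "nda")
    if len(scratchpad) == 1:
        days = scratchpad[0][3]
        return ("Add the notice period to the notice date.", "add_days", f"2024-01-15,{days}")
    return ("I have what I need.", "finish", f"The NDA ends on {scratchpad[1][3]}.")

def react(question, model, tools, max_steps=5):
    scratchpad = []  # list of (thought, action, action_input, observation)
    for _ in range(max_steps):
        thought, action, action_input = model(question, scratchpad)
        if action == "finish":
            return action_input, scratchpad
        observation = tools[action](action_input)  # run the chosen tool
        scratchpad.append((thought, action, action_input, observation))
    raise RuntimeError("no answer within max_steps")

answer, trace = react("Notice given 2024-01-15; when does the NDA end?", scripted_model, TOOLS)
print(answer)  # The NDA ends on 2024-02-14.
```

In a real agent, the scratchpad is fed back into the model’s prompt each turn; giving it a few narrow, reliable tools is what keeps it on task.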

And if you supply those with the right tools and don’t use them as this grand experiment of solving everything, but actually give them the right tools to do much smaller and more meaningful tasks, I believe this can also improve model quality significantly and give it a lot of ways to make an impact. There is another question here, from Andrew, I believe. Yeah, thanks for being here today, Jasper and Torben. My question is a bit more of a business question.

I would just be interested to know what the response from the legal community has looked like on this, because I would imagine that people that are highly specialized, trial lawyers, litigators might really enjoy this. Whereas people that are specializing in legal research, you know, would obviously see it as a major threat. So what issues have you guys had to deal with? Are you going to fire 50% of all lawyers in Germany?

In the long run, maybe. So… Well, common advice: never say it that way. We supercharge people.

It’s actually interesting. We had issues with this before. People had questions and a lot of trust issues, even with the very rigid, structured data models we had before we introduced LLMs. Now, with LLMs, we don’t get as many difficult questions regarding data privacy or how accurate the model is.

And so on. People try it, see that it gives good answers, and then want it. So we get a completely different reaction to this product than to previous products, where there were a lot of doubts and questions about where we host our data. Of course we are investing heavily in security and data security, but the question is, how do we get the data? The questions, and the reaction, are completely different.

Which is super interesting. And I think here, too, the comparison with the senior partner from a consulting firm holds. You don’t question them so much, because even if it’s not grounded in data, it sounds logical and very appealing over dinner. And maybe to add one thing: when we did the research for LegalOS, it felt a little too early. Also, by the way, in the US, all those law firms were just partners and lawyers.

But then over the last few years, people started hiring engineers at law firms and started playing with tools. So right now you have people in place who can actually see that what Torben is doing makes sense and is really helping the law firm, and they can also implement it locally, because you don’t have those Accentures working with law firms, not yet, because the market is not large enough. So adoption is actually possible.

They can play with those things. And now, thanks to LLMs, you can experiment much more easily than with a full-blown low-code project that would build automation for you. I wanted to dig into… Maybe on the question about lawyers: currently we’re not targeting them.

We are more looking at legal departments because legal departments are overwhelmed. They are working 100 hours a week just to get by. And they desperately want automation. And then they can see, they see how powerful what we produce is.

And how much it could save them and how much time they could spend with their kids. And I guess that makes all the other questions a bit secondary. I would say that lawyers and people in legal research would be slightly more skeptical. I’m still optimistic that I’d be able to convince them.

Before I move to the next question, there is actually one topic I want to look back to. The question about are you going to replace people or is it supercharged? So, the way you describe it, if you automate it, the human doesn’t need to do it, right? That means that a human is out of the job.

That is, if that was the only value they brought. However, that’s one way of seeing it. The other way of seeing it is that the human has many other abilities, and it wasn’t their favorite job to look up the contract and quote the same phrase from it 15 times.

And then send that answer to 15 different salespeople. So by automating it, you’re improving the effectiveness of that very person in the legal department. And this is super important for our society overall, because what we see is a skill shift towards those automated tools. Think of paralegals, whose work would otherwise simply be automated away.

There was one question from Severin. Yeah, I was wondering, from a scalability perspective, how easy is it to implement or codify the laws of different jurisdictions? For example, already within the DACH region. I mean Germany, Austria and Switzerland.

Severin is from Switzerland. Welcome. I mean, the laws in Germany and Switzerland are quite similar, but there are teeny tiny differences, and they’re important nonetheless. So how do you think about this?

Actually, Torben, when you answer it, delve a little bit into how this is set up. Geek out a little bit and give us an idea. Yeah, as much as I can. When we preload a Legal Room with generic knowledge... First of all, you wouldn’t believe how many of these requests are not governed by the law, but by internal processes. I would say 90% of them are like: what measures do we have in place to comply with this or that law?

That would be the question. Not what is the law, explain it to me, or apply it to this specific case. It’s more like: how do we do that? The customer wants to know.

I need to send it to him or her. Generic legal questions, like what is the GDPR, how do I find out whether the customer we want to onboard complies with the GDPR, what do you regulate in an NDA, or can I do this and that, are just a tiny fraction of the requests.

And we want to preload these legal rooms with knowledge on a specific topic. You can do that country by country. Right now, we’re focusing mainly on 234. But you can do it country by country and include that country’s specific law.

I would say that, at least for the companies we are targeting right now, the law of their own country is usually what matters. It’s not really relevant what the law is outside that specific country. In the cases where it does become relevant, that is still something the legal department would have to take care of.

But you can upload all this knowledge. Putting in all the knowledge in the world about every single jurisdiction, though, is, I believe, still a bit out of scope at the moment. Well, actually, I would answer the scalability question slightly differently, because, as you said, I upload documents to a room and put guardrails in place so that the model only uses the data in the room. You know, in AI, we always have the discussion about how narrow something is.

I build a computer to beat you at chess. I build a computer to answer questions about a document store I’ve uploaded. So scalability comes down to what you upload, whether it’s internal processes, legal documents, or whatever. You don’t go and say: I want questions about US law answered.

No, you say: here are the documents, these are what you should answer about. And now I have a legal layer over it. Our discussions are about going further: abstracting this into a legal-speaking entity where, as you said, I just say I want a contract.

And then it writes everything by itself. We’re not there yet. At the moment, it is just the interface. Jasper and I have talked a lot about LLMs being a good interface.
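To make the "room with guardrails" idea concrete, here is a minimal sketch of one way to keep a model scoped to uploaded documents: retrieve the closest passages from the room, and refuse when nothing clears a similarity threshold. Everything here is an assumption for illustration. The `Room` class, `context_for`, and the `min_score` threshold are hypothetical, and the bag-of-words `embed` function stands in for a real embedding model. The sketch only selects the context that would be handed to an LLM; it does not call one.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Room:
    """Hypothetical 'legal room': answers may only draw on documents uploaded here."""

    def __init__(self, min_score: float = 0.3):
        self.docs = []  # list of (text, vector) pairs
        self.min_score = min_score

    def upload(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def context_for(self, question: str, k: int = 2):
        """Return up to k sufficiently similar passages, or None to signal a refusal."""
        q = embed(question)
        scored = sorted(((cosine(q, vec), text) for text, vec in self.docs), reverse=True)
        relevant = [text for score, text in scored[:k] if score >= self.min_score]
        # Guardrail: with nothing relevant in the room, refuse instead of answering from general knowledge.
        return relevant or None


room = Room(min_score=0.3)
room.upload("NDAs must be signed by the legal department.")
room.upload("New customers must complete the GDPR questionnaire before onboarding.")
```

With `min_score=0.3`, a question about who signs NDAs pulls in only the NDA policy, while a question about US securities law returns `None`. The calling code would turn that `None` into a refusal instead of letting the model answer from its general training data.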

It’s the interface to that data. I have one last question, and then we should probably give you guys back your Saturday afternoon, or evening. It’s about your infrastructure, because I know you use a database, and I want to ask Jasper first. Jasper, there was some news about Pinecone.

I thought maybe you, Jasper, could tell us a little bit about it, and Torben, you can explain what Pinecone is and why it matters to you. So I would say, when you look at the current trends, obviously there’s a lot of focus on applications, experiments and adoption, more on what users or consumers can do.

But what is really interesting, to your point and to the previous questions, is how to scale this and how to make it efficient and productive. And there is the topic of vector databases, which Torben can explain much better than me. There has been a lot of funding going around, a hundred million recently from ICONIQ, which was formerly the family office of Mark Zuckerberg.

Andreessen Horowitz was in that round too. That’s Pinecone, which I think is the one you mentioned. But there are many more, like Qdrant, Zilliz, Chroma and, I think, Trend. Sorry, I think that one also got more than a hundred million. So people are realizing that it’s not just OpenAI that you will use.

But when you build your own models, you might want proper infrastructure for them. And I guess Torben can elaborate on that. Yeah, it’s a gold rush, and people are desperately trying to sell their shovels. And Pinecone is, yeah, a vector database.

It matters if you want to store lots of embeddings, that is, if you want to retrieve texts by their semantic topic. That’s what you do when you search your knowledge base or something like that: it’s not structured data anymore. You don’t look it up by keyword or by ID or whatever.

Instead, you look it up by its embedding. You embed a text and store it, and later you retrieve it from the database by embedding a query.

If you have a certain topic, it basically searches the neighborhood of that topic’s embedding for relevant documents. Then you can do further processing to narrow things down even more and figure out which of these documents are really relevant. As for Pinecone, honestly, I’m not sure. I think making that scale to billions of documents is quite a complex problem.
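The two-step lookup described here can be sketched with brute-force cosine similarity. The first step finds documents near the query’s embedding, and the second narrows the shortlist. The three-dimensional vectors in `STORE` are made-up placeholders for real embeddings, and the similarity cutoff in `narrow` is a simple stand-in for the "further processing" step. Production systems often use a dedicated reranking model there. Vector databases generally replace the linear scan below with an approximate nearest-neighbour index so the search stays fast at large scale.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical pre-computed embeddings; a real system would get these from an embedding model.
STORE = {
    "nda-template": [0.9, 0.1, 0.0],
    "nda-signing-policy": [0.8, 0.3, 0.1],
    "gdpr-onboarding": [0.1, 0.9, 0.2],
    "office-parking": [0.0, 0.1, 0.9],
}


def nearest(query_vec: list[float], k: int = 3) -> list[str]:
    """Step 1: look around the query embedding for the k most similar documents."""
    ranked = sorted(STORE, key=lambda doc_id: cosine(query_vec, STORE[doc_id]), reverse=True)
    return ranked[:k]


def narrow(query_vec: list[float], candidates: list[str], min_score: float = 0.5) -> list[str]:
    """Step 2: keep only candidates that clear a stricter relevance bar (here, a similarity cutoff)."""
    return [doc_id for doc_id in candidates if cosine(query_vec, STORE[doc_id]) >= min_score]


q = [1.0, 0.2, 0.0]  # hypothetical embedding of a question about NDAs
shortlist = nearest(q, k=3)
final = narrow(q, shortlist)
```

Here `nearest` returns the two NDA documents followed by `gdpr-onboarding`. `narrow` then drops the GDPR document because its similarity to the query, about 0.3, falls below the 0.5 cutoff.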

But it can work, I’m quite positive. It can be distributed quite well, as we’ve seen with other databases. And there are already open-source solutions.

And every major cloud provider has one of those. They are, of course, highly relevant at this moment. I don’t even have a favorite, because they all work the same way. I personally wouldn’t invest my money in one of them, because they feel a bit replaceable.

But yeah, they just make it very easy to quickly find relevant information to feed into your large language models. That’s the whole point. Nice.
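Torben’s point that vector search "can be distributed quite well" maps onto a common scatter-gather pattern. Each shard returns its own top-k, and a coordinator merges those lists. Every document in the global top-k must also be in its own shard’s top-k, so merging the local lists gives the same result as searching everything at once. This is a generic sketch, not how any particular product shards its data. The shard layout and the dot-product scoring are assumptions, and the shards are queried one after another here rather than in parallel on separate machines.

```python
import heapq


def dot(a: list[float], b: list[float]) -> float:
    """Dot-product similarity between two vectors."""
    return sum(x * y for x, y in zip(a, b))


def search_shard(shard: dict[str, list[float]], query: list[float], k: int) -> list[tuple[float, str]]:
    """Local top-k on a single shard, as (score, doc_id) pairs."""
    return heapq.nlargest(k, ((dot(query, vec), doc_id) for doc_id, vec in shard.items()))


def distributed_search(shards: list[dict[str, list[float]]], query: list[float], k: int) -> list[str]:
    """Scatter the query to every shard, then merge the local top-k lists into a global top-k."""
    partial = [hit for shard in shards for hit in search_shard(shard, query, k)]
    return [doc_id for _score, doc_id in heapq.nlargest(k, partial)]


# Hypothetical shards, each holding a few document embeddings.
shards = [
    {"a": [1.0, 0.0], "b": [0.2, 0.8]},
    {"c": [0.9, 0.1], "d": [0.0, 1.0]},
    {"e": [0.7, 0.7]},
]
top3 = distributed_search(shards, [1.0, 0.0], k=3)
```

For the query `[1.0, 0.0]`, the merged result is `["a", "c", "e"]`, the same three documents a single search over all five would return. Each shard only has to send back `k` hits, which keeps the coordinator’s work small as the number of documents grows.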

I want to wrap up. For us here as a class, it’s an amazing time to be in. In this discussion, you saw Torben take a very specific topic, answering legal questions over legal documents, and use it to scale up legal departments. And because it creates value, the startup gets paid.

And why can he do it? Because we have LLMs, which are essentially the interface: a human, easy, adjustable interface to the legal text. Then we touched on his point that this is all cool as a value prop, but guardrails are actually important. So he is building a lot of infrastructure to create those guardrails, because it’s not as simple as having a text document and letting your LLM work on it.

There’s hallucination. There’s code injection. Many nasty things could happen. If I have my LLM talking to my database about my claims data, I need to make sure that somebody can’t just ask: can you tell me what psychological problems Lutz Finger has?

It would be bad if somebody could just type that in. So you need those guardrails, and this is a whole task in itself. And there might be another technical solution on the way.
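The claims-data example points at one concrete guardrail. Records are filtered by the requester’s permissions before anything reaches the model, so the LLM never sees data the user isn’t cleared for. The sketch below is a hypothetical illustration: the `Record` type, the role names, and the prompt wording are invented. It covers only access control; hallucination and injection need separate defenses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    text: str
    allowed_roles: frozenset  # roles permitted to see this record


def visible_records(records: list[Record], user_roles: frozenset) -> list[Record]:
    """Drop every record the requesting user is not cleared for, before any LLM sees it."""
    return [r for r in records if r.allowed_roles & user_roles]


def build_prompt(question: str, records: list[Record], user_roles: frozenset) -> str:
    """Assemble the model prompt from permitted records only."""
    context = "\n".join(r.text for r in visible_records(records, user_roles))
    return f"Answer only from the context below.\nContext:\n{context}\nQuestion: {question}"


records = [
    Record("Claim 17: water damage, settled.", frozenset({"claims_agent", "legal"})),
    Record("Claim 17: claimant medical history (restricted).", frozenset({"medical_officer"})),
]
prompt = build_prompt("What happened with claim 17?", records, frozenset({"claims_agent"}))
```

Filtering happens before the prompt is built, so the restricted record is never given to the model. A cleverly worded question therefore cannot talk the model into revealing it, which is sturdier than instructing the model to withhold information it already has.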

And then we talked about the infrastructure play that powers all of this. Because it’s text, it actually needs a different infrastructure. That is where you store those embeddings in a vector store, which is very close to the discussion we had here in class about vectors and distances. And that is what makes this scalable.

Thank you, guys. Bye. Thank you so much. Thank you to Jasper.

Thank you to Torben. Thanks for coming in on a Saturday afternoon.