Bias in, bias out: Addressing the impact of bias in AI — Transcript

Now, Jasper, let me ask you something. Knowing that we have skews and biases in the data, knowing that there is even a technical explanation to it, but this is a problem, knowing that this problem isn’t easily solved, what does it mean for startups? Yay! Yep, another podcast!

Yeah! Hello, Lutz! How’s your coffee? Coffee is good.

I’m actually pretty impressed by this title music. Wow, you got the groove, man! Yeah, it’s AI. It’s AI, but it’s unfortunately the same as Midjourney.

You have to play a long time until you get something. Yeah, I didn’t have coffee. It’s again evening here. Are you still in Canada?

I’m still in Canada. Very good. So we got a Canadian perspective for today’s podcast, which is actually quite interesting because we heard a lot of things in the past weeks around people want to control AI, regulate AI. Even Sam Altman was speaking publicly in front of politicians and admitting if this goes wrong, it can go wrong really badly.

So asking to be regulated, which sounds interesting from an American perspective at least. And the EU is going to regulate AI, at least that’s what they tell us, by the end of this year. So pretty fast for the European Union. But what we ask ourselves and probably many people are asking themselves is, why do you actually have to control AI?

What’s going wrong there? What has gone wrong? What has gone wrong? What could possibly go wrong, right?

I think the good news is nothing of the scary stuff which we have seen is new. And what goes wrong with ChatGPT or large language models, which, as a reminder, we always say is a very neat interface, is the same thing which goes wrong with so-called narrow AI. And that is skews and biases, which are reflected in the model.

And we should talk about it. And we see that happening with language models as well. And therefore, the more people use it, the more dangerous it is. And I think most people will understand what a bias is, but what’s the difference between skew and bias?

Is there actually one? There is one. Skew means that a certain kind of data is over-represented; bias means that this over-representation is actually not true. So, but let’s give examples.

Midjourney, as generative AI, is extremely good at creating human faces. And they’re all smiling. The smile is actually a very American smile. And if you create old-looking photographs, World War II photographs, they still will have the same face.

So it’s today’s American smile. So there is a skew in the data, which generates one type of smile. That’s not too bad. And you probably can correct it over time, but there is no cultural sensitivity on the smile, for example.

Now you can make the joke that Midjourney is even able to make Germans smile. So we should really try this out. Picture smiling German. We do smile sometimes.

Yes. Now, sometimes those biases actually have consequences. So let’s take a narrow AI: one AI that decides whether somebody should get out on parole, meaning somebody is put into jail and can get out on parole or not.

There is a process where you go to the judge and make your case. Now judges started to use computers. So first of all, judges are biased as well. They try not to be, but it’s human to have a certain bias. We try to take them away.

However, if you look at it, it’s less likely that you get out of jail on parole if the judge did not have lunch. So after lunch, you are more likely to get out than before lunch. Yeah.

I read about it. That’s, that’s pretty scary when you get your appointment and you’re like, oh shit. Yes. So the idea to actually use a computer and a machine learning program seemed very good.

So give it to a computer, which calculates the likelihood that something bad happens once you are outside of jail. Right? How scary is the person who is in jail, like for society? And is it okay for us to release them or not?

Put in the awesome computer to do this. Now, the problem is when the data feeding this computer is biased, for example against a minority, for example against Black people, then the actual model is biased and will keep more of one type of minority in jail compared to the rest. I think there was also even in the early AI models, IBM Watson, sorry to mention IBM Watson here. Obviously they made it public and fixed it, but they used medical data in very, let’s say, privileged hospitals in Boston.

And that was mostly white people. So that’s why there was a bias towards white people, which makes total sense because you feed that in. Is that an AI problem? Sorry to ask this off schedule, but when you do statistics, regression analysis, it’s the same problem, right?

Shit in, shit out at the end of the day. Yes, that is well said. BS in, BS out. Absolutely.

It’s a problem for humans. We get trained on like those tiny clues. We react to tiny clues. We react to thinking fast, like the idea.

And obviously the computer is then based on its training data, so it shows the same effect. And yes, it’s a problem which we have in society. So, for example, if you want to decide on loan applications and you have a narrow AI to do that, but there are very few women among the loan applications,

then you have a skew in your data because you don’t have enough women. Now, how do you fine-tune your machine learning program or your AI to correctly reflect this setup? There are many ways to do this, but you have to be aware of it.

And I think all over the place and we see it so many times. And I think that’s why, and we don’t want to talk too much about regulation here, but I liked when I read about the whole regulatory development that the Swiss people, Switzerland actually said, guys, we have enough laws in place also against, yeah, against let’s, let’s say biases. So we don’t need special AI rules. We also, yeah, we, we take care of our courts.

We take care of our selection processes. So we don’t need extra rules for that. Might make sense, because if you regulate the AI, you have to regulate normal statistics.

You have to regulate human decision-making more. Now, what I think makes sense is you do not want to have biases and prejudices in your processes. Yes. But if you give those over to a machine, then they get enforced on a larger scale. And we should be happy that we have AI, because suddenly we can actually look, at scale, at where those skews and biases are.

Give you another example, the stop and frisk program. And I wrote about this in one of my Forbes articles. The stop and frisk program in New York was a program where you stop somebody on the street and search for contraband.

And you don’t need a reason. You just stop them and search for contraband and weapons. And in that program, the precinct, the area which the police controlled, was selected by a computer.

But that computer was racially very unjust. Right. And then it becomes a question how you deal with it.

And we should talk about it. In this case, they took out color, like skin color. And my article actually showed that the system became even more skewed and more biased, which is very logical.

There’s a very logical data science explanation to it. Yeah. And I think the other part is then it’s you’re predicting something. That’s also what we spoke about it.

We talked about it. That’s what AI is doing. So again, maybe bias in bias out, but also the data you feed in is the data that is defining the outcome or the likely outcome. So I think a very good example, if we just look at the stock market in the past years, which always went up, obviously I’m predicting that it will continue going up.

There were some leading indicators where people said, hey, it will turn, but then more money was pumped in. And nobody was predicting COVID. Nobody was predicting the Ukraine war.

So yes, you are kind of, you can predict something, but if you don’t have all the data, which you will never have, then it’s just a prediction. It won’t be the reality. Totally. And I think the, so actually, by the way, I’m closing the door for a second.

Normally it isn’t checking the editor. This is a very special scene that we have to keep and we should use this for social media. Okay. That, like, makes a lot of sense, what you described here. I think there are two, like, two areas. One area is we have data which reflects our society, but it’s not a correct reflection of how we actually wanted the model to work. Racial views, or views on minorities, like opinions, are one of those areas. Yeah.

The other part is if there is not sufficient data because a minority doesn’t get reflected. The third part is actually your data is all correct, but your label, what you focused on, is wrong.

And you had this in the example on the program which tries to select, in healthcare, who should get certain treatments. Yeah. If the label is how do you reduce cost, as the only label, then you tend to offer treatments more to the rich communities, which very often in the US will be white in this case, because they have created the highest cost, because their contracts are the most costly. And therefore suddenly all the focus will be on those communities instead of the communities which might need it as urgently, but whose overall cost pool has been smaller.

So it’s not so much about how effective the treatment is. It’s more about where do I save costs?

Because this is what I set the model to. Yeah. But let’s shift gears a little bit. So what we described is bias and skew is an issue for narrow models.
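The label problem in the healthcare example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Patient` fields, the `need` score, and all numbers are made up, and the selection rule is a plain sort rather than any real healthcare model. The point is just that the same data gives a different selection depending on which label you rank by.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    name: str
    community: str
    past_cost: float  # what care for this patient has cost so far (made up)
    need: int         # hypothetical clinical need score, higher = more urgent


def select_for_treatment(patients, label, k):
    """Pick the k patients that rank highest on the chosen label."""
    return sorted(patients, key=label, reverse=True)[:k]


# Invented data: the wealthy community has generated higher costs,
# the poorer community has the higher need.
patients = [
    Patient("A", "wealthy", 9000.0, 3),
    Patient("B", "wealthy", 8000.0, 2),
    Patient("C", "poorer", 2000.0, 5),
    Patient("D", "poorer", 1500.0, 4),
]

by_cost = select_for_treatment(patients, lambda p: p.past_cost, k=2)
by_need = select_for_treatment(patients, lambda p: p.need, k=2)

print([p.name for p in by_cost])  # ['A', 'B']
print([p.name for p in by_need])  # ['C', 'D']
```

With cost as the only label, both treatment slots go to the wealthy community, even though every record in the data is correct.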

And I want to highlight still, we haven’t talked about AI yet and we will, but this happens with everything where you use predictive analytics, where you try to predict the future. This is not just AI. Well, actually it happens with everything. Yeah.

Yeah. It happens with like Obama in his last few weeks in the office. Obama had been very careful talking about any racial discrimination topics. And in the last few months in office, he actually addressed it.

And he said, like, he grew up in a world where he has been faced with that. He knows the feeling from before he became the president: he enters the bus and this older lady just drags her purse a little bit closer to the chest.

It’s a very small gesture, but, and it’s probably not even meant to be a racist movement, but like in her mind, there was an indication which made her do this. And I think we humans have the same problem. We are based on data and that data might be skewed. Now the good news about AI, we can make this actually more apparent and we can better control.

Yeah. Let’s talk for a moment about large language models. So what we discussed: we discussed healthcare, we discussed loans, we discussed crime, right? So all of this. Now let’s take a large language model.

We always said is an interface. And as an interface, you create the most logical next sentence. Now that can have all types of problems. And I also expect a certain output after my question.

As an example, and I don’t want a long explanation. I think that’s very important. It’s a very short output that I can comprehend in a short amount of time. Yeah.

So for example, people were testing those large language models with “three Muslims went to a bar” and then dot, dot, dot. And the feedback they got was very often more violent: the sentence ended more violently than if they would say “three Christians went to a bar”. So in many of the models, we see an anti-Muslim sentiment.

And it was trained on the whole World Wide Web, if I understand correctly. Well, I mean, you never train on the whole web. Not on the dark net.

These models are trained on massive amounts of data. So now the problem is, oh, so the data is skewed. That’s the reason that the model is. Yes.

Because the data was probably created by people who were skewed or like had a certain world view. Yeah. And this data is more. And so it becomes actually complicated.

Because we know for a fact that at least some of those large language models have been fine-tuned by human beings to actually avoid those kinds of biases and any very, very bad world view. But still you get it. And we can talk in a second about how you avoid it. But that is one of the typical problems you see in large language models.

In generative image modeling, it’s the same problem. If I say I would like to get a picture of a dragon. Yeah. And I would like to get a picture of a nurse.

Like what type of nurse do I get? And you know, you kind of can test this out several times. And in those image generation models, they are skewed. And you will get more female white nurses in this case.

And again, we need to figure out what to do with it. Also, Midjourney shows mostly beautiful people, slim and beautiful people. That’s what also was criticized. Yes.

So we talked about the actual issues. And we gave a little bit of an idea that the underlying data, or the underlying label, is either biased or skewed. Right. Now, what can we do about it?

I think one thing to start with is definitely understanding what’s happening in the model. How do we show that there is bias? Well, I mean, we create 100 images of nurses and then we count how many are men and women. We run a loan application 100 times and we figure out how many loans get accepted from men and women.

We take all those features which should not have an impact and calculate whether they really did not have an impact. Interesting part, this is nothing new. Science does it all the time. If you do a study on whether medication works or not.

Now in science, it’s nothing new. Right. In science, we use already features like gender or other features. Which should not have an impact.

And we run them through the model to show that they don’t have an impact. So we do the same now here for our AI model. We are actually trying to figure out what has an impact and what does not. Yeah.
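The counting test described here (run 100 loan applications, count how many get accepted for men and for women) could look roughly like this in Python. It is a minimal sketch: the function names and the decision data are made up for illustration, and comparing the lowest to the highest acceptance rate is just one simple way to put a number on the gap, not the only one.

```python
from collections import Counter


def acceptance_rates(decisions):
    """decisions: iterable of (group, accepted) pairs -> {group: share accepted}."""
    totals, accepted = Counter(), Counter()
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            accepted[group] += 1
    return {group: accepted[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Lowest acceptance rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was accepted, so there is no gap between groups
    return min(rates.values()) / highest


# Hypothetical outcome of 100 loan decisions (invented numbers).
decisions = (
    [("men", True)] * 45 + [("men", False)] * 15
    + [("women", True)] * 20 + [("women", False)] * 20
)

rates = acceptance_rates(decisions)
print(rates)                        # {'men': 0.75, 'women': 0.5}
print(round(rate_ratio(rates), 2))  # 0.67
```

If gender should not have an impact, the two rates should be close and the ratio near 1.0. A ratio well below that is the kind of number that makes the skew apparent, without yet explaining where it comes from.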

That in itself gives transparency. I think the simple ones were the moderating and the mediating factors. I think that was one approach. And obviously then you can do several tests which are different for AI.

But a lot of people are talking about explainability of those models. I mean, it’s nice that you test for biases. But I think the core question, especially for self-driving cars but obviously also for others, is: why is the model actually doing that? Can I control this?

So explainability is a huge research area, but it’s not solved by far. I have a very interesting Forbes article on why Angela Merkel is a boy. There was Microsoft at one point in time: you would feed them a picture of Angela Merkel and it would spit out “a boy wearing a blue shirt”.

And I could not figure out what triggered the model to actually think it’s a boy. My hypothesis is, it’s a campaign photo. She was 65 years old at that time, but had the skin of a 14-year-old. So I think there was some technical prep done on that picture, which kind of leads the model to think it’s a boy.

But it’s very hard to actually see it in the model, because the whole point of deep learning is that we allow the computer to decide what data, what structure to use. Meaning it’s out of our control and therefore we cannot really see it. What we can do is we can make it apparent.

How many people smile like Americans? How many women get accepted? How many white people got the treatment versus some other minority group? And so it’s a very interesting question.

But then for the actual users of the model or the creators of the model and people like founders and others who want to apply the model, this wouldn’t be helpful enough. I know that my model has biases and is spitting out. Yeah, even spitting out skewed data. But then I obviously want to change it.

So can I put rules in my model and say, don’t do this, don’t do that? So the answer is yes, that’s how it is done. But there are three approaches, essentially. Okay.

You can help the user to understand that there might be a potential bias. You could help to fine-tune, and we should talk about fine-tuning, prompting, and retraining in a second, so you can actually help to change the data. Or you can put in what is often called guardrails, a constitution, safety rules. There are many, many ways of doing it.

And we can go through all of those three. So help the user. That’s a UX term. Is it a UX term?

Because it sounds, sorry, my first reaction was, oh yeah, I get this little number. And then it tells me this model is biased, like terms of service that you just scroll through and accept. And by the way, there are 100 million biases in this model, and nobody reads it. So that’s how they would solve it at the end of the day.

I hope there are not 100 million biases. There are certain biases, and we need to know that there are biases. So if somebody says, I want an image of a nurse, then one UX approach would be, it changes your input and says, I want an image of a male nurse. Yeah, it should ask you, right?

That’s what you mean. It should ask, what kind of sex is it? What kind of, let’s say, racial background? Yeah, that makes sense.

Or it makes an assumption for you about what it thinks you want, and spells it out. The question then for me is, it’s a little bit like when I’m prompting Midjourney or MusicLM or other tools, does it actually get the input? I mean, we spoke about this attention topic or the tokens. Does it actually get this input, and can it fine-tune or make the output more specific?

In this case, the model actually knows that there is skew for those areas. And it randomly, based on a certain set of percentages, changes the input for you: 50% of the nurses should be male nurses, 50% should be female nurses.

Actually, good question. Should it be 50% of the male nurses? Is 50% of the population male? Yeah, well, for the whole general population, it is true.

But if you think about nurses, the profession is more dominated by women. And therefore, a good question would be, should the model do 50, or do the actual percentage? And how does the percentage look different in Germany versus in China versus in the US?

Plus, we’re assuming perfection here, right? Precision recall, first, second order mistakes. Does the model actually know what’s right and wrong all the time? This is a…

So remember, there are three steps. First step is identify: okay, we get all female nurses, probably because of how our training set was built. Second thing is defining a UX approach.

So we have a set of rules for what we want to change. So now, via a rule, if somebody says nurse, we change it into female nurse or male nurse. And that is the rule set. And that is meant to make the user aware of that potential bias and allow the user to be specific.
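The rule set described here could be sketched like this in Python. Everything in it is an assumption for illustration: the `REWRITE_RULES` table, the 50/50 weights (the conversation leaves open whether 50% or the actual share per country is right), and the simple word matching. It rewrites "nurse" into a sampled variant, reports the change so it can be shown to the user, and leaves the prompt alone when the user was already specific.

```python
import random

# Hypothetical rule set: trigger word -> (variants, weights).
# The 50/50 split is only one of the options discussed; real shares
# per country could be put here instead.
REWRITE_RULES = {"nurse": (["female nurse", "male nurse"], [0.5, 0.5])}

# If the user already specified one of these, the prompt is left untouched.
ALREADY_SPECIFIC = {"male", "female"}


def rewrite_prompt(prompt, rules=REWRITE_RULES, rng=random):
    """Return (rewritten prompt, list of (original word, chosen variant))."""
    words = prompt.split()
    out, changes = [], []
    for i, word in enumerate(words):
        core = word.rstrip(".,!?")          # keep trailing punctuation aside
        suffix = word[len(core):]
        previous = words[i - 1].lower() if i > 0 else ""
        if core.lower() in rules and previous not in ALREADY_SPECIFIC:
            variants, weights = rules[core.lower()]
            choice = rng.choices(variants, weights=weights, k=1)[0]
            out.append(choice + suffix)
            changes.append((core, choice))  # what to spell out to the user
        else:
            out.append(word)
    return " ".join(out), changes


new_prompt, changes = rewrite_prompt("a picture of a nurse")
print(new_prompt)  # either "a picture of a female nurse" or "a picture of a male nurse"
print(changes)
```

Returning the list of changes next to the rewritten prompt is what makes this a UX approach rather than a silent correction: the interface can tell the user which assumption was made and let them override it.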

The third approach would be to correct the data. Now, there is a lot of work in the world of how we work with data and what potential data sets could be, right? So if you have a model today, you can use prompts to actually change the outcome of the data set. So, for example, you can prompt your large language model.

You ask the large language model to complete a sentence about abortion, and you prompt it with “I’m a Republican” versus “I’m a Democrat”. It will give you different outputs. So you actually prompt the model differently. Now, if you have three, five examples, prompting is exactly what people do.

If you have more, let’s say a data set of ten examples, then you are probably in prompt tuning, meaning you work with the model to actually improve the prompt. If you have 100 examples, then you’re down to fine-tuning: you’re actually trying to use that data in order to train the model on what you think is correct. Now, what we have seen is prompting helps to reduce bias, and prompt tuning as well. It doesn’t change the underlying problem that the data set which went into it was biased.
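The rule of thumb above can be written down as a small Python sketch. The cut-offs are the rough figures from the conversation (a handful of examples, around ten, around 100), and treating everything below ten as plain prompting is an assumption of this sketch. The `Q:`/`A:` layout of the few-shot prompt is likewise just one common convention, not a required format.

```python
def suggest_approach(n_examples):
    """Rough rule of thumb; the thresholds are approximate, not hard limits."""
    if n_examples >= 100:
        return "fine-tuning"
    if n_examples >= 10:
        return "prompt tuning"
    return "prompting"


def build_few_shot_prompt(examples, question):
    """Put a handful of (input, desired output) pairs in front of the new question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Invented examples of the kind of output we would like to see.
examples = [
    ("Describe a nurse.", "A nurse is a person who cares for patients."),
    ("Describe a pilot.", "A pilot is a person who flies aircraft."),
]

print(suggest_approach(len(examples)))  # prompting
print(build_few_shot_prompt(examples, "Describe an engineer."))
```

As said above, none of these three options changes the data the model was originally trained on. They only steer what comes out.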

That was my point, right? I mean, with narrow models in the old days, or maybe we still have some of them out there where we had supervised learning, it felt a bit easier because I would get with a good UX, I would get customer feedback. As an example, we spoke with Retu from Ultimate AI. So the customer would say that’s the wrong answer.

Choose this answer so I can directly input the data and change those biases or implications in the model. Now with these large language models, because they are so large, it feels a bit like it’s like a huge ship. Well, I’m saying, yeah, there is an iceberg. And now we all have to agree that this is an iceberg.

Exactly. Let’s say you use for your… We talked about how generative AI works, right? That you use images and descriptions.

Now, if you use public data, you’re probably using Instagram. Now, suddenly you have a certain type of user base who posts a certain type of images and has a certain way of describing them. And I don’t want to perpetuate any… Well, people are using Instagram.

I think they know what’s going on there. Exactly. So, you know, then so that kind of creates a skew. How do you get this out?

You cannot, not so easily. That’s the reason why you have to prompt it. You have to fine-tune it, because to change the whole underlying data set is actually hard. Now, the third approach, which we just saw, is on the data.

The fourth approach which we saw is actually… I talked about three, there are actually four. There is identify, there is mitigate via the user, there is change the data or prompt, and the last one is create those guardrails. And with these guardrails, the idea is to kind of give the computer a constitution.

Yes, constitutional AI. One company, at least, that is publishing a lot on this is Anthropic, with its chatbot Claude. So there’s a lot, and they make it very public.

They train it on the UN Charter. I think they even included some terms of service of Apple. They have their own research around it, but very publicly trying to also discuss a debate about what is right, what is wrong. But I think that’s one challenge.

Who defines what is right and what is wrong? Totally. They say this actually in the publication. All right.

This can be used as well in a negative sense. By the way, in our podcast with LegalOS, he talked about it, he called it guardrails. Doesn’t sound as sexy as constitution, but it is actually the same thing. You create guardrails by asking, does what you just said make sense, and you let the AI critique itself.

Now, Anthropic has described this very neatly. The traditional generative AI models which we saw used human input to actually say, does it work or not? So they got trained with reinforcement learning through human feedback.

Now we use an AI to control the AI based on a set of rules. First of all, do we get our rules correct? And second of all, are our rules complete? These are two different questions which we need to ask ourselves.
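The idea of one AI checking another against a set of rules can be sketched as a small loop in Python. This is a sketch under heavy assumptions, not Anthropic’s actual training procedure: `generate` and `violates` are hypothetical stand-ins for a generating model and a checking model, and the toy versions below only exist so the example runs.

```python
from typing import Callable, List


def answer_with_guardrails(
    prompt: str,
    generate: Callable[[str], str],        # hypothetical: the model that answers
    violates: Callable[[str, str], bool],  # hypothetical: the model that checks one rule
    rules: List[str],
    max_rounds: int = 3,
    fallback: str = "Sorry, I cannot answer that.",
) -> str:
    """Draft an answer, check it against every rule, ask for a rewrite if needed."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        broken = [rule for rule in rules if violates(draft, rule)]
        if not broken:
            return draft
        draft = generate(
            f"{prompt}\n\nYour previous answer was:\n{draft}\n\n"
            "Rewrite it so that it follows these rules:\n- " + "\n- ".join(broken)
        )
    # The last rewrite has not been checked yet, so check it before returning.
    if any(violates(draft, rule) for rule in rules):
        return fallback
    return draft


# Toy stand-ins so the sketch runs without any real model.
def toy_generate(text: str) -> str:
    if "Rewrite it" in text:
        return "That is a fair question."
    return "That is a stupid question."


def toy_violates(draft: str, rule: str) -> bool:
    return rule == "Do not insult the user." and "stupid" in draft


print(answer_with_guardrails("Why is the sky blue?", toy_generate, toy_violates,
                             ["Do not insult the user."]))
# That is a fair question.
```

Both questions from the discussion show up directly in the sketch: whether each rule in `rules` is correct, and whether the list is complete. The loop can only enforce what somebody wrote down.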

To come back to the example of Republican versus Democrat, I can get a rule down or prompt down to what does a Democrat want versus what does a Republican want. And based on that, my model will be biased in answering to me because the model will answer to me what I want. You will probably get a very surprising answer in Germany if you ask a Green Party member and a Conservative Party member about nuclear power. And it might surprise you here and there.

But yeah, that’s a very good one. And I think also your example about Democrat and Republican shows that you might not even be able to use that model in France, because it’s a very different party system and the opinions or what this data is might be different. This is actually an amazingly good point, because we are talking about a human-to-computer interface. Large language models are human-to-computer interfaces.

If I want to take a certain decision, I probably will use a narrow AI. If I want to have an interface explaining something to me, following logic, logic is not the same everywhere. We have cultural differences in explaining things. Therefore, our interface will change, as in any human conversation.

So by saying there is bias, it’s actually good that we have this awareness that there are cultural differences. Yeah. And I think also the last one, getting inspired by AI. We spoke about generative AI, music, for example: it will reproduce what it has heard or seen in the past and will combine it.

So that’s kind of inspirational. But then still the human being, talking about the stock market again, might still be able to process more data, or just be creative, really creative, and just say, hey, I draw a totally different picture because I think that might be cool. I just recently saw the ads Apple made for their computers. There was this “Think Different” campaign, as an example.

And at that time you would sell a computer by megahertz and speed and everything. But Apple just said, hey, no, we don’t sell the computer. We sell the people who are using the computer. And AI would never come up with that.

It’s just reproducing the past. Now, Jasper, let me ask you something. Knowing that we have skews and biases in the data, knowing that there is even a technical explanation to it, but this is a problem, knowing that this problem isn’t easily solved, what does it mean for startups?

I think there’s a lot of chances because, first of all, you have to make it transparent in a way, you have to make it applicable. So there will be startups, we spoke about Arize, that will basically show you what the biases are, what your models are doing, getting transparency in there. We had observability tools when it came to cloud usage, with Datadog a strong pioneer, a large one.

And we see now the same happening with AI. So I just want to know what’s happening and I want to be in control. People love transparency and control. I think that’s one big one.

The next one is how can I make AI applicable in a safe way? So not just having the transparency, but also the guardrails. I think we had a nice discussion with LegalOS, but also Ultimate about if I understand what is okay to do and what is not okay to do, these kind of rules, then if you can guarantee me in a way, and I think that’s what many, at least enterprise customers, you spoke about the narrow models, but that’s what they would ask for. How can I make this secure?

How can I make it save me from any lawsuits or actually get a result that I want? So I don’t want the bias, or maybe I deliberately want the bias because I want to target a certain segment of customers, not all the customers, but just a certain segment. But this kind of controlling the output will be very, very important, and this is where startups can actually do a lot of work. Yeah, for me, super interesting to see now: healthcare, a huge topic, has always been a huge topic for narrow AI. I think it’s a big topic for AI and now for large language models as well.

But biases in healthcare are extremely bad, right? So because the outcome or the impact on the human life is direct here. So let’s see how the industry is tackling those. There should be a lot of very interesting new developments beyond the horizon to actually show how biased is the model and be careful because it’s biased.

I mean, we have a lot of applications where at the end we won’t care. I mean, we spoke about biases and some social media. Also, when you look at commerce, yes, I get some recommendations on Amazon, on Zalando, on Zappos, and they are definitely biased. So what will the AI change here?

Hopefully more transparency and maybe even better results, but you probably don’t have to control it as much as for others. But I think the creativity in consumer will actually be very helpful. It will give more free room for the AI to develop, versus when it comes to enterprise B2B applications.

And this, again, is an opportunity for startups, where you have to control this more. Absolutely. Absolutely. So what we are looking for, like what we think we will see, is the rise of those controlling mechanisms, those guardrails: how do you write constitutions, and how do you control for constitutions, as platforms?

Or will it be ingrained in every business model? That’s what we will also be looking for. And what we also see is people being more transparent about what the AI is doing and what not, which is good because at the end of the day, it gives us the consumer or the customer, B2B customer, more transparency. But we think the debate is very, very helpful.

And it also gives room for smaller companies to actually use this and build their own business models around it. I think that’s it: transparency and guardrails. These are the two topics which I hear from every founder, I hear from every AI toolset. Everybody creates transparency and everybody creates guardrails.

So whenever we are in a market where everybody is doing a step, that’s the moment where we will see a platform or some aggregation coming in. Or the picture is: you just bought a new, wild, very good horse or an amazing new supercar, but you have to learn how to ride it. And maybe that’s what we’re doing right now: learning how to ride it.

Good.