Bob: So Ingmar, it's really nice that you had the time to join me today. In this new series, which we're starting today, we are interviewing people from various fields of expertise. We will focus on technologies related to what we do at Balloonary, mostly AI. And so [my co-founder] Dylan immediately had the idea to invite you. So you're the first one, and that's really cool. I checked your background and saw that you have quite an impressive track record, both from your studies and from your work experience. I saw that you studied in Tübingen and also in Paris. Maybe you can give us a bit of background on what you studied and why you got into machine learning and AI to begin with, and that early on. When was that, like 10 years ago?
Ingmar: I did my undergrad in Tübingen, that's right. But for whatever reason, I never did any machine learning there, which is a big pity, knowing that Tübingen is a nest of many, many great researchers and practitioners in machine learning these days. A lot of practitioners too, because Amazon has opened labs in Tübingen. But I didn't do that there.
Rather, I did a diploma in computer science and another diploma in linguistics, and that's what I started my PhD in. So basically, in my PhD I was working on a statistical, or probabilistic, model of the meaning of human language. The idea was basically that you can say "green grass", but you are much less likely to say "green dog" or "green chocolate", and that this carries meaning. But my models didn't really work, and I blamed the statistical inference method, a numerical integration technique called Monte Carlo. So I started working a lot on improving Monte Carlo methods in the second half of my PhD, which got me to Paris.
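For readers unfamiliar with the term: Monte Carlo methods estimate an integral, such as an expectation under a probability distribution, by averaging over random samples. A minimal sketch in Python (the function and distribution are made-up examples, not anything from Ingmar's research):

```python
import random

def monte_carlo_expectation(f, sampler, n=100_000):
    """Estimate E[f(X)] by averaging f over random samples of X."""
    return sum(f(sampler()) for _ in range(n)) / n

# Example: estimate E[X^2] for X ~ Uniform(0, 1); the exact value is 1/3.
random.seed(0)
est = monte_carlo_expectation(lambda x: x * x, random.random)
print(est)  # ≈ 0.333
```

The appeal is that the same recipe works for distributions where the integral has no closed form, which is exactly the situation in probabilistic models of language; the hard research questions are about making the sampling efficient.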
And after Paris I did a second postdoc in Berlin, where I started to work on kernel methods, which some people might know from support vector machines; those are still very successful today when it comes to simple classification problems. And at Zalando, of course, there were also deep learning methods that we worked on. That's basically it.
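Kernel methods, mentioned above, make linear algorithms nonlinear by replacing dot products with a kernel function. As a toy illustration (the dataset and code are made up for this example, not from the interview), here is a kernel perceptron with an RBF kernel, one of the simplest kernel classifiers:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Radial basis function kernel: similarity decays with squared distance.
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def train_kernel_perceptron(X, y, kernel, epochs=10):
    # alpha[i] counts how often example i was misclassified during training.
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        for i in range(len(X)):
            score = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(len(X)))
            if (1 if score > 0 else -1) != y[i]:
                alpha[i] += 1
    return alpha

def predict(X_train, y, alpha, kernel, x):
    score = sum(alpha[j] * y[j] * kernel(X_train[j], x) for j in range(len(X_train)))
    return 1 if score > 0 else -1

# A 1-D dataset no linear classifier can separate; the RBF kernel handles it.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, 1, 1, -1]
alpha = train_kernel_perceptron(X, y, rbf_kernel)
print([predict(X, y, alpha, rbf_kernel, x) for x in X])  # → [-1, 1, 1, -1]
```

Support vector machines refine this idea by choosing the classifier with the maximum margin rather than just any mistake-free one.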
Zalando uses AI for their Ads
Bob: Right, I saw that you also worked for Zalando. That's really interesting. So, in what context did Zalando need your expertise?
Ingmar: Zalando is applying a lot of machine learning for all kinds of things, as are many other e-commerce companies. Most of the time you're trying to predict the future because it helps you in logistics. So, you're trying to predict which product will be needed where, and you can stock up on it, stuff like that.
But also actually in terms of ads. Everybody's trying to get the most out of their ad dollars, so everybody's trying to apply machine learning. And the e-commerce companies, of course, do that as well.
Bob: Okay, cool. When I started studying hydrology, we also used the Monte Carlo method in simulations. But back then I always considered it a tool, not a research topic in its own right. I wonder, where did your interest in machine learning come from? What motivated you to go there?
When AI wasn't the "Next Big Thing" yet
Ingmar: I mean, the problem that I had in my PhD that I picked out for myself was just fascinating to me. Modeling the meaning of human language.
And then it was also very clear to me that I didn't want to do it in a theoretical way. I didn't want to do it by discussing with other researchers, which was what many semanticists (basically, linguists who work on meaning) were doing. They were writing research articles that fleshed out what they were thinking about certain parts, other researchers replied, and nobody was putting any numbers to it.
And I wanted to make it falsifiable in the best scientific tradition, basically by really putting numbers to it.
Bob: I remember when we used our own models to predict river floods and tried to predict how systems behave, we always had this problem where we said we get the right results, but for the wrong reasons.
Statistically you can have fantastic numbers, yet not actually be explaining the system.
Ingmar: Yeah, that's true.
I mean, on the one hand you can say this is bad if you want to gain scientific insight. On the other hand, if your goal is, for example, to engineer something, and you can statistically show that you generalize well to new data, then you don't care whether you have the correct explanation or whether it simulates the system well or not.
That doesn't matter.
GPT is extremely naive
Bob: That's true. Now, even though GPT models have been around for a couple of years, we have only seen this big hype around ChatGPT in the last two months. It really went viral and now has more than 100 million active users, which apparently makes it the fastest-growing application of all time. Coming from the field of machine learning, did you expect this to happen? Did you see it coming?
Ingmar: I did not, even though GPT-type models are exactly what I was working on in my PhD. They are probabilistic models, and they model the meaning of language as well, because they produce not only syntactically correct text; the text is also coherent, it answers questions, and you can even draw causal inferences from it. For example, you can ask GPT: does age cause weight, or does weight cause age? And GPT will give you the correct answer. Another example from causal inference would be: does altitude cause temperature, or does temperature cause altitude? Again, it will give you the correct answer. So this really already captures meaning in language.
But I did not see it coming, no. I haven't been working on this kind of stuff in the last, let's say, seven years or so. It is really astounding, because when you look at it, the way GPT-type models are trained is extremely naive.
It's really astounding that it works so well.
Is GPT a black box?
Bob: Some people say that it's not completely understood how it actually works in detail, how the GPT models arrive at the results they produce, whether correct or wrong. That the process is not always a hundred percent understood.
Ingmar: It's true that you can't necessarily trace back why GPT is spitting out a certain text. But technologically speaking, it's completely understood, of course. We know exactly how these models work; it's no magic. People have built this, so people know exactly which parts go in and what happens after a certain text is entered as the start for the GPT model: how the model embeds it into a mathematical space, and how, from that mathematical space, it predicts what comes next. So this is completely understood.
Anyway, so many people tell me, AI is a black box. It's not really a black box for me, because after reading the papers, you know how these things work and you understand the wonder and marvel that they work so well. But sometimes I'm also still astounded by this.
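The "naive" training objective behind GPT-style models is next-token prediction: given the text so far, predict what comes next. GPT does this with a large transformer over learned embeddings, but the core idea can be illustrated with the simplest possible version, a count-based bigram model (the tiny corpus here is made up):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count next-token frequencies: the simplest 'predict the next word' model."""
    model = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        model[cur][nxt] += 1
    return model

def predict_next(model, token):
    """Return the most frequently observed token after `token`."""
    return model[token].most_common(1)[0][0]

corpus = "the grass is green and the grass is wet".split()
model = train_bigram(corpus)
print(predict_next(model, "grass"))  # → "is"
```

The difference between this toy and GPT is not the objective but the scale and the model: instead of counting pairs of words, GPT conditions on the entire preceding context with billions of learned parameters.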
Predicting proteins with language models
Bob: That's really cool. But you are not using language models in your current work, if I understood correctly. You founded a biotech company, Exazyme, and there you use other machine learning technologies, right?
Ingmar: In part. In some sense, you can also consider what we are doing now a language model; it's just a matter of terminology. It's a language model in the sense that we are also embedding sequences of characters, but in our case the alphabet has just 20 characters, the building blocks of biochemistry. They symbolize amino acids: there are 20 amino acids, so there are 20 characters, one for each amino acid. And we design proteins for industrial or pharma use. For example, proteins are used as catalysts for chemical reactions; these proteins are called enzymes. What you typically want is to make an enzyme catalyze its reaction as fast as possible, and you change the sequence to make this happen.
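Treating proteins as sentences over a 20-letter alphabet can be made concrete: each of the 20 standard amino acids has a one-letter code, and a protein sequence tokenizes exactly like text. A minimal sketch (the example sequence is arbitrary):

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode(sequence):
    """Map a protein sequence to integer token ids, as a language model does with words."""
    return [AA_INDEX[aa] for aa in sequence]

print(len(AMINO_ACIDS))  # → 20
print(encode("MKT"))     # → [10, 8, 16]
```

Once sequences are token ids, the same sequence-modeling machinery used for text can, in principle, be trained on protein data instead.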
Bob: Okay, fascinating. How did you come to that kind of application?
Ingmar: We were searching for something to build a company around, and we had a few criteria. One of them was that AI had to be a unique selling point: you should be able to get a strong edge with AI. And at some point I talked to my co-founder, Phillip, and said, well, maybe we can do something that's worthwhile for society as well.
We started looking a bit into precision farming and that kind of stuff, and then started to talk to a research group that was engineering enzymes. These enzymes were used to catalyze an electrolysis reaction. Their goal was to make these enzymes as fast as possible and that's basically how it started.
Bob: I guess you did it the right way. It's always good to talk to potential customers to understand if what you build is actually going to be used.
Ingmar: Sometimes you have to jump in, I think, because, like Henry Ford said: If I had asked people what they wanted, they would've said faster horses.
Bob: Yeah, that's also true.
Ingmar: You have to build something. If it's far-sighted enough, then you have to build it and try out whether somebody wants it.
The danger and potential of AI
Bob: I guess so, yeah. As someone who loves new technology, I sometimes notice that I am both excited and concerned about the rapid development of GPT. I wonder: what can we do to balance the potential risks and benefits? And how will AI change the world in the next 10 years?
Ingmar: I think GPT is nice, and it will help do a lot of everyday things a lot faster. But it's impossible to foresee exactly what will happen; I wouldn't be a machine learner if I didn't say that. Extrapolation is super hard, and I think everybody says that. Even the people who have worked on GPT wouldn't know what you will be able to do with it. I'm not sure.

I think one wave that is still coming, and that will not be consumer-facing, is for sure the stuff that we and other companies are working on: AI that helps design chemistry and biochemistry way, way faster than it used to be done. Even today we can see that GPT-like models can design completely new proteins from scratch. Well, maybe not completely from scratch, but enough to justify calling them new proteins, and they work like proteins in nature. We've also been able to show that we can create proteins that work much, much better than nature's, and do it way faster than anybody ever has. There was a Nobel Prize for a certain protein engineering technique, in I think 2007 or 2008, and we just beat this technique very recently. In terms of what we were able to do, this is compounding. I think it's to a large part exponential growth, because you get to good points faster, and from these good points you can again iterate and get to the best point much faster.
So basically you have exponential growth and will be able to play God, more or less. In, I think, 10, maybe 15 years, you will be able to snap your fingers and have a molecule that binds to a certain type of cancer.
And this will be possible for sure.
Bob: I heard rumors that there are internal meetings at Microsoft to present GPT-4, and everybody is expecting it to come out this year, or maybe next year. But beyond just larger and more efficient language models, which major leaps in AI do you predict will come next? Especially, what do you think is next for generative AI?
Ingmar: I mean, what people have already published, and what has already made the media, is automatic generation of videos, and of course automatic generation of 3D objects, also just based on prompts. So I think media creation will definitely get easier for folks who want to do basic things.
And then the question is, of course: can the general public even tell the difference between what people have created and what AI has?
Bob: I recently heard about a tool that lets you input text and have it read back to you in a voice that sounds so natural I couldn't tell the difference from a human.
Ingmar: Yeah, of course, it's very natural, for sure. So I think all of media creation is definitely getting much easier in some respects. In other respects it will still not be so easy, and even with GPT, it can't do everything for you.
But certain tasks just get nicer, and that's great.
Bob: And I think we have to completely rethink what our job will be and what the job of the AI will be: which value we will bring and which value will come from the AI.
Will AI make us lazy and rich?
Ingmar: For sure, that is number one. The other thing is, of course, how do we compensate? Because right now, AI is like every other tool: it makes it possible to make much more money from the same amount of invested money. So today, with very little time, you can create a lot of value that you were completely unable to create a hundred years ago.
So the lever gets much bigger. And what humanity has always tended to do, when we had better tools, is reach further. It did not mean that we were getting lazy; we were always trying to do more and do better. And I think that's what's going to happen in the future as well.
The question, though, is this: the lever is getting longer and longer, and thus has a bigger effect. Currently it also has a growing effect on the distribution of money among the general population, and I think that at some point this might become a problem. So we might have to solve it at some point.
Bob: Sam Altman, the CEO of OpenAI, created another startup that issues Worldcoin, basically to create a crypto foundation for a universal basic income. And I think that would basically support what you just said.
Bob: Cool. Thank you very much, Ingmar. It was really interesting.
Ingmar: Thank you, Bob.
Bob: It was really nice to have you here. Thanks for your time, and talk to you soon. Bye.