Last year, Carlos Gomez-Uribe joined Solugen as our VP of Artificial Intelligence and Machine Learning. With experience at Google, Netflix, Facebook, and Apple, Carlos is applying the same computational thinking behind personalized TV recommendations to protein design. We’re impatient, and so is Carlos, so together we’re leveraging machine learning to drive down costs and speed up progress. It’s a race to mission, and we’re applying AI to the cutting edge of biology—already reducing process costs and optimizing production.
Carlos shares his thoughts on how AI and machine learning transfer from internet tech to biomanufacturing.
Before Solugen, you spent over a decade working at every FAANG company except Amazon. What prompted your pivot from internet tech into biotech?
A lot of soul searching. I’ve always been mission driven, which is why I left my favorite job—which was at Netflix—to move to Facebook. It was after the election, and I felt a responsibility to improve the quality of information on a platform that felt so easily manipulated for the worse. It was a worthy area, but it was hard to make a difference in such a bureaucratic company. I then went to Apple because I wanted to work on another cause I cared about: health. But again, progress was too slow. Large corporations are just too sclerotic. Start-ups are where it’s at. The third big mission I care about is climate. I explored a lot of climate companies, but Solugen was the only one with a strategy I thought could really benefit from computation.
What compelled you about Solugen’s strategy? How did you identify your first projects, and how do you see that work growing over time?
My PhD was in systems biology at MIT, so it made total sense to me to design proteins that enable low-temperature, highly specific reactions. After talking to people across the company, I identified two significant areas where machine learning can make a difference today. The first is protein design, and the second is the optimization of our chemical processes. Other areas may become important down the line, such as using AI to better control the plant so that conversion and production are maximized. I can also imagine building recommender systems to suggest meaningful products to clients. But protein design and the optimization of chemical processes are by far the most important, because they are about innovation. If we want to accomplish our mission of decarbonizing the chemicals industry, we need to figure out how to make a lot more molecules than we know how to make now, and we need to be able to scale them up in factories very quickly.
What does an AI and machine learning intervention into protein design and the optimization of chemical processes look like in practice?
I’ll start with protein design. We have a team of hardcore biologists who design the enzymes we use. Those designs have historically relied on what’s called directed evolution, where you start with an enzyme that does a crappy job at a reaction, randomly mutate it, and screen the mutants to find which ones improved the reaction in ways you care about. The best one becomes your next starting protein, and then you repeat, with each cycle called a campaign. It takes two and a half to three months per campaign, and seven campaigns total, to get a good result—so a year and a half to make a better enzyme. With machine learning, we can use the computer to design which mutants we want to screen, reducing the number of campaigns from seven to one and the total time from over a year to three months. It’s very fun and uses the latest and greatest of biology.
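Carlos doesn’t spell out the models Solugen uses, but the general idea of model-guided library design can be sketched in a few lines: fit a model on assay results from one screening round, then rank a candidate library so only the most promising mutants go to the wet lab. Everything below—the sequences, activity values, one-hot encoding, and ridge model—is an illustrative assumption, not the team’s actual pipeline (requires numpy and scikit-learn):

```python
# Minimal sketch of model-guided library design: train on assay data from one
# screening round, then rank candidate mutants so the wet lab screens only the
# most promising ones. All data and modeling choices here are illustrative.
import itertools
import numpy as np
from sklearn.linear_model import Ridge

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq: str) -> np.ndarray:
    """Flatten a sequence into a one-hot feature vector (positions x 20 AAs)."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()

# Hypothetical assay results from a previous campaign: sequence -> activity.
screened = {
    "MKTAYIAKQR": 0.12,
    "MKTAYLAKQR": 0.45,
    "MKTGYLAKQR": 0.50,
    "MKVAYIAKQR": 0.08,
}

X = np.array([one_hot(s) for s in screened])
y = np.array(list(screened.values()))
model = Ridge(alpha=1.0).fit(X, y)

# Enumerate a candidate library: all single mutants of the best sequence so far.
parent = max(screened, key=screened.get)
candidates = sorted({
    parent[:i] + aa + parent[i + 1:]
    for i, aa in itertools.product(range(len(parent)), AMINO_ACIDS)
} - set(screened))

# Rank candidates by predicted activity; send only the top k to screening.
scores = model.predict(np.array([one_hot(s) for s in candidates]))
top = np.argsort(scores)[::-1][:10]
print("next batch to screen:", [candidates[i] for i in top])
```

In a real campaign, the assay results for that batch would be fed back into the training set and the loop repeated—that feedback is what collapses seven campaigns into one.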
With the optimization of chemical processes, we’re using Bayesian optimization to learn a model of a process’s output—for example, conversion to the product we want to make—as a function of all the parameters. We then use that model to select the next set of experiments to run, always updating it with new data so it suggests experiments that are likely to maximize conversion. When you use this type of technique, you reduce the number of experiments, and the costs, dramatically. We applied Bayesian optimization to one of our current products, and the process cost went down by 60 percent in roughly 40 small-scale reactors. We’ve since scaled the best condition we found to the much larger pilot plant, and the results held up. The final step is taking the optimized process to the Bioforge to see the full impact of our ML-driven optimization for this key reaction.
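For readers curious what that loop looks like in code, here is a toy Bayesian-optimization sketch over two invented process parameters (temperature and pH), with a synthetic function standing in for a small-scale reactor run. It uses a Gaussian-process surrogate with an expected-improvement acquisition rule—one common realization of the technique, not necessarily Solugen’s—and assumes numpy, scipy, and scikit-learn:

```python
# Toy Bayesian-optimization loop over two process parameters (temperature, pH).
# run_reactor() is a synthetic stand-in for a real experiment; the parameter
# ranges, kernel, and budget are illustrative assumptions.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def run_reactor(temp, ph):
    """Stand-in for an actual experiment: returns noisy conversion."""
    return np.exp(-((temp - 55) / 15) ** 2 - ((ph - 6.5) / 1.5) ** 2) + rng.normal(0, 0.01)

# A handful of initial experiments chosen at random over the allowed ranges.
X = rng.uniform([30.0, 4.0], [80.0, 9.0], size=(5, 2))
y = np.array([run_reactor(t, p) for t, p in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
grid = np.column_stack([g.ravel() for g in np.meshgrid(
    np.linspace(30, 80, 60), np.linspace(4, 9, 60))])

for _ in range(20):  # each iteration = one suggested experiment
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.max()
    # Expected improvement balances exploiting high predicted conversion
    # against exploring uncertain regions of parameter space.
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = grid[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, run_reactor(*x_next))

print("best conversion %.3f at temp=%.1f C, pH=%.2f" % (y.max(), *X[np.argmax(y)]))
```

The savings Carlos describes come from the acquisition step: instead of sweeping a full grid of conditions, each new reactor run is placed where the surrogate model expects the most information or improvement.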
Zooming out, how are AI and machine learning accelerating the creation of new molecules?
By enabling new chemical reactions. We literally enable new reactions that humanity hasn’t done before. Or nature, as far as we know.
You spent seven years at Netflix. What were you working on there, and how are you applying those learnings to your current role at Solugen?
I joined Netflix as a statistician and spent my last four and a half years there as Vice President of Product Innovation, in charge of the recommendation algorithm. Our objective was to maximize retention and minimize cancellation rates, which was highly correlated with getting people to watch more hours of Netflix per month. To do that, we had to be very good at using the scientific method—running experiments at large scale, analyzing them, and then, based on the results, deciding what became the default experience for users. Protein design at Solugen is an application of the same scientific method [from Netflix]: go from hypothesis to a large-scale experimental design, use the results to improve the machine learning model, and then run the next set of experiments. Each experiment is basically screening a library of mutant sequences for the reaction that we care about.
I loved Netflix because I was able to have an impact on a product that millions of people used. And the company moved fast. The thing was, movies and TV shows were totally random. Solugen also moves fast, but it’s a mission I care about.
How do you see AI fitting into climate tech and sustainability beyond Solugen?
It’s early days. People in machine learning and computer science have been spoiled by being able to write software and have that software be the product, which means we could shape the product easily by just sending bits around. Solving real problems like emissions from chemical manufacturing involves changing the physical world. It’s not bits, it’s atoms. And designing and moving atoms is slower and harder, which means you have to figure out how to translate the bits into atoms. Thankfully, advances in machine learning are finally enabling useful outputs that can be translated into the physical world.
For enzymes, there are public data sets with hundreds of millions of protein sequences. That has enabled an explosion of machine learning models that understand protein sequences, but there are very few, very limited data sets that pair sequences with *function*. Because of that, it’s still very hard to build good models that predict function from sequence. To maximize impact, I think we need a consortium with universities and governments to produce the public data sets the field needs to develop the best models for predicting protein function from sequence.
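To make that gap concrete: a sequence-only model such as ESM-2 (here the small public `facebook/esm2_t6_8M_UR50D` checkpoint on Hugging Face) already yields rich embeddings learned from those hundreds of millions of sequences, but turning embeddings into function predictions still needs labeled assay data—exactly what is scarce. A rough sketch, assuming `torch` and `transformers` are installed and using made-up activity labels:

```python
# Sketch of the sequence-only vs. sequence+function gap: a pretrained protein
# language model gives embeddings for free; the function-prediction "head"
# must be fit on scarce labeled assay data. Labels below are fabricated.
import numpy as np
import torch
from sklearn.linear_model import Ridge
from transformers import AutoModel, AutoTokenizer

checkpoint = "facebook/esm2_t6_8M_UR50D"  # small public ESM-2 model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
plm = AutoModel.from_pretrained(checkpoint)

def embed(seq: str) -> np.ndarray:
    """Mean-pooled last-layer embedding of a protein sequence."""
    inputs = tokenizer(seq, return_tensors="pt")
    with torch.no_grad():
        out = plm(**inputs).last_hidden_state  # (1, tokens, hidden)
    return out.mean(dim=1).squeeze(0).numpy()

# Tiny hypothetical assay data set: this scarcity is exactly the bottleneck.
labeled = {"MKTAYIAKQR": 0.12, "MKTAYLAKQR": 0.45, "MKTGYLAKQR": 0.50}

X = np.array([embed(s) for s in labeled])
y = np.array(list(labeled.values()))
head = Ridge(alpha=1.0).fit(X, y)  # function predictor on top of embeddings

print(head.predict([embed("MKVAYLAKQR")]))  # predicted activity, new variant
```

The consortium Carlos describes would grow that `labeled` dictionary from a handful of points to the scale where such heads generalize.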