Artificial intelligence and machine learning are having a transformational impact on industries across the globe. As these technologies mature, companies are discovering novel ways to unlock opportunities that were once unimaginable.

In chemical manufacturing, where innovation has often been constrained by slow and resource-heavy methods, the potential to leverage these tools is rapidly expanding. At Solugen, we’re harnessing their power to redefine what’s possible in protein engineering.

We’ve developed two powerful tools that are helping us tackle the complexity of protein design with unprecedented speed and precision:

1) Seq2Fitness, a machine learning model designed to predict the fitness of protein sequences, and

2) Biphasic Annealing for Diverse Adaptive Sequence Sampling (BADASS), a powerful optimization algorithm for more efficient and intelligent protein sequence sampling.

These innovations allow us to accelerate the development of high-performance enzymes that drive more efficient chemical reactions, paving the way for a more sustainable future.

Why Smarter Protein Design Matters 
Proteins are at the heart of countless industrial and natural processes, from breaking down food during digestion to catalyzing key chemical reactions in manufacturing. Designing proteins that perform better, or that work under more extreme conditions, is essential for advancing innovative processes. However, traditional methods of protein design, like directed evolution, are slow and resource-intensive. The space of possible protein sequences is so vast that exploring all potential designs in the lab is impractical.

By integrating machine learning with protein engineering, we’re able to overcome these limitations. Seq2Fitness and BADASS aren’t just tools; they represent a fundamental shift in how we explore protein sequence space. With these technologies, we’re able to predict, test, and refine new protein designs faster, significantly reducing the time it takes to move from concept to solution.

Seq2Fitness: Learning from Data to Predict Protein Performance 
The vastness of protein sequence possibilities is both an opportunity and a challenge. Seq2Fitness helps us navigate this landscape by using deep learning to predict how well a given protein sequence will perform a specific function. By combining data from protein language models and laboratory tests, Seq2Fitness improves upon previous models, allowing us to predict performance in regions of the sequence space that were previously underexplored in the lab. 

This kind of predictive capability is critical for industrial applications, where enzymes often need to function in harsh conditions. By rapidly identifying promising protein candidates, Seq2Fitness allows us to accelerate our design process and focus our resources where they matter most—on the proteins that have the greatest potential to optimize chemical reactions. This drastically reduces the trial-and-error process, saving time and resources in our search for the next breakthrough enzyme. 
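
To make the idea of combining protein language model signals with laboratory measurements concrete, here is a minimal Python sketch. It is not the Seq2Fitness architecture: a small linear model stands in for the deep network, `placeholder_plm_score` is a hypothetical stand-in for a real protein language model, and the "lab" values are invented for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Flatten a sequence into 20 indicator values per position."""
    return [1.0 if aa == letter else 0.0 for aa in seq for letter in AMINO_ACIDS]

def placeholder_plm_score(seq, wild_type):
    """Hypothetical stand-in for a protein language model score.

    A real pipeline would query a pretrained model here; this toy version
    simply penalizes each substitution relative to the wild type.
    """
    return -float(sum(a != b for a, b in zip(seq, wild_type)))

def featurize(seq, wild_type):
    """Sequence features, plus the language-model feature, plus a bias term."""
    return one_hot(seq) + [placeholder_plm_score(seq, wild_type), 1.0]

def predict(weights, features):
    return sum(w * x for w, x in zip(weights, features))

def mse(weights, X, y):
    """Mean squared error of the model on a labeled set."""
    return sum((predict(weights, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)

def fit_linear(X, y, lr=0.05, epochs=2000, l2=1e-3):
    """Ridge-regularized least squares via full-batch gradient descent."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [l2 * w for w in weights]
        for xi, yi in zip(X, y):
            err = predict(weights, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / len(y)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Toy "lab measurements": invented values, for illustration only.
WILD_TYPE = "ACDE"
LAB_DATA = {"ACDE": 1.0, "GCDE": 1.4, "ACDK": 0.6, "GCDK": 1.1, "AWDE": 0.2}

X = [featurize(s, WILD_TYPE) for s in LAB_DATA]
y = list(LAB_DATA.values())
weights = fit_linear(X, y)

# Score a double mutant that was never measured.
unseen = "GWDE"
print(unseen, round(predict(weights, featurize(unseen, WILD_TYPE)), 3))
```

The point of the sketch is the data flow: each sequence is turned into features that mix sequence identity with a language-model signal, the model is fit to measured fitness, and it can then score variants that were never tested in the lab.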

BADASS: Smart Exploration with Dynamic Sampling 
With Seq2Fitness accurately predicting the function of protein sequences, the next challenge is to explore the full sequence space effectively to fine-tune and optimize those designs. Traditional optimization methods, like simulated annealing, often get stuck in local optima—essentially settling for “good enough” rather than pushing the boundaries of what’s possible.  

That’s where BADASS comes in. By alternating between cooling and heating phases and updating mutation scores based on previously evaluated protein sequences, this biphasic annealing algorithm explores the protein sequence space more dynamically, allowing us to maintain diversity while optimizing performance.  

In benchmark tests, BADASS consistently outperforms traditional optimization methods, uncovering more high-fitness sequences with less computational effort and reaching deeper into unexplored regions of the sequence landscape.
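
The alternating cooling and heating phases and the score updates described above can be sketched in a few lines of Python. This is a simplified illustration, not the published BADASS implementation. Several details are assumptions made for the example: the fixed-length phases, the geometric temperature schedule, scoring each mutation by its average fitness gain over the wild type, and the invented `toy_fitness` function that stands in for a learned predictor.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def temperature_at(round_idx, phase_len, t_high, t_low):
    """Geometric schedule: even phases cool t_high -> t_low, odd phases heat back up."""
    phase, step = divmod(round_idx, phase_len)
    frac = step / (phase_len - 1)
    start, end = (t_high, t_low) if phase % 2 == 0 else (t_low, t_high)
    return start * (end / start) ** frac

def sample_variant(wild_type, scores, temperature, n_mut, rng):
    """Pick n_mut substitutions at distinct positions, weighted by exp(score / T)."""
    seq, chosen = list(wild_type), []
    for _ in range(n_mut):
        taken = {pos for pos, _ in chosen}
        candidates = [m for m in scores if m[0] not in taken]
        peak = max(scores[m] for m in candidates)  # subtract for numerical stability
        weights = [math.exp((scores[m] - peak) / temperature) for m in candidates]
        pos, aa = rng.choices(candidates, weights=weights, k=1)[0]
        seq[pos] = aa
        chosen.append((pos, aa))
    return "".join(seq), chosen

def biphasic_search(wild_type, fitness_fn, n_rounds=40, batch_size=30, n_mut=2,
                    phase_len=5, t_high=1.0, t_low=0.05, seed=0):
    rng = random.Random(seed)
    wt_fitness = fitness_fn(wild_type)
    # One score per single substitution; zero means "no evidence yet".
    scores = {(i, aa): 0.0 for i, wt in enumerate(wild_type)
              for aa in AMINO_ACIDS if aa != wt}
    gains = {m: [] for m in scores}
    evaluated = {wild_type: wt_fitness}
    for round_idx in range(n_rounds):
        temperature = temperature_at(round_idx, phase_len, t_high, t_low)
        for _ in range(batch_size):
            seq, chosen = sample_variant(wild_type, scores, temperature, n_mut, rng)
            if seq in evaluated:
                continue  # do not spend evaluations on repeats
            evaluated[seq] = fitness_fn(seq)
            # Update each sampled mutation's score with the observed fitness gain.
            for m in chosen:
                gains[m].append(evaluated[seq] - wt_fitness)
                scores[m] = sum(gains[m]) / len(gains[m])
    return evaluated

def toy_fitness(seq, target="MKTAYIAK"):
    """Invented stand-in for a fitness predictor: fraction of positions matching a target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)

wild_type = "MKLAYGAK"
results = biphasic_search(wild_type, toy_fitness)
best = max(results, key=results.get)
print(best, results[best], len(results))
```

At high temperature the sampler draws mutations almost uniformly, which keeps the batch diverse. As the temperature drops, it concentrates on mutations whose scores have been raised by earlier evaluations. Heating again lets the search leave a region it has already exploited instead of settling there.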

A Bold Vision for the Future of Chemicals
At Solugen, our mission is to create sustainable chemical solutions that reduce waste, lower energy consumption, and minimize the need for harmful chemicals. The integration of Seq2Fitness and BADASS into our workflows is helping us achieve this vision by enabling the design of enzymes that catalyze reactions more efficiently and in environmentally friendly conditions. 

This approach is a game changer—not just for Solugen, but for the broader chemical industry. As we continue to unlock new possibilities in protein engineering, the ripple effects across various sectors will be profound. For Solugen and beyond, AI-driven enzyme design holds the key to a greener, more efficient future in chemical manufacturing. To contribute to this progress, we are publishing our Seq2Fitness and BADASS tools so the broader protein engineering field can move toward a more sustainable future together. 

Want to learn more? Check out our recently published white paper or listen to an audio summary to dive deeper into how these methodologies are reshaping the industry. Our Seq2Fitness and BADASS code is available for non-commercial use.

This white paper was authored by Carlos A. Gomez-Uribe, Japheth Gado, and Meiirbek Islamov of Solugen’s AI and Machine Learning team. We thank them for their invaluable contributions to both Solugen and the broader field of protein engineering.