Research Phase 2

Machine Learning for Analyzing Protein Synthesis

Research Results

Proteins are biological macromolecules and a central building block of life on our planet. For use in therapeutics, diagnostics or the food industry, proteins are synthesized in living cells. Due to numerous and partly unknown influencing factors and interactions, these biological processes are highly complex and difficult to predict.

Machine learning algorithms enable the analysis of high-dimensional process data and the creation of predictive quality models, and thereby open up new possibilities for process optimization. This requires close cooperation between data engineers and domain experts, and a mutual understanding of each other's work, routines, and language. In this project, machine learning algorithms were developed to analyze and model protein synthesis, with a focus on model transparency and comprehensive evaluation.

Several methods for assessing the quality of process models were tested. The developed methods were integrated into a toolbox, and a machine learning manual was written to improve interdisciplinary communication between data engineers and process experts. The project yielded insights into correlations in the data and into the factors that influence the process, and showed that some proteins can be modeled much more accurately than others. High variance across multiple predictive models suggests that the available data do not capture external factors that significantly affect protein production.
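
The link between model variance and unrecorded factors can be illustrated with a small sketch. It is not the project's toolbox or method: the bootstrap ensemble, the synthetic data, and every name in it (fit_line, ensemble_spread, the "hidden" factor) are assumptions chosen for illustration. Many simple models are fitted on resampled data, and the spread of their predictions at one query point is compared for a process that is fully described by its recorded input and for one that also depends on a factor missing from the data.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b for a single input."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def ensemble_spread(xs, ys, x_query, n_models=200, seed=0):
    """Standard deviation of predictions at x_query across bootstrap models."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        # Resample the data set with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample, no line can be fitted
        a, b = fit_line(bx, by)
        preds.append(a * x_query + b)
    return statistics.stdev(preds)


# Synthetic data (hypothetical): one recorded input x, one unrecorded factor.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 1) for _ in range(40)]
hidden = [data_rng.uniform(-1, 1) for _ in range(40)]
noise = [data_rng.gauss(0, 0.1) for _ in range(40)]

# Output explained by x alone versus output that also depends on the hidden factor.
y_clean = [2 * x + e for x, e in zip(xs, noise)]
y_hidden = [2 * x + 3 * h + e for x, h, e in zip(xs, hidden, noise)]

spread_clean = ensemble_spread(xs, y_clean, x_query=0.5)
spread_hidden = ensemble_spread(xs, y_hidden, x_query=0.5)

print(f"spread without hidden factor: {spread_clean:.3f}")
print(f"spread with hidden factor:    {spread_hidden:.3f}")
```

In this sketch the models trained on the data with the unrecorded factor disagree far more with one another. A large spread is only an indication, though: measurement noise or a small data set can produce the same effect, so it should be read together with domain knowledge about the process.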