3.3 Applications of ML algorithms for fermentation analysis and optimization
ANNs have been used successfully in several studies on fermentation prediction and optimization (Table 3). Nelofer et al. compared the predictive capacity of ANN and RSM for lipase production by a recombinant Escherichia coli [99]. In this study, fermentation parameters were optimized based on experimental lipase production data. ANN outperformed RSM in both R² and adjusted R² values. Moreover, the ANN model gave lower absolute average deviation (AAD) and root mean square error (RMSE), indicating its higher accuracy. Beyond comparing ANN and RSM, integrating the two strategies has also attracted much attention in recent years. For instance, Wang et al. proposed an ANN-RSM methodology to overcome the failure of pure RSM in predicting complex nonlinear systems. They used the original experimental datasets to train and validate an ANN model and to produce response surface models for analyzing the effect of critical parameters in dark hydrogen fermentation. The constructed model gave good and reliable results for this nonlinear and noisy process [100]. The genetic algorithm (GA) is a global search optimization method inspired by the theory of natural selection. GA has usually been coupled with ANN to find the optimum values of fermentation parameters used in model training. Recently, Unni et al. employed ANN together with GA to optimize the medium composition for the production of human interferon-gamma (hIFN-γ) using a recombinant Kluyveromyces lactis [101]. An on-line μ-stat strategy was also recently proposed for controlling methanol feeding in a fed-batch process of recombinant Pichia pastoris. In this study, Tavasoli et al. employed an MLP3 neural network (a class of ANNs) to reconstruct and adjust the controller's performance. Consequently, a significant enhancement was observed in the production of human recombinant alpha 1-antitrypsin (A1AT) [102].
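The four criteria used in such ANN versus RSM comparisons (R², adjusted R², RMSE, and AAD) can be computed as in the minimal sketch below. It is an illustration only: the function name `regression_metrics`, the parameter `n_predictors`, and all numerical values are hypothetical and are not taken from the cited studies. AAD is assumed here to be the mean absolute relative deviation expressed in percent, which is a common definition but may differ from the one used in [99].

```python
import math


def regression_metrics(y_exp, y_pred, n_predictors):
    """Return R2, adjusted R2, RMSE and AAD (%) for one fitted model.

    y_exp        : experimental (observed) responses
    y_pred       : model predictions for the same runs
    n_predictors : number of input variables (assumed definition of p)
    """
    n = len(y_exp)
    mean_exp = sum(y_exp) / n
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)

    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalizes the number of predictors relative to sample size.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = math.sqrt(ss_res / n)
    # AAD assumed as mean absolute relative deviation, in percent.
    aad = 100.0 / n * sum(abs(p - e) / abs(e) for e, p in zip(y_exp, y_pred))
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse, "aad_percent": aad}


# Made-up values purely for illustration (not data from any cited study).
y_exp = [10.0, 20.0, 30.0, 40.0, 50.0]
model_a = [10.5, 19.5, 30.5, 39.5, 50.5]   # hypothetical "ANN-like" fit
model_b = [12.0, 18.0, 33.0, 37.0, 52.0]   # hypothetical "RSM-like" fit

metrics_a = regression_metrics(y_exp, model_a, n_predictors=2)
metrics_b = regression_metrics(y_exp, model_b, n_predictors=2)

for name, m in (("model_a", metrics_a), ("model_b", metrics_b)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

A model is judged more accurate when its R² and adjusted R² are closer to 1 and its RMSE and AAD are lower, which is the pattern reported for ANN relative to RSM in the comparison above.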
SVMs are another popular method for modeling experimental fermentation data and predicting the process outcome. One specific advantage of SVMs over ANNs is that their training problem is convex, so they always find the global optimum solution, while ANNs may become trapped in a local optimum. Moreover, SVMs are effective for problems with a small number of samples. Zhang et al. recently compared the predictive capabilities of SVM and ANN. In this investigation, SVM and ANN were used to build models for predicting biomass yield, lipid production, and COD removal rate in a microbial lipid fermentation. The results demonstrated that the SVM coupled with the genetic algorithm performed better than ANN with a small number of samples [103]. In another study, the least-squares SVM (LS-SVM), a modified SVM, was coupled with orthogonal experimental design (OED) to map the relationship between process parameters as inputs and cumulative biogas production (CBP) as the output for the anaerobic fermentation of corn stalks with only nine samples. The LS-SVM parameters were optimized by the grid search method. The results showed that using this optimization method as an alternative to pure OED increased CBP by 14.13% [104].
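To illustrate how an LS-SVM can be fitted to only nine samples and tuned by grid search, the following sketch implements the standard LS-SVM regression formulation, in which training reduces to solving one linear system, with an RBF kernel and leave-one-out cross-validation. It is not the implementation used in [104]. The design (a full 3 × 3 grid of two coded factors rather than an orthogonal array), the response function `toy_response`, the parameter grids, and all function names are assumptions made for illustration.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two input vectors."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Fit LS-SVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma   # regularization on the kernel diagonal
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]          # bias, alphas


def lssvm_predict(x, X, bias, alphas, sigma):
    return bias + sum(a * rbf(x, xi, sigma) for a, xi in zip(alphas, X))


def loo_mse(X, y, gamma, sigma):
    """Leave-one-out mean squared error for one (gamma, sigma) pair."""
    err = 0.0
    for k in range(len(X)):
        X_tr, y_tr = X[:k] + X[k + 1:], y[:k] + y[k + 1:]
        b, a = lssvm_fit(X_tr, y_tr, gamma, sigma)
        err += (lssvm_predict(X[k], X_tr, b, a, sigma) - y[k]) ** 2
    return err / len(X)


def grid_search(X, y, gammas, sigmas):
    """Return (best LOO-MSE, gamma, sigma) over the full parameter grid."""
    return min((loo_mse(X, y, g, s), g, s) for g in gammas for s in sigmas)


def toy_response(x1, x2):
    """Hypothetical smooth response; stands in for measured output."""
    return 100.0 - 20.0 * (x1 - 0.3) ** 2 - 10.0 * (x2 + 0.2) ** 2


# Nine synthetic runs: two coded factors at three levels each (assumed design).
X = [[a, b] for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
y = [toy_response(a, b) for a, b in X]

gammas = [1.0, 10.0, 100.0, 1000.0]   # regularization candidates (assumed)
sigmas = [0.5, 1.0, 2.0]              # kernel width candidates (assumed)

best_mse, best_gamma, best_sigma = grid_search(X, y, gammas, sigmas)
bias, alphas = lssvm_fit(X, y, best_gamma, best_sigma)
print("best gamma, sigma:", best_gamma, best_sigma, "LOO-MSE:", best_mse)
print("prediction at (0.3, -0.2):",
      lssvm_predict([0.3, -0.2], X, bias, alphas, best_sigma))
```

Leave-one-out validation is used here because with nine samples a separate hold-out set would leave too few points for training. Once tuned, the fitted model can be evaluated at untested factor combinations to look for settings that improve the predicted response.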
Other ML methods have also shown reliable results in fermentation prediction and optimization. Kennedy et al. investigated the capabilities of fuzzy logic as a tool for media formulation. They found that this method can save 63% of the experiments, with the remaining experiments being adequate for media design, and that selecting the correct number of fuzzy logic rules is critical for enhancing model accuracy [105]. In another study, Melcher et al. utilized random forest and ANN for biomass and recombinant protein modeling in a fed-batch Escherichia coli process. Online fermentation parameters and two-dimensional (2D) fluorescence spectroscopy were used to predict dry cell mass and productivity. The hybrid model accuracy reached about ±4% for dry cell mass and ±12% for protein concentration [106]. Masampally et al. employed Gaussian process regression (GPR) in fed-batch fermentation of the yeast Saccharomyces cerevisiae to predict biomass concentration. In this study, three cascade sub-models were developed to predict gas hold-up, dissolved oxygen (DO), and biomass storage, respectively. Validation experiments were eventually performed [107]. Recently, using the k-nearest-neighbor (KNN) method, a 1.64-fold improvement in Penicillium brevicompactum fermentation producing mycophenolic acid (MPA) was obtained [108].