Automated replication optimization for protocellular information system

Due to the high cost of experiments, high-dimensional complex systems with many parameters pose major challenges, not only in artificial life but also in areas such as manufacturing processes, supply chains, and healthcare services. We present and verify a fully automated method that reduces the number of experiments needed to identify optimal operating conditions for complex systems, here tested in simulation on a protocellular information system. Through adaptive data analysis, the method becomes iteratively better at locating system optima, which is an advantage over, e.g., Monte Carlo optimization (2).



Autonomous Exploration and Optimization
We have developed and tested an autonomous system for the exploration and optimization of complex systems: an Autonomous Exploration and Optimization Loop (AEOL). It consists of three principal parts connected in a loop: (i) a simulation of the complex system under investigation, (ii) an Artificial Intelligence based Design of Experiments (AI-DoE) algorithm, and (iii) a message handling system that sends output from the simulation to the AI-DoE system, which in turn sends new input parameters back to the simulation. The loop can be iterated a predefined number of times or until a desired result is obtained.
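The loop structure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AEOL code. The names `run_loop`, `toy_simulate` and `make_random_proposer` are hypothetical. The random proposer is only a placeholder for the AI-DoE step, because it does not learn from the history. The quadratic toy response stands in for the LIDA simulation.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
History = List[Tuple[Params, float]]


def run_loop(
    simulate: Callable[[Params], float],
    propose: Callable[[History], List[Params]],
    max_iterations: int,
    target: float,
) -> History:
    """Alternate design and simulation until the budget or the target is reached."""
    history: History = []
    for _ in range(max_iterations):
        batch = propose(history)                      # design step: new input parameters
        results = [(p, simulate(p)) for p in batch]   # run one generation of experiments
        history.extend(results)                       # feed outputs back to the designer
        if max(r for _, r in results) >= target:      # stop early on a good enough result
            break
    return history


def toy_simulate(p: Params) -> float:
    # Hypothetical stand-in for the LIDA simulation: response peaks at k_L = 0.5.
    return 1.0 - (p["k_L"] - 0.5) ** 2


def make_random_proposer(population: int, seed: int = 0) -> Callable[[History], List[Params]]:
    # Placeholder designer: ignores the history and samples uniformly at random.
    rng = random.Random(seed)

    def propose(history: History) -> List[Params]:
        return [{"k_L": rng.uniform(0.0, 1.0)} for _ in range(population)]

    return propose


history = run_loop(toy_simulate, make_random_proposer(10), max_iterations=5, target=0.99)
best_params, best_yield = max(history, key=lambda item: item[1])
print(len(history), best_params, round(best_yield, 3))
```

Both stopping rules from the text appear in the sketch: a fixed number of iterations (`max_iterations`) and an early exit once a target response is reached.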
Loop component (i) is a Lesion Induced DNA Amplification (LIDA) process (see (1) and (3)), which could function as an informational building block with sufficient replication yield for the protocellular model developed in (6). It is a system of reaction kinetic equations of the form $\mathrm{d}x/\mathrm{d}t = f(x, \alpha)$, where $x$ and $\alpha$ denote the physicochemical species concentrations and the reaction rate constants, respectively. A DNA template and four shorter complementary DNA oligomers are replicated through template-directed ligation. Different sequences and corresponding oligomers yield different hybridization energies and thus different reaction rates. Results from LIDA simulations and lab experiments are shown in Fig. 1.
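The following sketch shows what a system of the form $\mathrm{d}x/\mathrm{d}t = f(x, \alpha)$ looks like in code. It uses a deliberately simplified, hypothetical template-directed ligation scheme and is not the published LIDA model of (1) and (3). In this scheme, a template T binds two oligomers A and B into a complex C, and ligation (rate `k_L`) turns C into a full duplex D. D dissociates (`k_minus`) and reanneals (`k_plus`), and the reannealing causes product inhibition. All species, rate values and the lumped binding step are assumptions. Integration uses a hand-written classical Runge-Kutta (RK4) step, so the block needs only the standard library.

```python
from typing import Dict, List

Rates = Dict[str, float]


def rhs(x: List[float], a: Rates) -> List[float]:
    """f(x, alpha) for a toy template-directed ligation scheme.

    x = [T, A, B, C, D]: free template strands, oligomers A and B,
    template-oligomer complex C and full duplex D (two template strands).
    """
    T, A, B, C, D = x
    bind = a["k_h"] * T * A * B     # T + A + B -> C   (lumped hybridization)
    unbind = a["k_off"] * C         # C -> T + A + B
    ligate = a["k_L"] * C           # C -> D           (template-directed ligation)
    melt = a["k_minus"] * D         # D -> 2 T         (duplex dissociation)
    anneal = a["k_plus"] * T * T    # 2 T -> D         (reannealing: product inhibition)
    dA = -bind + unbind
    return [
        -bind + unbind + 2 * melt - 2 * anneal,  # dT/dt
        dA,                                      # dA/dt
        dA,                                      # dB/dt (same stoichiometry as A)
        bind - unbind - ligate,                  # dC/dt
        ligate - melt + anneal,                  # dD/dt
    ]


def integrate_rk4(x0: List[float], a: Rates, t_end: float, dt: float) -> List[float]:
    """Classical fourth-order Runge-Kutta with a fixed step size."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(x, a)
        k2 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)], a)
        k3 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)], a)
        k4 = rhs([xi + dt * k for xi, k in zip(x, k3)], a)
        x = [xi + dt / 6.0 * (p + 2 * q + 2 * r + s)
             for xi, p, q, r, s in zip(x, k1, k2, k3, k4)]
    return x


def template_strands(x: List[float]) -> float:
    # Free strands + one strand per complex + two strands per duplex.
    return x[0] + x[3] + 2 * x[4]


# Hypothetical initial concentrations and rate constants (arbitrary units).
x0 = [1.0, 5.0, 5.0, 0.0, 0.0]
alpha = {"k_h": 1.0, "k_off": 1.0, "k_L": 0.5, "k_minus": 0.1, "k_plus": 1.0}
x_end = integrate_rk4(x0, alpha, t_end=50.0, dt=0.005)
replication_yield = template_strands(x_end) / template_strands(x0)
print(round(replication_yield, 3))
```

Here, replication yield is taken (as an assumption) to be the total number of template strands relative to the initial amount. Raising `k_minus` or lowering `k_plus` weakens product inhibition in this toy model.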
Loop component (ii) is the AI-DoE prediction algorithm, which is based on a neural network that leverages experimental data to propose new experiments predicted to give improved results. With each loop iteration, the model is refined with newly accumulated data and its predictive performance improves, so it discovers increasingly better experiments.
Fig. 3, Bottom: Regression tree with main response explanatory rules discovered by the AI-DoE algorithm. Interestingly, the algorithm first discovers how to prevent product inhibition by lowering the hybridization energies of the full strands, which means a higher dissociation rate $k^{-}_{mT}$. Second, the algorithm increases the ligation rate constant $k_L$. When the optimization campaign is redone for $k_L < 10^{-2}$, the algorithm reverses the relative importance of $k^{-}_{mT}$ and $k_L$, so that $k_L$ becomes the main rate-limiting factor.
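Daptics provides the AI-DoE algorithm itself, and it is not reproduced here. The sketch below illustrates the general idea of surrogate-guided design under stated assumptions. It swaps in a k-nearest-neighbour regressor (not a neural network) as the surrogate. The code samples candidate parameter sets log-uniformly within bounds, ranks them by predicted response, and keeps the best few plus some random ones for exploration. All function names, bounds and the toy response are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Params = Dict[str, float]
History = List[Tuple[Params, float]]
Bounds = Dict[str, Tuple[float, float]]


def log_distance(p: Params, q: Params) -> float:
    # Rate constants span orders of magnitude, so compare them on a log10 scale.
    return math.sqrt(sum((math.log10(p[n]) - math.log10(q[n])) ** 2 for n in p))


def knn_predict(history: History, candidate: Params, k: int = 3) -> float:
    """Surrogate prediction: mean response of the k nearest evaluated points."""
    nearest = sorted(history, key=lambda item: log_distance(item[0], candidate))[:k]
    return sum(r for _, r in nearest) / len(nearest)


def surrogate_propose(history: History, bounds: Bounds, population: int,
                      rng: random.Random, n_candidates: int = 500,
                      explore: float = 0.2) -> List[Params]:
    """Return the next generation: best predicted candidates plus random exploration."""
    def sample() -> Params:
        return {n: 10 ** rng.uniform(math.log10(lo), math.log10(hi))
                for n, (lo, hi) in bounds.items()}

    if not history:                      # first generation: pure random design
        return [sample() for _ in range(population)]
    n_explore = int(round(explore * population))
    candidates = [sample() for _ in range(n_candidates)]
    candidates.sort(key=lambda c: knn_predict(history, c), reverse=True)
    return candidates[: population - n_explore] + [sample() for _ in range(n_explore)]


def toy_response(p: Params) -> float:
    # Hypothetical response that rewards faster ligation and faster dissociation.
    return math.log10(p["k_L"]) + math.log10(p["k_minus"])


bounds = {"k_L": (1e-4, 1e0), "k_minus": (1e-3, 1e1)}
rng = random.Random(1)
history: History = []
for generation in range(5):
    batch = surrogate_propose(history, bounds, population=10, rng=rng)
    history += [(p, toy_response(p)) for p in batch]
```

As in the text, every generation refits the surrogate on all accumulated data, here implicitly, because `knn_predict` always reads the full history. The exploration share guards against converging on a region the surrogate has merely not yet seen.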
Loop component (iii) is the message handling software (AEOP), which is written in Python and can currently execute both Python- and MATLAB-based complex system experiments. AEOP transfers $(x_i, \alpha_i)$ to AI-DoE and returns $\alpha_{i+1}$ to the simulation $\mathrm{d}x/\mathrm{d}t = f(x, \alpha)$. The communication between AEOL and AI-DoE goes through an Application Programming Interface (API) provided by Daptics (4).
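The Daptics API is not described here. The following is therefore only a sketch, with assumed field names, of how a message handler could package one generation of results $(x_i, \alpha_i)$ as JSON and parse a reply containing $\alpha_{i+1}$. The schema keys (`generation`, `experiments`, `designs`) and all values are hypothetical.

```python
import json
from typing import Dict, List

Params = Dict[str, float]


def encode_results(generation: int, alphas: List[Params],
                   outputs: List[Dict[str, float]], responses: List[float]) -> str:
    """Package one generation of (x_i, alpha_i) and the derived responses as JSON."""
    experiments = [
        {"parameters": a, "outputs": x, "response": y}
        for a, x, y in zip(alphas, outputs, responses)
    ]
    return json.dumps({"generation": generation, "experiments": experiments})


def decode_design(message: str) -> List[Params]:
    """Extract the proposed parameter sets alpha_{i+1} from a designer reply."""
    payload = json.loads(message)
    return [{name: float(v) for name, v in d.items()} for d in payload["designs"]]


# Example round trip with a hand-written reply (hypothetical values).
request = encode_results(
    generation=1,
    alphas=[{"k_L": 0.5, "k_minus": 0.1}],
    outputs=[{"T": 2.4, "D": 1.3}],
    responses=[6.0],
)
reply = json.dumps({"generation": 2, "designs": [{"k_L": 0.8, "k_minus": 0.3}]})
next_alphas = decode_design(reply)
```

A text-based format such as JSON keeps the handler independent of whether the simulation behind it runs in Python or MATLAB.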

Discussion
Using the AEOL software with the AI-DoE algorithm, we achieve fully autonomous optimization of the LIDA reaction kinetics. AEOL identifies mathematically optimal kinetic parameter combinations for the desired high replication yield, and may therefore also find parameter combinations that are not physically realizable. Thus, a post-analysis review of the results may be necessary, or additional restrictions expressing physical constraints must be imposed in the adaptive search process.
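One way to impose such restrictions is to wrap the simulation in an objective that assigns a penalty to parameter sets outside physically plausible limits. The search then can never report those sets as optima. This is a sketch under assumptions: the limit value and all names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
Limits = Dict[str, Tuple[float, float]]


def is_feasible(p: Params, limits: Limits) -> bool:
    """True if every constrained parameter lies within its physically plausible range."""
    return all(lo <= p[name] <= hi for name, (lo, hi) in limits.items())


def constrained_objective(simulate: Callable[[Params], float], limits: Limits,
                          penalty: float = float("-inf")) -> Callable[[Params], float]:
    """Wrap a simulation so that infeasible parameter sets can never look optimal."""
    def objective(p: Params) -> float:
        if not is_feasible(p, limits):
            return penalty          # skip the simulation and report the penalty
        return simulate(p)
    return objective


# Hypothetical limit on the ligation constant and a toy stand-in simulation.
limits = {"k_L": (0.0, 1e-2)}
objective = constrained_objective(lambda p: p["k_L"] * 100.0, limits)
```

Penalizing inside the objective leaves the design algorithm unchanged. The alternative is to encode the limits directly as parameter bounds, which keeps infeasible candidates from being proposed in the first place.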
For the optimization process discussed in Fig. 2, the AI-DoE algorithm already identifies an optimal solution in the second generation, using a population of 10 in each generation. This means that we could have asked for fewer generations or used smaller populations. However, as expected, redoing the optimization with different upper and lower parameter bounds yields different results. If we restrict the ligation constant to the more realistic range $k_L \le 10^{-2}$ /(mol sec) and again use a population of 10 per generation, the algorithm locates optimal solutions after six generations, which are then refined in generation seven. Further, the top parameter in the regression tree with main response explanatory rules (recall Fig. 3, Bottom) becomes $k_L$, which means that the algorithm views ligation as the main rate-limiting reaction.
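The simulation budget of a campaign can be counted up to and including the generation in which an optimum first appears. It equals the number of generations $G$ times the population size $P$. With the figures quoted above, this gives:

```latex
N = G \cdot P, \qquad
N_{\mathrm{default}} = 2 \cdot 10 = 20, \qquad
N_{k_L \le 10^{-2}} = 6 \cdot 10 = 60 \quad (7 \cdot 10 = 70 \text{ including the refinement generation})
```

This product is the quantity that asking for fewer generations or using smaller populations would reduce.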
It should be noted that the same method could be used for automated in vitro experiments instead of simulations. A manual loop using the AI-DoE algorithm is also applicable to the exploration and optimization of other complex systems.