GPS Technology

ATUM’s GPS technology platform combines Design of Experiments (DoE), accurate empirical measurement, and machine learning. At its core, GPS lets us test small numbers of variants for commercially relevant properties and rapidly create molecules that work well in the real world.

We call the platform “GPS” because it works much like satellite navigation: it helps you find the best path from your current position to your destination. The GPS platform is extremely versatile and can be applied to biological engineering at many levels, including vector and gene optimization for expression in particular hosts, protein engineering for the development of biocatalysts, and antibody humanization and affinity maturation.

The independent variables are the sequence characteristics on which the activity of a biological molecule depends. Examples include the codons used to encode each amino acid within a protein, the identities and configurations of regulatory elements within a vector, and amino acid changes within a protein.

Once we have identified the appropriate variables, we use DoE tools to build a set of variant molecules in which the independent variables are distributed to facilitate subsequent analysis. The number of variants is small (always fewer than 100 and sometimes as few as a dozen) because fewer variants to measure means more precise measurements of functionally relevant properties.
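As a rough illustration of what a balanced design can look like, the sketch below builds a classic two-level orthogonal design. It is a generic textbook construction, not ATUM's design algorithm: it assumes each independent variable is binary (for example, a hypothetical choice between two codons at one position, coded +1/-1) and uses a Sylvester-type Hadamard matrix to spread 7 such variables across 8 variants so that every variable is balanced and uncorrelated with every other one.

```python
from itertools import combinations


def sylvester_hadamard(order: int) -> list[list[int]]:
    """Build a Sylvester Hadamard matrix of size 2**order with entries +1/-1."""
    h = [[1]]
    for _ in range(order):
        # Each doubling step forms the block matrix [[H, H], [H, -H]].
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def two_level_design(order: int) -> list[list[int]]:
    """Drop the all-ones column: 2**order runs for 2**order - 1 binary variables."""
    return [row[1:] for row in sylvester_hadamard(order)]


# 8 hypothetical variants (rows) x 7 hypothetical binary variables (columns).
design = two_level_design(3)
for run in design:
    print(" ".join("+" if v > 0 else "-" for v in run))

n_factors = len(design[0])
# Balance: every variable sits at each level in exactly half of the variants.
assert all(sum(row[j] for row in design) == 0 for j in range(n_factors))
# Orthogonality: the columns of any two variables are uncorrelated across the design.
assert all(
    sum(row[i] * row[j] for row in design) == 0
    for i, j in combinations(range(n_factors), 2)
)
```

Balance and orthogonality are what allow each variable's effect to be estimated separately from only a handful of variants; real designs also have to handle multi-level variables and biological constraints, which this sketch ignores.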

Why GPS is better than a library approach

GPS

Balanced sampling of sequence space using a mathematically optimal design of infolog® sequences.

Library

Imbalanced and incomplete sampling of sequence space despite a much greater number of variants, because of the inherent biases of library construction.

Synthesizing individually designed infologs ensures that the physical implementation is identical to the virtual design with no random mutations.

GPS

Each genetic determinant is represented multiple times, each time in a different sequence context. This lets us determine afterwards whether any substitutions interact poorly with others.

Library

There is no control over which mutations occur together, and thus no way to understand interactions between them. Random mutations, deletions, and nonsense mutations are sprinkled throughout, further confounding interpretation.
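The value of seeing each determinant in several contexts can be shown with a small, hypothetical example. This sketch is an illustration, not ATUM's analysis software: it uses a full two-level factorial over three hypothetical substitutions (a, b, c), coded +1 when present and -1 when absent, with synthetic noise-free responses in which a and b each help on their own but clash when combined. Because every pair of substitutions appears together in all four combinations, the interaction can be estimated directly.

```python
from itertools import product


def interaction_effect(design, y, i, j):
    """Estimate the two-factor interaction coefficient for variables i and j.

    With +1/-1 coding in a balanced, orthogonal design, the least-squares
    coefficient of the x_i * x_j term equals mean(y * x_i * x_j).
    """
    return sum(yk * row[i] * row[j] for row, yk in zip(design, y)) / len(y)


# Full 2^3 factorial over three hypothetical substitutions (a, b, c).
design = [list(levels) for levels in product((-1, 1), repeat=3)]


def true_response(a, b, c):
    # Hypothetical, noise-free model: a and b help individually but interfere together.
    return 10 + 2 * a + 1 * b - 1.5 * a * b + 0.5 * c


y = [true_response(*row) for row in design]

print(interaction_effect(design, y, 0, 1))  # a x b: -1.5
print(interaction_effect(design, y, 0, 2))  # a x c: 0.0
```

With real, noisy data each estimate carries uncertainty, and a random library gives no guarantee that each pair is observed in every combination at all.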

Assays should accurately measure the biological properties that are important for the final function of the molecule. By reducing the number of variants to fewer than 100, the GPS platform makes it practical to apply multiple high-quality assays that measure all of the important properties.

GPS

Only 12 to 96 infologs are built, enabling precise measurements under conditions representative of real-world applications.

Library

Large libraries require the use of high-throughput surrogate screens. These screens typically measure only one property, often under conditions that are very different from the intended application.
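One simple way to see why fewer variants can mean better measurements is to fix the assay budget. The sketch below assumes a hypothetical budget of 384 wells split evenly across variants, with independent replicate noise of standard deviation `assay_sd`; the standard error of each variant's mean then shrinks with the square root of the number of replicates.

```python
import math


def standard_error_per_variant(n_variants, wells_available, assay_sd):
    """Standard error of each variant's mean when a fixed well budget is split evenly.

    Assumes replicate measurements are independent with standard deviation assay_sd.
    """
    replicates = wells_available // n_variants
    if replicates < 1:
        raise ValueError("not enough wells to measure every variant once")
    return assay_sd / math.sqrt(replicates)


# Hypothetical budget: 384 wells, replicate noise SD of 0.10 (e.g., 10% of signal).
for n in (24, 96, 384):
    print(n, standard_error_per_variant(n, 384, 0.10))
```

Under these assumptions, 96 variants measured in quadruplicate have half the standard error (0.05) of 384 variants measured once (0.1). The same budget argument applies when the extra capacity is spent on additional, slower assays rather than replicates.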

GPS uses proprietary machine learning software to deconvolute the experimental data and determine the effect of each independent variable on the measured biological properties. The accuracy and reliability of the GPS platform improve continually as new experimental data inform the design algorithms.
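The deconvolution step can be approximated, for illustration only, by ordinary least-squares regression of an additive model, a simple stand-in for ATUM's proprietary machine learning software. The sketch assumes each independent variable is coded numerically (here as hypothetical 0/1 indicators for three substitutions) and fits y = b0 + b1*x1 + b2*x2 + b3*x3 through the normal equations, without third-party libraries.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_effects(design, y):
    """Ordinary least squares for y ~ b0 + sum_j b_j * x_j via the normal equations."""
    rows = [[1.0] + [float(v) for v in row] for row in design]  # prepend intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical, noise-free data generated from y = 5 + 3*x1 - 2*x2 + 0.5*x3.
design = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
y = [8.0, 3.0, 5.5, 6.0, 3.5, 6.5]
effects = fit_effects(design, y)
print([round(b, 6) for b in effects])
```

Because these data are noise-free and the design has full rank, the fit recovers the generating coefficients; with real measurements, more variants than coefficients are needed so that the residual error can be estimated.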

A total of 96 systematically designed transposase variants were used to build a model (see graph). The y-axis denotes the measured integration efficiency; the x-axis denotes the predicted integration efficiency. The closer the points lie to the diagonal, the more accurate the model.

Quantifying Learning: Measured vs. Predicted

In many places on our website we show graphs where we plot predicted performance against the values that we actually measure. These graphs visualize the “Learn” portion of the Design-Build-Test-Learn cycle of ATUM’s GPS® technology.

Predictions are made from artificial intelligence-derived “models” of the system under investigation. The better our predictions track the measured values, the better we have identified and understood the variables that affect performance, and the closer the points on the graph are to the diagonal.
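The idea that points closer to the diagonal reflect a better model can be made quantitative. The sketch below is one common way to summarize a Measured vs. Predicted plot, not necessarily the statistic ATUM reports: it computes the root-mean-square error and a coefficient of determination measured against the diagonal y = x itself, alongside the Pearson correlation. The numbers in the usage lines are hypothetical.

```python
import math


def agreement_metrics(measured, predicted):
    """Summarize how closely (predicted, measured) points track the diagonal y = x."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    # Residuals are taken against the diagonal (measured - predicted),
    # not against a regression line fitted through the points.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return {
        "r2_vs_diagonal": 1.0 - ss_res / ss_tot,
        "pearson_r": cov / (math.sqrt(ss_tot) * sd_p),
        "rmse": math.sqrt(ss_res / n),
    }


measured = [1.0, 2.0, 3.0, 4.0]
print(agreement_metrics(measured, [1.1, 1.9, 3.2, 3.8]))  # close to the diagonal
print(agreement_metrics(measured, [2 * m for m in measured]))  # correlated but off-diagonal
```

The second case shows why distance from the diagonal matters more than correlation alone: predictions that are uniformly twice the measured values still give a Pearson correlation of 1, but a strongly negative r2_vs_diagonal.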

These models serve as the basis of our design algorithms. When we know which genetic determinants make a system perform well, we select them while avoiding the genetic determinants that reduce performance. The design algorithm could be a codon optimization tool that uses the right codon biases to provide reliably high gene expression in a particular host; it could be one that selects the right amino acid substitutions for antibody humanization or for improving the physicochemical properties of an enzyme; or it could be an algorithm that chooses the best vector elements for a mammalian, yeast, or bacterial protein expression system. Whichever it is, the reliability of that algorithm depends on the Measured vs. Predicted graph for the model that underpins it.
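How a model turns into a design choice can be sketched for the simplest case. This is an illustration under stated assumptions, not one of ATUM's design algorithms: it assumes a purely additive model in which each hypothetical position (two codon positions and one vector element, with made-up effect estimates) contributes independently, so picking the best-scoring option at each position maximizes the predicted performance.

```python
def best_design(intercept, effects):
    """Pick, for each position, the option with the largest estimated effect.

    `effects` maps a position to {option: estimated effect}. Under a purely
    additive model, choosing each position independently maximizes the
    predicted performance; interactions would require a joint search.
    """
    choice = {pos: max(opts, key=opts.get) for pos, opts in effects.items()}
    predicted = intercept + sum(effects[pos][opt] for pos, opt in choice.items())
    return choice, predicted


# Hypothetical effect estimates: synonymous alanine and lysine codons, two promoters.
effects = {
    "codon_12": {"GCT": 0.4, "GCC": -0.1, "GCG": -0.3},
    "codon_57": {"AAA": -0.2, "AAG": 0.2},
    "promoter": {"P1": 0.0, "P2": 1.1},
}
choice, predicted = best_design(10.0, effects)
print(choice)
print(predicted)
```

The more closely the model's predictions track measured values, the more the predicted score of the chosen design can be trusted; if the model missed an interaction, this position-by-position choice could be misleading.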