gps technology
for biological engineering

ATUM's GPS technology platform combines design of experiment, accurate empirical measurement and machine learning tools. At its core, GPS enables us to test small numbers of variants for commercially relevant properties, and to rapidly create molecules that work well in the real world.

We call the platform "GPS" because it works like a satellite navigation system: it helps you navigate the best path from your current position to your destination. The GPS platform is versatile and can be applied to biological engineering at many levels, including vector and gene optimization for expression in particular hosts, protein engineering for development of biocatalysts, and antibody humanization and affinity maturation.

01

Define Independent Variables
and Design Variants

The independent variables are the sequence characteristics on which the activity of a biological molecule depends. Examples include the codons used to encode each amino acid within a protein, the identities and configurations of regulatory elements within a vector, and amino acid changes within a protein.

Once we have identified the appropriate variables, we use design of experiment tools, including but not limited to codon optimization and codon usage analysis, to build a set of variant molecules in which the independent variables are distributed so as to facilitate subsequent analysis. The number of variants is small (always fewer than 100 and sometimes as few as a dozen) because having fewer variants to measure allows more precise measurement of functionally relevant properties.
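To make the idea of distributing variables across a small variant set concrete, here is a minimal Python sketch. It is not ATUM's design algorithm, which is proprietary and not described on this page. It assumes a simple greedy scheme with random tie-breaking, and the substitution names (`S1` to `S12`) and the function name `balanced_design` are hypothetical.

```python
import random
from collections import Counter

def balanced_design(substitutions, n_variants, per_variant, seed=0):
    """Assign substitutions to variants so each one is used (almost) equally often.

    Greedy sketch: every new variant receives the least-used substitutions so far.
    Ties are broken at random so that each substitution lands in varied contexts.
    """
    if per_variant > len(substitutions):
        raise ValueError("per_variant cannot exceed the number of substitutions")
    rng = random.Random(seed)
    usage = Counter({s: 0 for s in substitutions})
    design = []
    for _ in range(n_variants):
        pool = list(substitutions)
        rng.shuffle(pool)                   # random order acts as the tie-break
        pool.sort(key=lambda s: usage[s])   # stable sort: least-used first
        chosen = sorted(pool[:per_variant])
        for s in chosen:
            usage[s] += 1
        design.append(chosen)
    return design, usage

subs = [f"S{i}" for i in range(1, 13)]      # 12 hypothetical substitutions
design, usage = balanced_design(subs, n_variants=24, per_variant=3)
for variant in design[:3]:
    print(variant)
print(dict(usage))
```

With 24 variants carrying 3 substitutions each, every one of the 12 substitutions is used the same number of times. A real design would also balance which substitutions occur together, which this sketch leaves to chance.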

Why GPS is better
than a library approach

GPS

Balanced sampling of sequence space using a mathematically optimal design of infolog® sequences.

Library

Imbalanced and incomplete sampling of sequence space despite a much greater number of variants, because of the inherent biases of library construction.

02

Build

Synthesizing individually designed infologs ensures that the physical implementation is identical to the virtual design with no random mutations.

DNA2.0 Design & Synthesis

Protein Services

GPS

Each substitution is represented multiple times, each time in the context of different other substitutions. This allows us to subsequently determine whether any substitutions do not play well with others.


Library

No control over which mutations occur together. There is thus no means by which to understand any interactions between mutations. Random mutations, deletions and nonsense mutations are sprinkled throughout, further confounding interpretation.

03

Test - measure commercially relevant properties/activities

Biological assays should accurately measure the biological properties that are important for the final function of the molecule. The core purpose of GPS technology is to avoid surrogate assays that measure activities simply because they are easy to measure, rather than the activities that really matter. We are not like the drunk who looks for his car keys under the streetlight because that is where he can see, rather than where he dropped them. Because each round contains fewer than 100 variants, multiple high-quality assays can be applied to measure all of the important properties. ATUM has extensive assay development expertise.

Examples of important properties include: yield of soluble/active protein, thermostability (enzyme or antibody), specific activity (enzyme), binding affinity (antibody), immunogenicity (vaccine antigen), in vivo therapeutic efficacy in animal models.

ATUM codon bias algorithms maximize protein expression. Expression of polymerase variants (green circles), scFv antibody variants (pink circles) and DasherGFP (blue circles) is shown. Each point shows data from a different codon bias. Genes designed using ATUM’s advanced algorithms are shown as filled circles. Open circles show the two major algorithms used by our competitors: matching the E. coli genome bias or matching the bias found in highly expressed genes. Source: Design parameters to control synthetic gene expression in Escherichia coli. PLoS ONE 2009, 4(9):e7002.

Hyperactive transposases with >14-fold increased activity over wild type in S. cerevisiae. Activity of the Round 1 variants is shown as integration frequency along the y-axis. A total of 96 systematically designed transposase variants (x-axis), containing the 60 beneficial mutations most enriched by selection, were tested; the distribution of substitutions across the designed set was determined through DoE algorithms.

The beneficial mutations were identified from a site saturation mutagenesis library using a yeast screen for hyperactive transposase variants. For this screen, a tryptophan (TRP) transcriptional unit flanked by transposon ends was inserted into the URA3 marker gene, making URA3 non-functional. In the presence of transposase, excision of the TRP unit restores URA3 function, which allows transformed cells to grow in the absence of uracil but not in the absence of tryptophan (ura+, trp-). Excision followed by reintegration of the transposon at a different site in the genome allows cells to grow in the absence of both uracil and tryptophan (ura+, trp+). The ratio of ura+ trp+ clones to unselected clones gives the integration efficiency.
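The ratio described in the caption above can be written out as a short calculation. This Python sketch uses made-up colony counts and assumes both counts come from the same plated volume and dilution; the function names are illustrative only.

```python
def integration_efficiency(ura_trp_colonies, unselected_colonies):
    """Ratio of ura+ trp+ clones to unselected clones, as in the caption."""
    if unselected_colonies <= 0:
        raise ValueError("unselected colony count must be positive")
    return ura_trp_colonies / unselected_colonies

def fold_over_wild_type(variant_efficiency, wild_type_efficiency):
    """Relative activity of a variant compared with the wild-type transposase."""
    return variant_efficiency / wild_type_efficiency

# Hypothetical counts, for illustration only (not ATUM data).
wild_type = integration_efficiency(ura_trp_colonies=20, unselected_colonies=10_000)
variant = integration_efficiency(ura_trp_colonies=300, unselected_colonies=10_000)
fold = fold_over_wild_type(variant, wild_type)
print(f"wild type: {wild_type:.4f}, variant: {variant:.4f}, fold: {fold:.1f}")
```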

GPS

Only 12 to 96 infologs are built, which enables precise measurements under conditions representative of real-world applications.

Library

Large libraries require the use of high throughput surrogate screens. These typically measure only one property, often under conditions that are very different from the intended application.

04

Learn - machine learning to inform design of new variants

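ATUM uses proprietary machine learning software to deconvolute the experimental data and determine the effect of each independent variable on the measured biological properties. The accuracy and reliability of the GPS platform improve as experimental data continue to inform the design algorithms.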
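As a rough illustration of what determining the effect of each independent variable can mean, the sketch below estimates a main effect for each substitution as the difference in mean log activity between variants that carry it and variants that do not. This is a classical design of experiment estimate, not ATUM's proprietary software, and it is only meaningful when the design is balanced. The variant sets and activity values are invented.

```python
from math import log10
from statistics import mean

def main_effects(variants, activity):
    """Estimate the effect of each substitution on log10(activity).

    variants: list of sets, one per variant, naming the substitutions it carries.
    activity: list of positive measurements, in the same order as variants.
    """
    log_activity = [log10(a) for a in activity]
    all_subs = sorted(set().union(*variants))
    effects = {}
    for s in all_subs:
        with_s = [y for v, y in zip(variants, log_activity) if s in v]
        without_s = [y for v, y in zip(variants, log_activity) if s not in v]
        if with_s and without_s:            # need both groups to compare
            effects[s] = mean(with_s) - mean(without_s)
    return effects

# Hypothetical full factorial of two substitutions: A raises activity 10-fold,
# B halves it, and the two act independently.
variants = [set(), {"A"}, {"B"}, {"A", "B"}]
activity = [1.0, 10.0, 0.5, 5.0]
effects = main_effects(variants, activity)
for name, effect in effects.items():
    print(f"{name}: {effect:+.3f} log10 units")
```

Working on a log scale treats fold-changes as additive, which is a modelling assumption of this sketch rather than something stated on this page.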

A total of 96 systematically designed transposase variants were used to build a model. The y-axis denotes the measured integration efficiency; the x-axis denotes the predicted integration efficiency. The diagonal distribution represents the accuracy of the model.

Quantifying Learning: Measured vs. Predicted

In many places on our website we show graphs where we plot predicted performance against the values that we actually measure. These graphs are a way of visualizing the "Learn" portion of the Design-Build-Test-Learn cycle of ATUM's GPS® technology.

Predictions are made from artificial intelligence-derived "models" of the system under investigation. The better our predictions track the measured values, the better we have identified and understood the variables that affect performance, and the closer the points on the graph are to the diagonal.
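One way to put a number on how closely the points follow the diagonal is the coefficient of determination computed against the line measured = predicted. The page does not say which metric ATUM uses, so the choice of metric here is an assumption, and the values are invented.

```python
def r_squared(measured, predicted):
    """Coefficient of determination of predictions against measurements.

    1.0 means every point lies exactly on the diagonal; lower values mean
    the points scatter further from it.
    """
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted values for four variants.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(measured, predicted)
print(f"R^2 = {score:.3f}")
```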

These models then serve as the basis of our design algorithms. When we know which variables make a system perform well, we select them while avoiding the variables that reduce performance. The design algorithm could be a codon optimization tool that uses the right codon biases to provide reliably high gene expression in a particular host; it could be one that selects the right amino acid substitutions for antibody humanization or improves the physico-chemical properties of an enzyme; or it could be an algorithm that chooses the best vector elements for a mammalian, yeast or bacterial protein expression system. Whichever it is, the reliability of that algorithm depends on the Measured vs Predicted graph for the model that underpins it.

GPS

Different sequence changes frequently affect different properties, and often a change that is good for one desired activity is bad for another. By quantifying the effects of every sequence change on every property, it is possible to perform codon optimization to select combinations of changes that alter all properties in the desired direction.

Library

Library screening typically starts by measuring one property and selecting all library members whose activity exceeds a certain threshold. Those survivors are then screened for a second property, and so on. There is no learning, so no optimization of combinations is possible. The library therefore needs to be big enough to contain all possible combinations at the outset, which makes the screening problem even worse.

05

Design New Variants

Beneficial variables are retained and deleterious variables are eliminated. ATUM combines retained variables in new combinations, and adds new variables until the required properties are obtained.
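The retain-and-eliminate step can be sketched as a filter over per-property effect estimates. This is a simplified illustration, not ATUM's design algorithm: it assumes that a higher effect is better for every property, and the property names, substitution names and numbers are hypothetical.

```python
def select_substitutions(effects_by_property, threshold=0.0):
    """Keep substitutions that harm no property and help at least one.

    effects_by_property: {property name: {substitution: estimated effect}},
    where a higher effect is assumed to be better for every property.
    """
    tables = list(effects_by_property.values())
    shared = set(tables[0]).intersection(*tables[1:])   # scored for every property
    return sorted(
        s for s in shared
        if all(t[s] >= threshold for t in tables)       # not deleterious anywhere
        and any(t[s] > threshold for t in tables)       # beneficial somewhere
    )

# Hypothetical effect estimates from one round of measurements.
effects_by_property = {
    "activity":        {"A": 1.0, "B": 0.4,  "C": -0.2},
    "thermostability": {"A": 0.1, "B": -0.5, "C": 0.3},
}
retained = select_substitutions(effects_by_property)
print(retained)
```

In this example only substitution A is carried forward, because B reduces thermostability and C reduces activity. The retained substitutions would then be recombined, together with any new candidates, into the next round's designed variants.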

Several orders of magnitude improvement (>300-fold) in transposase activity over 5 rounds of screening. A total of 96 variants were designed per round; each round incorporates the best variants from the previous round and builds on them to make a new set through DoE algorithms. Integration frequency was measured for variants from each round and is shown along the y-axis.

applying the
gps platform

The GPS platform is a key differentiator across
ATUM's products and services

Bioengineering with:

GeneGPS

Search Space: 10^100
Variants per Round: ≤ 48
Output: Reliable gene optimization for high expression in specific hosts

Literature

Design Parameters to Control Synthetic Gene Expression in E. coli

Experimental determination of design parameters that formed the foundation of GeneGPS.

Engineering Genes for Predictable Protein Expression

Review of the GeneGPS gene optimization strategy and vector design.

VectorGPS

Search Space: 10^4 to 10^5
Variants per Round: ≤ 24
Output: Optimal vector design for expression host and application

Literature

In Silico Design of Functional DNA Constructs

ATUM's Gene Designer software enables design of novel vector combinations from genetic elements.

ProteinGPS

Search Space: 10^6 to 10^20
Variants per Round: ≤ 96
Output: Protein optimized across multiple attributes affecting functionality and developability

Literature

Mapping of amino acid substitutions conferring herbicide resistance in wheat glutathione transferase

Application of ProteinGPS DOE and machine learning to characterize and understand substrate specificity of glutathione transferase in wheat.

Redesigning and characterizing the substrate specificity and activity of Vibrio fluvialis aminotransferase for the synthesis of imagabalin

Collaboration with Pfizer using ProteinGPS to develop a Vibrio fluvialis aminotransferase with a 60-fold increase in activity.

"Site and Mutation"-Specific Predictions Enable Minimal Directed Evolution Libraries

Collaboration with Merck illustrating application of ProteinGPS to improve activity of a transaminase in E. coli.

AntibodyGPS

Search Space: 10^6 to 10^20
Variants per Round: ≤ 96
Output: Antibody optimized across multiple attributes affecting functionality and developability

HumanizationGPS

Search Space: 10^6 to 10^20
Variants per Round: ≤ 96
Output: Humanized antibody that retains functionality and developability attributes

Have a question?
Let's talk.

ATUM customer support scientists are available to discuss cloning strategies, gene design constraints, bioinformatics analyses, and other molecular biology/biotechnology concerns.

Call

Corporate Headquarters
(Newark, California)
+1 650 853 8347

Email

We generally reply within a few hours.

info@atum.bio