Genetic Engineering Attribution Challenge

Genetic engineering is a powerful tool that demands responsible use. Your goal is to create an algorithm that identifies the lab-of-origin of genetically engineered DNA as accurately as possible.

$60,000 in prizes
Completed Oct 2020
1,206 joined

[Image: Working with lab samples]


We were blown away by the results of this competition, and are really excited to see how the attribution field continues to develop from here.

— Will Bradshaw, Competition Director at altLabs

Why

Synthetic biology offers fantastic benefits for society, but its anonymity opens the door for reckless or malicious actors to cause serious harm. Currently, there is no easy way of tracing genetically engineered DNA back to its lab-of-origin. This task is known as attribution, and it's a pivotal part of ensuring that genetic engineering progresses responsibly.

The Solution

When manipulating DNA, a designer must make many decisions, such as promoter choice and cloning method. These choices leave clues in the genetic material, and together they compose a "genetic fingerprint" that can be traced back to the designer.
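
To make the idea of a "genetic fingerprint" concrete, here is a minimal sketch of one common featurization, k-mer frequency counting. The function name, the choice of k, and the normalization are illustrative assumptions, not a description of any particular entrant's model:

```python
from collections import Counter
from itertools import product

def kmer_features(sequence: str, k: int = 4) -> list[float]:
    """Normalized k-mer frequencies for a DNA sequence.

    Design decisions (promoters, restriction sites, codon usage)
    shift these frequencies, so the vector acts as a crude
    "genetic fingerprint" that a classifier can learn from.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = max(sum(counts.values()), 1)
    # Fixed feature order over all 4**k k-mers built from A/C/G/T.
    vocab = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[kmer] / total for kmer in vocab]

# Toy usage: a short plasmid fragment mapped to a 256-dimensional vector.
features = kmer_features("ATGCATGCGGTACCTTAGC")
print(len(features), max(features))
```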

The goal of the Genetic Engineering Attribution Challenge was to develop tools that help human decision makers identify the lab-of-origin of genetically engineered DNA. The competition was composed of two tracks. In the Prediction Track, participants competed to attribute DNA samples to their labs-of-origin with the highest possible accuracy. In the Innovation Track, competitors who beat the BLAST benchmark were invited to submit reports demonstrating how their lab-of-origin prediction models excel in domains beyond raw accuracy.

The Results

When attributing DNA to its source, simply narrowing the field of possible labs would be a boon to human decision makers. This is why top-10 accuracy was chosen as the competition metric. The top model achieved over 95% top-10 accuracy across the 1,314 possible labs in the dataset, compared with the pre-existing BLAST baseline of 76.9%. Moreover, three of the top four submissions achieved more than 80% top-1 accuracy.
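
For readers who want to reproduce the metric, a minimal sketch of top-k accuracy over a matrix of model scores follows. The array shapes, names, and use of NumPy are illustrative assumptions; this is not the competition's official scoring code:

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int = 10) -> float:
    """Fraction of samples whose true lab is among the k highest-scoring labs.

    scores: (n_samples, n_labs) model scores, e.g. n_labs = 1314
    labels: (n_samples,) integer index of each sample's true lab-of-origin
    """
    # Indices of the k largest scores per row; order within the top k is irrelevant.
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    return float(np.mean((top_k == labels[:, None]).any(axis=1)))

# Toy check: the true lab is always the single highest score, so top-2
# accuracy must be 1.0.
rng = np.random.default_rng(0)
scores = rng.random((100, 25))
labels = scores.argmax(axis=1)
print(top_k_accuracy(scores, labels, k=2))  # 1.0
```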

Solutions were also tested against an out-of-sample verification set, drawn from data collected after the competition concluded. The verification data included both "previously seen" labs from the competition dataset and "unseen" labs (predictions for "unseen" labs are considered correct if labeled "unknown"). On the previously seen labs, top models scored an impressive top-10 accuracy of 78% to 85%, substantially exceeding the baseline performance of 59.5%.
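
That verification rule can be stated compactly in code. In this sketch, the "unknown" label and the correctness rule come from the description above, while the function name and the lab identifiers are hypothetical:

```python
def verification_correct(top10_labs: list[str], true_lab: str,
                         seen_labs: set[str]) -> bool:
    """Scoring rule for the verification set, as described above.

    A "previously seen" lab must appear in the model's top-10 list;
    for a lab absent from the competition data, predicting "unknown"
    counts as correct.
    """
    if true_lab not in seen_labs:
        return "unknown" in top10_labs
    return true_lab in top10_labs

# Hypothetical identifiers, for illustration only.
seen = {"lab_0042", "lab_0913"}
print(verification_correct(["lab_0042", "unknown"], "lab_0042", seen))      # True
print(verification_correct(["lab_0042", "unknown"], "brand_new_lab", seen)) # True
```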

To further promote the development and use of attribution technology in the future, all code from winning submissions will be made open-source, and altLabs has reached out to winning teams to discuss their plans for publication of their results.
