

How can a large language model design DNA for new cancer drugs?

LLM, DNA, Evo 2, learning loop

Drawn from Lutz Finger's Forbes column, LinkedIn writing, and Cornell teaching. Sources are cited inline so you can read the originals.

DNA is a four-letter text, which makes sequence design a next-token problem, but only with the right data and a tight feedback loop.
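
To make the "next-token problem" framing concrete, here is a minimal sketch that predicts the next base from the previous one. It is a deliberately tiny bigram counter, not how Evo 2 or any large model works; the sequences are made up, and the names `fit_bigram` and `next_base_probs` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter, defaultdict

ALPHABET = "ACGT"  # the four-letter DNA alphabet

def fit_bigram(sequences):
    """Count how often each base follows each other base."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def next_base_probs(counts, prev):
    """Return P(next base | previous base) with add-one smoothing."""
    total = sum(counts[prev][b] + 1 for b in ALPHABET)
    return {b: (counts[prev][b] + 1) / total for b in ALPHABET}

# Made-up training data, purely for illustration.
toy_sequences = ["ACGTACGT", "ACGGACGT"]
model = fit_bigram(toy_sequences)
probs = next_base_probs(model, "A")
print(probs)
```

A real DNA language model conditions on a far longer context than one base, but the task has the same shape: given what came before, assign a probability to each of the four letters.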

What is DNA? It’s text. The alphabet of DNA has four letters and they keep permutating, so it’s a perfect problem for an LLM, if it’s trained on the right data. The general models are only okay-ish for DNA. There’s one trained specifically on DNA, Evo 2 by the Arc Institute, the best out there, but still 99% of what comes out is useless. So we built a proprietary Oracle to predict what is not useless, feed it to the wet lab, and the outliers go back into the model. Every spin of the wheel, it gets smarter.

Source: Tomorrow’s Medicine, with Cyriac Roeding (eCornell Keynote)
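
The loop described above (generate, filter with an oracle, test in the wet lab, feed the best results back) can be sketched as a toy simulation. Everything here is an assumption made for illustration: the generator is a table of per-position base weights, the "wet lab" is a match count against a made-up hidden target, and the oracle is a noisy copy of that assay. None of it reflects the actual proprietary Oracle, Evo 2, or any real lab protocol.

```python
import random

ALPHABET = "ACGT"
LENGTH = 12
HIDDEN_TARGET = "ACGTTGCAACGT"  # made-up ground truth that only the "lab" sees

def generate(weights, rng):
    """Stand-in generator: sample each position from its own base weights."""
    return "".join(rng.choices(ALPHABET, weights=w)[0] for w in weights)

def wet_lab(seq):
    """Stand-in assay: fraction of positions matching the hidden target."""
    return sum(a == b for a, b in zip(seq, HIDDEN_TARGET)) / LENGTH

def oracle(seq, rng):
    """Stand-in oracle: a cheap, noisy guess at the assay result."""
    return wet_lab(seq) + rng.gauss(0, 0.1)

def run_loop(rounds=5, candidates=200, lab_budget=20, keep=5, seed=0):
    rng = random.Random(seed)
    weights = [[1.0] * 4 for _ in range(LENGTH)]  # start with uniform weights
    history = []
    for _ in range(rounds):
        pool = [generate(weights, rng) for _ in range(candidates)]
        # The oracle ranks every candidate; only the top few reach the lab.
        ranked = sorted(pool, key=lambda s: oracle(s, rng), reverse=True)
        measured = sorted(ranked[:lab_budget], key=wet_lab, reverse=True)
        history.append(wet_lab(measured[0]))
        # Feed the best measured sequences back into the generator.
        for seq in measured[:keep]:
            for pos, base in enumerate(seq):
                weights[pos][ALPHABET.index(base)] += 1.0
    return history

history = run_loop()
print(history)  # best measured score per round
```

In this toy setup the best measured score tends to rise from round to round, which is the "every spin of the wheel" effect in miniature: the oracle keeps the expensive lab budget small, and the lab results steer the next batch of candidates.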

