CRISPR has revolutionized science. AI is now taking the gene editor to the next level.
Thanks to their ability to accurately edit the genome, CRISPR tools are now widely used in biotechnology and across medicine to tackle inherited diseases. In late 2023, a therapy using the Nobel Prize-winning tool gained approval from the FDA to treat sickle cell disease. CRISPR has also enabled CAR T cell therapy to battle cancers and been used to lower dangerously high cholesterol levels in clinical trials.
Outside medicine, CRISPR tools are changing the agricultural landscape, with projects ongoing to engineer hornless bulls, nutrient-rich tomatoes, and livestock and fish with more muscle mass.
Despite its real-world impact, CRISPR isn’t perfect. The tool snips both strands of DNA, which can cause dangerous mutations. It also can inadvertently nip unintended areas of the genome and trigger unpredictable side effects.
CRISPR was first discovered in bacteria as a defense mechanism, suggesting that nature hides a bounty of CRISPR components. For the past decade, scientists have screened different natural environments—for example, pond scum—to find other versions of the tool that could potentially increase its efficacy and precision. While successful, this strategy depends on what nature has to offer. Some benefits, such as a smaller size or greater longevity in the body, often come with trade-offs like lower activity or precision.
Rather than relying on evolution, can we fast-track better CRISPR tools with AI?
This week, Profluent, a startup based in California, outlined a strategy that uses AI to dream up a new universe of CRISPR gene editors. Based on large language models—the technology behind the popular ChatGPT—the AI designed several new gene-editing components.
In human cells, the components meshed to reliably edit targeted genes. The efficiency matched classic CRISPR, but with far more precision. The most promising editor, dubbed OpenCRISPR-1, could also precisely swap out single DNA letters—a technology called base editing—with an accuracy that rivals current tools.
“We demonstrate the world’s first successful editing of the human genome using a gene editing system where every component is fully designed by AI,” wrote the authors in a blog post.
Match Made in Heaven
CRISPR and AI have had a long romance.
The CRISPR recipe has two main parts: A “scissor” Cas protein that cuts or nicks the genome and a “bloodhound” RNA guide that tethers the scissor protein to the target gene.
By varying these components, the system becomes a toolbox, with each setup tailored to perform a specific type of gene editing. Some Cas proteins cut both strands of DNA; others give just one strand a quick snip. Alternative versions can also cut RNA, a type of genetic material found in viruses, and can be used as diagnostic tools or antiviral treatments.
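To make the scissor-and-bloodhound picture concrete, here is a minimal Python sketch of how a guide sequence picks out a site in a stretch of DNA. It rests on assumptions the article does not spell out: the widely used Cas9 from Streptococcus pyogenes only cuts where the matching site is immediately followed by a short "NGG" motif (the PAM), the guide is written as its DNA equivalent, and only one strand is scanned. The function names and the example sequence are invented for illustration. Allowing a few mismatches gives a rough feel for how off-target sites arise.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def pam_matches(pam: str, pattern: str) -> bool:
    """Check a PAM against a pattern in which 'N' stands for any base."""
    return len(pam) == len(pattern) and all(
        p == "N" or p == b for p, b in zip(pattern, pam)
    )


def find_target_sites(dna: str, guide: str, pam_pattern: str = "NGG",
                      max_mismatches: int = 0) -> list[tuple[int, int]]:
    """Return (start, mismatches) for each site the guide could bind.

    Assumption: forward strand only, PAM directly after the matched site.
    """
    dna = dna.upper()
    guide = guide.upper().replace("U", "T")  # accept RNA or DNA spelling
    span = len(guide) + len(pam_pattern)
    sites = []
    for start in range(len(dna) - span + 1):
        protospacer = dna[start:start + len(guide)]
        pam = dna[start + len(guide):start + span]
        mismatches = hamming(protospacer, guide)
        if mismatches <= max_mismatches and pam_matches(pam, pam_pattern):
            sites.append((start, mismatches))
    return sites


# Made-up 20-letter guide; the second copy lacks a valid PAM and is skipped.
guide = "GATTACAGATTACAGATTAC"
dna = "TTT" + guide + "AGG" + "CCC" + guide + "ATT"
print(find_target_sites(dna, guide))  # [(3, 0)]
```

Raising `max_mismatches` makes the search report near-matches as well, which is the kind of site an imprecise editor might cut by mistake.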
Different versions of Cas proteins are often found by searching natural environments or through a process called directed evolution. Here, scientists rationally swap out some parts of the Cas protein to potentially boost efficacy.
It’s a highly time-consuming process. Which is where AI comes in.
Machine learning has already helped predict off-target effects in CRISPR tools. It’s also homed in on smaller Cas proteins to make downsized editors easier to deliver into cells.
Profluent used AI in a novel way: Rather than boosting current systems, they designed CRISPR components from scratch using large language models.
The basis of ChatGPT and DALL-E, these models launched AI into the mainstream. They learn from massive amounts of text, images, music, and other data to distill patterns and concepts. It’s how the algorithms generate images from a single text prompt—say, “unicorn with sunglasses dancing over a rainbow”—or mimic the music style of a given artist.
The same technology has also transformed the protein design world. Like words in a book, proteins are strung from individual molecular “letters” into chains, which then fold in specific ways to make the proteins work. By feeding protein sequences into AI, scientists have already fashioned antibodies and other functional proteins unknown to nature.
“Large generative protein language models capture the underlying blueprint of what makes a natural protein functional,” wrote the team in the blog post. “They promise a shortcut to bypass the random process of evolution and move us towards intentionally designing proteins for a specific purpose.”
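The "letters in a book" analogy can be sketched in a few lines of Python. This is a deliberately tiny stand-in, not Profluent's model: it counts which letter tends to follow which in a handful of training strings, then samples new strings from those counts. Real protein language models use large neural networks trained on vast numbers of sequences. The three training strings below are invented for illustration and are not real proteins.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # markers for the beginning and end of a sequence


def train_bigram(sequences):
    """Count how often each letter follows each other letter."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        tokens = [START, *seq, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts, rng, max_len=50):
    """Generate one sequence by repeatedly drawing a likely next letter."""
    out, prev = [], START
    while len(out) < max_len:
        choices = counts.get(prev, {})
        if not choices:  # nothing learned for this letter; stop early
            break
        letters = list(choices)
        weights = [choices[letter] for letter in letters]
        nxt = rng.choices(letters, weights=weights, k=1)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Invented toy "proteins" written in the one-letter amino acid alphabet.
training = ["MKRLAV", "MKKLGV", "MRRLAI"]
model = train_bigram(training)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample(model, rng) for _ in range(5)]
print(samples)
```

Every generated string reuses letter-to-letter steps seen in training but can combine them in new orders, which is the basic idea, in miniature, behind generating sequences that resemble natural ones without copying them.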
Do AIs Dream of CRISPR Sheep?
All large language models need training data. The same is true for an algorithm that generates gene editors. Unlike text, images, or videos that can be easily scraped online, a CRISPR database is harder to find.
The team first screened over 26 terabytes of data about current CRISPR systems and built a CRISPR-Cas atlas—the most extensive to date, according to the researchers.
The search revealed millions of CRISPR-Cas components. The team then trained their ProGen2 language model—which was fine-tuned for protein discovery—using the CRISPR atlas.
The AI eventually generated four million protein sequences with potential Cas activity. After filtering out obvious deadbeats with another computer program, the team zeroed in on a new universe of Cas “protein scissors.”
The algorithm didn’t just dream up proteins like Cas9. Cas proteins come in families, each with its own quirks in gene-editing ability. The AI also designed proteins resembling Cas13, which targets RNA, and Cas12a, which is more compact than Cas9.
Overall, the results expanded the universe of potential Cas proteins nearly five-fold. But do any of them work?
Hello, CRISPR World
For the next test, the team focused on Cas9, because it’s already widely used in biomedical and other fields. They trained the AI on roughly 240,000 different Cas9 protein sequences from a wide range of microbes, with the goal of generating similar proteins to replace natural ones, but with higher efficacy or precision.
The initial results were surprising: The generated sequences, roughly a million of them, were totally different from natural Cas9 proteins. But using DeepMind’s AlphaFold2, a protein structure prediction AI, the team found the generated protein sequences could adopt similar shapes.
Cas proteins can’t function without a bloodhound RNA guide. With the CRISPR-Cas atlas, the team also trained AI to generate an RNA guide when given a protein sequence.
The result is a CRISPR gene editor with both components (Cas protein and RNA guide) designed by AI. Dubbed OpenCRISPR-1, its gene editing activity was similar to classic CRISPR-Cas9 systems when tested in cultured human kidney cells. Surprisingly, the AI-generated version slashed off-target editing by roughly 95 percent.
With a few tweaks, OpenCRISPR-1 could also perform base editing, which can change single DNA letters. Compared to classic CRISPR, base editing is likely more precise as it limits damage to the genome. In human kidney cells, OpenCRISPR-1 reliably converted one DNA letter to another in three sites across the genome, with an editing rate similar to current base editors.
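A rough Python sketch shows what "changing single DNA letters" means in practice. It assumes a simplified model in which a base editor converts one kind of letter to another only inside a short "editing window" of the targeted site. The A-to-G conversion, the window of positions 4 to 8, and the example sequence are illustrative defaults, not values reported in the article.

```python
def base_edit(protospacer: str, from_base: str = "A", to_base: str = "G",
              window: tuple[int, int] = (4, 8)) -> str:
    """Swap from_base for to_base inside the editing window.

    Positions are 1-based and inclusive. Every matching letter in the
    window is converted, a simplification of how base editors behave.
    """
    first, last = window
    return "".join(
        to_base if base == from_base and first <= pos <= last else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    )


site = "GATTACAGATTACAGATTAC"  # made-up 20-letter target site
print(base_edit(site))  # GATTGCGGATTACAGATTAC
```

In this example both A's inside the window change while those outside are untouched, and the DNA is never cut in two, which is why base editing is considered gentler on the genome than a double-strand break.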
To be clear, the AI-generated CRISPR tools have only been tested in cells in a dish. For treatments to reach the clinic, they’d need to undergo careful testing for safety and efficacy in living creatures, which can take a long time.
Profluent is openly sharing OpenCRISPR-1 with researchers and commercial groups, but they are keeping the AI that created it in-house. “We release OpenCRISPR-1 publicly to facilitate broad, ethical usage across research and commercial applications,” they wrote.
As a preprint, the paper describing their work has yet to be analyzed by expert peer reviewers. Scientists will also have to show OpenCRISPR-1 or variants work in multiple organisms, including plants, mice, and humans. But tantalizingly, the results open a new avenue for generative AI—one that could fundamentally change our genetic blueprint.
Image Credit: Profluent
* This article was originally published at Singularity Hub