AI: computers learn biology and make discoveries – 03/30/2024 – Science


In 1889, French doctor François-Gilbert Viault climbed down from a mountain in the Andes, took blood from his own arm, and examined it under a microscope. The doctor’s red blood cells, responsible for carrying oxygen around the body, had increased by 42%. He had discovered a mysterious power inherent in the human body: when it needs more cells crucial to life, it can produce them on demand.

In the early 1900s, scientists came to the conclusion that the cause of this expansion of red blood cells was a hormone produced mainly by the kidneys. They called it erythropoietin, which in Greek means “red maker.”

Seven decades later, researchers isolated actual erythropoietin after filtering 2,536 liters of urine. About 50 years after that, biologists in Israel found a rare kidney cell that produces the hormone when oxygen levels drop too low. They named it the Norn cell, after the Norse deities who were believed to control human destiny.

It took humans 134 years to discover the Norn cell. Recently, computers in California (United States) discovered it in just six weeks.

The discovery came when scientists at Stanford University programmed computers to learn biology on their own. The machines ran an AI (artificial intelligence) program similar to ChatGPT.

Stanford researchers trained computers on raw data about millions of real cells and their chemical and genetic makeup. However, they did not “teach” the machines the meaning of these measurements.

The computers processed the data, building a model of all the cells based on their similarity to one another. By the time they finished, they had learned a great deal. They could classify a cell as one of more than a thousand known types. One of those types was the Norn cell.

Stanford’s software is one of a number of new AI-based programs, known as “foundation models,” that aim to learn the fundamentals of biology. But the models go beyond simply organizing the information that biologists have collected; they are discovering how genes work and how cells develop.

As models evolve, accumulating more laboratory data and greater computational capacity, scientists predict that deeper discoveries will be made, such as secrets about cancer and other diseases.

Heart cells

Biologists have long sought to understand how cells use genes to do so many of the things we need to stay alive.

A decade or so ago, scientists began industrial-scale experiments to extract pieces of genetic material from individual cells. They recorded what they found in catalogs, called “cell atlases,” that hold billions of data points.

Christina Theodoris, a medical resident at Boston Children’s Hospital, was reading about a new type of AI model created by Google engineers in 2017 for language translation. The developers provided the model with millions of English sentences, as well as their translation into German and French. The model developed the ability to translate phrases never seen before.

Theodoris wondered whether a similar model could teach itself to make sense of data extracted from a cell atlas. In 2021, she struggled to find a laboratory that would allow her to build a model.

Shirley Liu, a computational biologist at the Dana-Farber Cancer Institute in Boston, gave her a chance. Theodoris then extracted data from 106 published human studies, which together held information about 30 million cells, and fed it all into a program called GeneFormer.

The model gained a deep understanding of how our genes behave in different cells. It predicted that deactivating a gene called TEAD4 in certain heart cells would severely disrupt them. When Theodoris's team tested the prediction on real heart cells, called cardiomyocytes, the cells' beating grew weaker.

The Stanford team got into the foundation model business after collaborating on building one of the largest cell databases, known as CellXGene. Since last August, the researchers have trained their computers on the 33 million cells in the database, focusing on a type of genetic information called messenger RNA. They also fed the model the 3D structures of proteins, which are the products of genes.

From this data, the model — known as Universal Cell Embedding, or UCE — calculated the similarity between cells, grouping them into more than a thousand sets according to the way they used their genes. The clusters corresponded to cell types discovered by generations of biologists.
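The idea of grouping cells by similarity can be illustrated with a toy sketch. This is not UCE's actual method, which the article does not describe in detail; it assumes each cell has already been reduced to a short list of numbers (an "embedding"), compares cells with cosine similarity, and groups them greedily. The cell names, the vectors, the `group_cells` helper, and the 0.9 threshold are all hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_cells(embeddings, threshold=0.9):
    """Greedy grouping (an illustrative assumption, not UCE's algorithm).

    Each cell joins the first group whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a
    new group.
    """
    groups = []
    for name, vector in embeddings.items():
        for group in groups:
            representative = embeddings[group[0]]
            if cosine_similarity(vector, representative) >= threshold:
                group.append(name)
                break
        else:
            # No existing group was similar enough.
            groups.append([name])
    return groups


# Hypothetical three-number embeddings for four made-up cells.
cells = {
    "cell_a": [0.9, 0.1, 0.0],
    "cell_b": [0.8, 0.2, 0.1],
    "cell_c": [0.0, 0.1, 0.9],
    "cell_d": [0.1, 0.0, 0.8],
}

groups = group_cells(cells)
print(groups)
```

In this toy data, cells whose vectors point in nearly the same direction end up in the same group. A real model works with far longer vectors and millions of cells, and would likely rely on a more robust clustering method.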

The model was also able to transfer its knowledge to new species. Presented with the genetic profile of cells from an animal it had never seen before, such as a naked mole rat, UCE could identify many of its cell types.

After UCE discovered the Norn cells, Jure Leskovec, a computer scientist at Stanford who trained the computers, and his colleagues went back to the CellXGene database to find out where they came from. Although many were taken from the kidneys, some came from the lungs or other organs. It was possible, the researchers speculated, that previously unknown Norn cells were spread throughout the body.

In other words, UCE may have discovered a new type of cell.

Just like ChatGPT, biological models make mistakes. Kasia Kedzierska, a computational biologist at the University of Oxford, and her colleagues put GeneFormer and another foundation model, scGPT, through a battery of tests. They presented the models with cell atlas data the models had not seen before and asked them to complete tasks. On some tasks, the models performed worse than simpler computer programs.

According to Leskovec, the models are improving, but compared to the training ChatGPT receives from across the internet, the atlases offer a modest amount of information.

Stephen Quake, a Stanford biophysicist who helped develop UCE, suspects that foundation models will learn not only about the types of cells that currently reside in our bodies, but also about the types of cells that could exist. He dreams of using foundation models to make a map that shows the realm of the possible, beyond which life cannot exist.

A map of what can and cannot sustain life could also mean that scientists will be able to create cells that do not yet exist in nature.
