Source – https://www.gilmorehealth.com/
Artificial intelligence (AI) has made it possible, for the first time, to create artificial human genome sequences that are indistinguishable from the DNA of real donors. A European team used generative neural networks to produce long artificial sequences of human DNA. Their work was published in the journal PLOS Genetics.
An algorithm that can generate artificial human genomes
“Generative neural networks have been used effectively in many different fields over the past decade, including photorealistic imaging,” say the authors of this new work. Applying a similar concept to genetic data, the researchers trained their neural networks on the sequences of 2,500 people stored in databases. The system had to generate sequences with similar characteristics, and these creations were then mixed with real sequences to see whether the two could be told apart. After training, the artificial genomes faithfully reproduced features of the real genomes, such as allele frequencies (how common each allele, or version of a gene, is in the population). One of the biggest challenges of this work was verifying the reliability of the generated genomes, said Aurélien Decelle, co-author of this work and a researcher at the University of Paris-Saclay. “So we spent some time studying the statistical properties of the generated sequences,” he explains.
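To make the allele-frequency comparison concrete, here is a minimal sketch in Python. It is not the authors' code: the 0/1 encoding of each variant site, the function names and the toy data are all assumptions chosen for illustration. It computes how common the alternative allele is at each site in a real panel and in a generated one, then reports the largest gap between the two.

```python
from typing import List

# Assumed encoding: one list per haplotype, one entry per variant site,
# 0 = reference allele, 1 = alternative allele.
Haplotype = List[int]


def allele_frequencies(haplotypes: List[Haplotype]) -> List[float]:
    """Return the frequency of the alternative allele (coded 1) at each site."""
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")
    n = len(haplotypes)
    return [sum(h[i] for h in haplotypes) / n for i in range(n_sites)]


def max_frequency_gap(real: List[Haplotype], generated: List[Haplotype]) -> float:
    """Largest per-site difference in allele frequency between two panels."""
    real_f = allele_frequencies(real)
    gen_f = allele_frequencies(generated)
    if len(real_f) != len(gen_f):
        raise ValueError("panels must cover the same sites")
    return max(abs(r - g) for r, g in zip(real_f, gen_f))


# Toy panels (hypothetical data): 4 haplotypes, 4 variant sites each.
real = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
generated = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 0, 1], [1, 1, 1, 0]]

print(allele_frequencies(real))
print(allele_frequencies(generated))
print(max_frequency_gap(real, generated))
```

A small gap at every site is only one of the statistical properties a realistic generated panel would need to match. It says nothing, for instance, about whether correlations between neighboring sites are preserved.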
Only sequences, not whole genomes
These “realistic” and “high-quality” genomes are a first, the researchers note in the paper. This type of neural network has already been used in genetics to generate short sequences “on the order of tens or hundreds of base pairs” (the building blocks of our DNA, of which there are about 3 billion in humans), explains Flora Jay, who co-led this work at the University of Paris-Saclay. “But the generation of such long sequences (about 10,000 variants comprising several million base pairs) and in the context of population genetics is new and represents a major step forward,” she adds.
As a result, these artificial genomes “are indistinguishable from the other genomes in the biobank that we used for our algorithm, except for one detail: they do not belong to any real donor,” Luca Pagani, co-author of the study, explains in a press release.
However, the process still needs to be perfected. “One of the main drawbacks is that these models cannot yet be used to create whole artificial genomes due to computational limitations,” the authors explain, so the generated sequences are limited to portions of the genome. In addition, the algorithm has difficulty representing very rare alleles. A final challenge is to “closely monitor the originality of the generated data, i.e., whether they are sufficiently different from the genomes of real donors,” Flora Jay says, adding that this is an ongoing research topic.
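One simple way to picture such an originality check is to measure how far each generated sequence is from its closest real counterpart. The Python sketch below is an illustration under stated assumptions, not the method used in the study: sequences are assumed to be 0/1 lists over the same variant sites, distance is counted as the number of differing sites (Hamming distance), and the function names and toy data are hypothetical.

```python
from typing import List

Haplotype = List[int]  # assumed encoding: 0 = reference allele, 1 = alternative


def hamming(a: Haplotype, b: Haplotype) -> int:
    """Number of sites at which two haplotypes carry different alleles."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same sites")
    return sum(x != y for x, y in zip(a, b))


def nearest_real_distance(generated: List[Haplotype],
                          real: List[Haplotype]) -> List[int]:
    """For each generated haplotype, the distance to its closest real one."""
    if not real:
        raise ValueError("need at least one real haplotype")
    return [min(hamming(g, r) for r in real) for g in generated]


# Toy panels (hypothetical data): 3 haplotypes, 5 variant sites each.
real_panel = [[0, 1, 1, 0, 1], [1, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
synthetic = [[0, 1, 1, 0, 1], [1, 0, 0, 1, 0], [0, 0, 1, 0, 1]]

distances = nearest_real_distance(synthetic, real_panel)
# A distance of 0 means the "artificial" sequence is an exact copy of a donor.
exact_copies = [i for i, d in enumerate(distances) if d == 0]

print(distances)
print(exact_copies)
```

In this toy case the first synthetic sequence duplicates a real one and would be flagged. How large a distance counts as "sufficiently different" is exactly the open question the researchers describe.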
Human genome study without concerns for privacy
This type of artificial intelligence is more than a scientific achievement for its own sake: it could help address the ethical problems associated with genetic databases. “In population genetics, researchers need to regularly compare the data they produce to some reference genomes or sometimes even to a large reference panel. Ideally, these genomes should reflect genetic diversity,” says Flora Jay. Artificial genomes could perform this function reliably and safely.
“Existing genomic databases are an invaluable resource for biomedical research, but they are not publicly available or are protected by lengthy and exhaustive application procedures due to legitimate ethical concerns,” explains author Burak Yelmen. “Artificial genomes can help us overcome this problem within a safe ethical framework.” Looking ahead, Flora Jay predicts that these artificial genomes “will contribute to applications as diverse as understanding our evolutionary past or medical epidemiology by incorporating greater genetic diversity.”