Deep Learning Algorithm Could Enhance Genomic Sequencing
A deep learning tool could improve genomic sequencing by helping identify disease-causing mechanisms that traditional screening methods might otherwise miss, according to a study published in Nature Machine Intelligence.
Researchers from Children’s Hospital of Philadelphia (CHOP) and New Jersey Institute of Technology (NJIT) developed the tool, which can help predict sites of DNA methylation, a chemical modification that can change the activity of DNA without changing its underlying sequence.
DNA methylation is involved in many key cellular processes and plays an important role in regulating gene expression. Errors in methylation have been linked to a wide range of human diseases. Genomic sequencing tools can effectively pinpoint polymorphisms that may cause a disease, but these same methods cannot capture the effects of methylation, because a methylated gene has the same DNA sequence as an unmethylated one.
Researchers have invested considerable effort in studying DNA methylation of adenine at the N6 position (6mA) in eukaryotic cells, which include human cells. Although genomic data are available, the role of this modification in eukaryotic cells remains elusive.
“Previously, methods that had been developed to identify these methylation sites in the genome were very conservative and could only look at certain nucleotide lengths at a given time, so a large number of methylation sites were missed,” said Hakon Hakonarson, PhD, Director of the Center for Applied Genomics (CAG) at CHOP and one of the senior co-authors of the study.
“We needed to develop a better way of identifying and predicting methylation sites with a tool that could identify these motifs throughout the genome that may have a robust functional impact and are potentially disease causing.”
To overcome this issue, the team developed a deep learning algorithm that predicts where methylation sites occur, which could then help researchers determine the effect those sites might have on nearby genes.
The software, called Deep6mA, applies neural networks to study DNA methylation sites in natural multicellular organisms. The researchers noted several advantages of the new method. It automatically learns representations of sequence features at different levels of detail. It also makes it easier to integrate a broad spectrum of methylation sequences near genes of interest.
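The article does not describe how Deep6mA encodes its input. As a minimal sketch, assuming the common convention of one-hot encoding DNA and extracting a fixed-length window centered on each candidate adenine, the first step of a sequence-based 6mA predictor might look like the code below. The function names and the default half-width of 20 bases (a 41-base window) are illustrative assumptions, not details taken from the study.

```python
from typing import List, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as one 4-element row per base, in A, C, G, T order.

    Any other symbol (e.g. N used as padding) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def candidate_windows(genome: str, half_width: int = 20) -> List[Tuple[int, str]]:
    """Return (position, window) for every adenine in `genome`.

    Each window has length 2 * half_width + 1 with the adenine at its
    center; positions beyond either end of the sequence are padded with N.
    The default half_width is a hypothetical choice for illustration.
    """
    seq = genome.upper()
    padded = "N" * half_width + seq + "N" * half_width
    windows = []
    for i, base in enumerate(seq):
        if base == "A":
            # seq[i] sits at padded[i + half_width], so this slice is centered on it
            windows.append((i, padded[i : i + 2 * half_width + 1]))
    return windows
```

For example, candidate_windows("GATCA", half_width=2) returns [(1, "NGATC"), (4, "TCANN")]: each window has an adenine at its center, and the N padding beyond the sequence ends is encoded by one_hot as all-zero rows, so a model sees the same input shape for every candidate site.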
The approach could also support model development and prediction on large-scale genomic data.
The researchers applied the algorithm to three representative organisms: A. thaliana, D. melanogaster, and E. coli, the first two of which are eukaryotic. The deep learning tool identified 6mA methylation sites at single-nucleotide resolution, meaning it could pinpoint the individual base, the basic unit of DNA, that is methylated. Even in this initial validation study, the researchers were able to visualize regulatory patterns they could not see using traditional methods.
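To illustrate what prediction at single-nucleotide resolution means, the sketch below assigns a score to every adenine in a sequence. It is not Deep6mA's network: it uses a single logistic neuron with hand-set, hypothetical weights that reward the GATC motif (the site where E. coli's Dam methyltransferase methylates adenine). A real model would learn its weights from labeled data and use many more layers; the names motif_weights, score_adenines, and GATC_WEIGHTS are illustrative only.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    # A, C, G, T -> one-hot rows; any other symbol (N padding) -> all zeros
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def logistic_score(window: str, weights: List[List[float]], bias: float) -> float:
    # Single logistic neuron: sigmoid(sum of weight * one-hot input + bias)
    x = one_hot(window)
    z = bias + sum(w * v for w_row, x_row in zip(weights, x) for w, v in zip(w_row, x_row))
    return 1.0 / (1.0 + math.exp(-z))


def motif_weights(motif: str, center: int, half_width: int, strength: float = 2.0) -> List[List[float]]:
    # Hypothetical hand-set weights: reward each base of `motif` at its offset,
    # with motif[center] aligned to the candidate adenine in the window middle
    weights = [[0.0] * 4 for _ in range(2 * half_width + 1)]
    for k, base in enumerate(motif):
        row = half_width + (k - center)
        weights[row][BASES.index(base)] = strength
    return weights


def score_adenines(genome: str, weights: List[List[float]], bias: float, half_width: int) -> Dict[int, float]:
    # Score every adenine individually, i.e. one prediction per nucleotide
    seq = genome.upper()
    padded = "N" * half_width + seq + "N" * half_width
    scores = {}
    for i, base in enumerate(seq):
        if base == "A":
            window = padded[i : i + 2 * half_width + 1]
            scores[i] = logistic_score(window, weights, bias)
    return scores


# Adenine is index 1 of GATC; a 5-base window covers the whole motif
GATC_WEIGHTS = motif_weights("GATC", center=1, half_width=2)
```

For example, score_adenines("TTGATCAAT", GATC_WEIGHTS, bias=-7.0, half_width=2) gives the adenine at index 3, which sits inside GATC, a score of about 0.73, while the adenines at indices 6 and 7 score below 0.05. Because the weights are tied to positions relative to the center, the score distinguishes neighboring adenines rather than merely flagging a region.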
“One limitation is that our proposed prediction is purely based on sequence information,” said Zhi Wei, PhD, a professor of computer science at NJIT and a senior co-author of the study.
“Whether a candidate is a 6mA site or not will also depend on many other factors. Methylation, including 6mA, is a dynamic process, which will change with cellular context. In the future, we would like to take other factors into consideration such as gene expression. We hope to predict 6mA across cellular context by integrating other data.”
Despite this limitation, the researchers believe their study shows the potential of deep learning to accelerate personalized medicine and enhance clinical care.
“We already know that a number of genes have a disease-causing mechanism brought about by methylation, and while this study was not done in human cells, the eukaryotic cell models were very comparable,” Hakonarson said.
“Genomic scientists looking to translate their findings into clinical applications would find this tool very useful, and the level of precision could eventually lead to the discovery of specific cells or targets that are candidates for therapeutic intervention.”