How big data can answer fundamental questions about human health

Source: theweek.com

Fly into Britain’s Manchester Airport these days and you might spot a new landmark amid the urban sprawl on the ground below. Two huge white cylinders stand sentinel: the only outward sign of a massive biomedical project that promises a revolution in science and health care. And like all revolutions, this one is born in blood.

The cylinders pump liquid nitrogen into a facility called the U.K. Biobank. Inside the walls of this anonymous-looking industrial unit, scientists hold the bodily fluids of half a million Britons in state-of-the-art, robot-managed freezers. Research does not come more open-access than this. Blood biochemistry, genetic analysis, images of brains, hearts, and other organs — all the internal secrets of volunteers — are combined with intimate personal confessions about lifestyle, such as how many sexual partners someone’s had, how much alcohol they drink, and if they routinely drive faster than the motorway speed limit.

The results of that largesse are flowing. In a given month, dozens of scientific studies can appear based on U.K. Biobank data. They range from the curious — how many cups of coffee can safely be consumed in a single day — to the fundamental, such as the discovery that specific gene variants are associated with disease or healthy life expectancy. And in an area of research where size is crucial, such studies count their volunteers not by the hundred or the thousand, but by the hundred thousand. More than a century after Ernest Rutherford’s Manchester lab showed the world how to unlock the secrets inside the atom, the city is showcasing how Big Data can answer fundamental questions about human health.

“The U.K. Biobank is the gold standard right now,” says Josh Denny, a researcher in biomedical informatics at Vanderbilt University Medical Center in Nashville, Tennessee. “Worldwide it’s the benchmark of an open-access large database with rich information and genetics.” Denny published an article on this subject — using clinical data to get the most out of genomic research — for the Annual Review of Biomedical Data Science in 2018. “What we do when we bring health care and genetics data together is to get at the outcomes that are important to us,” he says.

Even as results emerge touching on everything from aging to susceptibility to asthma, the biobank effort isn’t without its detractors or bumps in the road. Some worry that the broad nature of the research done with the samples makes it impossible for volunteers to give proper consent. And in October, a high-profile paper was withdrawn because of technical problems in the way biobank data were analyzed.

But to scientists like Denny, the promise is clear. “This is a resource for the world,” he says.

The principle behind the U.K. Biobank is ambitious: to link health outcomes to the genetic data that pour from DNA sequencing machines across the world. Medicine traditionally is guided by a patient’s physical symptoms and measurable changes to physiology — what biologists call the phenotype. Integrating genetic data — a patient’s genotype — into these deliberations could help tailor treatments to boost their effectiveness, or even identify people at higher risk of developing a given disease, who could be offered help earlier. But to make that work, scientists need to connect the dots: match genotype to phenotype, find patterns and connections in the way people’s DNA varies and the way their health does, too.

Those connections are becoming clearer. In February this year, for example, scientists found genetic markers in the biobank data that linked high cholesterol to the development of motor neuron disease. Cholesterol-lowering drugs like statins, the results suggest, might prevent this deadly and incurable condition. Last month, a different team combed through the genetics of 334,000 of the half-million people signed up to the biobank project to identify genes associated with problematic metabolism of uric acid, which causes health problems including the painful condition gout. From head to toe, month by month scientists are using the biobank information to reveal everything from the benefits of being left-handed to the damage that diabetes can do to the heart.

The U.K. project isn’t the first to recruit volunteers to identify links between genes and disease. National efforts are also underway in Estonia, Sweden, Iceland, China, and Mexico. And back in the 1990s, the Icelandic company deCODE set out to build a database of the genes it found in the country’s population. Analysis of the Icelandic data, now owned by the U.S. biopharmaceutical giant Amgen, continues; for example, the company is now working on medicines to mimic the heart-protecting effects of a gene variant carried by one in 120 Icelanders.

Even with its high volunteer numbers, the U.K. Biobank isn’t the largest project of its type either. The British effort can call on data from some 500,000 recruits but, as it often does, the U.S. military has gone further. In April, the U.S. Million Veteran Program signed up its 750,000th participant since it began in 2011, and still wants more to reach its eponymous goal. The MVP screens the health and genomes of veterans to probe the genetics of post-traumatic stress disorder, diabetes, heart disease, suicide prevention, and other topics of particular relevance to that community.

So, what’s so great about Great Britain’s project? Access. Other biobanks set up around the world are useful projects that can help answer some specific questions, Denny says. But it’s often difficult for outside scientists to get access to the data. Some national projects guard their secrets from foreign eyes as a way to give their own researchers a head start. Others fret about privacy and losing the trust of participants if they were to start sharing their information more widely.

The U.K. Biobank is unique because open and free data access for everyone was the plan from day one, says Rory Collins, an epidemiologist at the University of Oxford and chief executive of the U.K. Biobank project. “We wanted to build something, a resource, in the same way as they built CERN,” the European particle physics lab near Geneva, he says. “This wasn’t a grant application which has to have a specific hypothesis.” It’s a point that other people attached to the Biobank project make repeatedly: This is a basic science project. They thought that if they built it, scientists would come and want to use it.

They have come, and continue to do so. At last count, 13,000 scientists in 77 countries, from Australia and Malaysia to Russia and Jordan, have been given access to data on topics from cognition and sleep to mental health.

Prompted by a call from British scientists to invest in the promise of DNA, the biobank started life as a funding pledge from Tony Blair’s new Labour government in 1998. Backed by the Medical Research Council, a state funder, and the Wellcome Trust, a biomedical charity, the project was based on the principles of the famous Framingham Heart Study, an influential population cohort study that followed 5,200 residents of Framingham, Massachusetts, as a way to find factors that influence cardiac illness.

The U.K. started to recruit volunteers to its study in 2006 and reached its half-million goal four years later. It focused on individuals ages 40 to 69 because organizers figured it would be most useful to study older people, who tend to more quickly show the signs of ill health that researchers are interested in. (Indeed, the carefully preserved samples at the Manchester HQ now represent the earthly remains of at least 20,000 volunteers who have since passed away.)

Participants weren’t paid and had to spend hours at one of several regional centers, where they surrendered blood and urine, had their health examined and filled in surveys on their habits and lifestyle. As a result, the biobank population is not as diverse as geneticists might like, Collins admits, especially if the results are supposed to be useful around the world. Some 94 percent of people the biobank signed up are white, and certain socioeconomic groups, including young, low-income white men, are underrepresented.

Initially, the blood samples were analyzed for simple variations in genetic sequence, such as single nucleotide polymorphisms. These single base-pair changes in DNA occur at specific places in the genome and can explain traits such as eye color and inherited diseases such as cystic fibrosis and sickle-cell anemia. They also act as markers to indicate risk of complex diseases, including diabetes and Alzheimer’s.

The first of these genotyping data were released for 150,000 biobank participants in May 2015. Results from the other 350,000 were added two years later. That fulfilled the original plan, but as genetic sequencing has become faster and cheaper, other researchers wanted to go further. In 2017, the drug firms GSK and Regeneron offered to sequence the “exome” of 50,000 U.K. Biobank participants. This gives a readout of sections of DNA that actually code for proteins, and is seen as a more powerful way to locate information that could be used to develop medicines.

The companies agreed to pay the bill, but wanted something in return: exclusive access to the data. They were given 6 to 12 months, then in 2019 the information was released to the wider scientific community. A larger group of pharma companies is working on exome sequences for the remaining 450,000 volunteers under the same arrangement.
