Oct. 24, 2016
FOR IMMEDIATE RELEASE
CONTACT: Arthur Hirsch
Johns Hopkins scientist leads work on new algorithms for more complete DNA portraits
Scientists’ effort to piece together the genome is taking a significant step forward with a new computerized method that creates more complete and detailed versions of the complex puzzle of life than have ever been produced before.
“We hope and expect this advance will change how new genomes will be sequenced and studied since it gives such an improved view of what is really there,” said Michael Schatz, Bloomberg Distinguished Associate Professor of Computer Science and Biology within Johns Hopkins University’s Whiting School of Engineering and Krieger School of Arts and Sciences. He coordinated a group of 17 scientists from nine institutions in publishing their results in the current issue of Nature Methods.
“Without this approach, you will simply miss a lot of important gene sequences, and many errors can be introduced,” said Schatz, who worked closely with researchers at Pacific Biosciences in Menlo Park, California, and Cold Spring Harbor Laboratory in Cold Spring Harbor, N.Y. He was joined by Johns Hopkins colleague Fritz J. Sedlazeck, a postdoctoral researcher.
Also involved in the research were scientists from the U.S. Department of Energy Joint Genome Institute; the Salk Institute for Biological Studies; the University of California, Davis; the University of Nevada, Reno; and the Università degli Studi di Verona in Verona, Italy.
The development of the two algorithms, FALCON and FALCON-Unzip, which are available free to the public, is analogous, Schatz said, to the move from a primitive telescope “that can only see the closest, brightest objects in the sky, to the Hubble space telescope that can dramatically improve the resolution to see things that are much more distant and in much greater focus.”
The improvement over previous methods could have a significant impact in biology and medicine, as “genome assembly is one of the most fundamental and important steps in molecular biology to study the genetics of any living thing,” he said.
Since the 1970s, genome “sequencing” – the process of determining the complete DNA map of an organism at one time – has produced the life codes for a number of microorganisms, plants and animals, including humans. While many of these have been touted as the “whole genome,” most of them, including the human genome, are not. In most published genomes, big pieces of the picture have been left out.
In producing more detailed genomes of three important species, including the Cabernet Sauvignon red wine grape, the researchers who worked on the new paper show how their approach improves on previous methods. Specifically, most other approaches, Schatz said, “would completely skip the fact that our genome and many genomes are actually ‘diploid’ and have two copies of each chromosome — one from mom and one from dad. Those two copies can be very different from each other, including genes or mutations that you only inherit from your mother or from your father.”
In all three species studied for this paper – Cabernet Sauvignon, a widely studied flowering plant called Arabidopsis thaliana, and a coral fungus – the scientists found large segments of DNA that were specific to one of the two copies.
“An analogy of this might be that previous methods for sequencing genomes would give you a black-and-white representation – ‘haploid,’ just 1 copy of each chromosome,” Schatz said. “But our new software gives you a full-color representation allowing you to see all the details hidden in the shadows.”
The greater detail and accuracy are partly due to the fact that the algorithms produce fewer and bigger puzzle pieces: that is, longer sections of the DNA sequence, which is made up of the four chemical compounds known as nucleotides (adenine, cytosine, guanine and thymine). Larger sections mean fewer gaps and greater precision in understanding what the sequence means.
In the grape genome, for instance, previous methods left the genome shattered into up to 12.8 million pieces averaging only 1,000 nucleotides long. The new approach for the Cabernet Sauvignon grape produces 718 contiguous pieces averaging 2.1 million nucleotides long.
One technical challenge of producing the longer segments was the error rate of 10 to 15 percent in the single-molecule sequencing technology used. However, using sophisticated statistical and computational filtering – a corrective lens for DNA sequencing – the system reduces that to one error every 10,000 to 100,000 nucleotides on average, an error rate of only 0.01 to 0.001 percent.
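The conversion between "one error every N nucleotides" and a percentage error rate can be checked with a few lines of Python. This is only an illustrative sketch of the arithmetic quoted above; the function name `error_rate_percent` is hypothetical and is not part of FALCON or FALCON-Unzip.

```python
def error_rate_percent(nucleotides_per_error: int) -> float:
    """Convert 'one error every N nucleotides' into a percentage error rate.

    Hypothetical helper for illustration only; not part of the FALCON software.
    """
    if nucleotides_per_error <= 0:
        raise ValueError("nucleotides_per_error must be positive")
    return 100.0 / nucleotides_per_error


# The two figures quoted in the release: one error per 10,000 and per 100,000 nucleotides.
for n in (10_000, 100_000):
    print(f"1 error per {n:,} nucleotides = {error_rate_percent(n):g} percent")
```

By the same arithmetic, a raw error rate of 10 to 15 percent corresponds to roughly one error every 7 to 10 nucleotides before correction.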
Earlier this year, another group of scientists used versions of the software to assemble and study the gorilla genome, and Schatz said his lab is using the method to study plants, animals and microorganisms that cause disease, as well as healthy and diseased human genomes, including studies of cancer.