Saturday, February 9, 2013

Genomics Genealogy Evolves

The ways massively parallel DNA sequencing can be used to measure biological systems are limited only by imagination. In science, imagination is an abundant resource.

The November 2012 edition of Nature Biotechnology (NBT) focused on advances in DNA sequencing. It included a review by Jay Shendure and Erez Lieberman Aiden entitled “The Expanding Scope of DNA Sequencing” [1], in which the authors provided a great overview of current and future sequencing-based assay methods with an interesting technical twist. It also offered an opportunity to update a previous Finchtalk post.

As DNA sequencing moved from determining the order of nucleotide bases in single genes to the factory-style efforts of the first genomes, it was limited to measuring ensembles of molecules derived from single clones or PCR amplicons as composite sequences. Massively parallel sequencing changed the game because each molecule in a sample is sequenced independently. This discontinuous advance resulted in a massive increase in throughput that created a brief, yet significant, deviation from the price-performance curve that Moore’s law would predict. It also created a level of resolution that makes it possible to collect data from populations of sequences and see how they vary quantitatively, which in turn makes DNA sequencing a powerful assay platform. While this was quickly recognized [2], reducing ideas to practice would take a few more years.

Sequencing applications fall into three main branches: De Novo, Functional Genomics, and Genetics (figure below). The De Novo, or Exploratory, branch contains three subbranches: new genomes, meta-genomes, and meta-transcriptomes. Genetics, or variation, assays form another main branch of the tree. Genomic sequences are compared within and between populations, individuals, or tissues and cells with the goal of predicting a phenotype from differences between sequences. Genetic assays can focus on single nucleotide variations, copy number changes, or structural differences. Determining inherited epigenetic modifications is another form of genetic assay.

Understanding the relationship between genotype and phenotype, however, requires that we understand phenotype in sufficient detail. For this to happen, traditional analog measurements such as height, weight, blood pressure, and disease descriptions need to be replaced with quantitative measurements at the DNA, RNA, protein, metabolism, and other levels. Within each set of “omes” we need to understand molecular interactions and how environmental factors such as diet, chemicals, and microorganisms affect these interactions, positively or negatively, including through modification of the epigenome. Hence, the Functional Genomics branch is the fastest growing.

New assays since 2010 are highlighted in color and underlined text.  See [1] for descriptions.
Functional Genomics experiments can be classified into five groups: Regulation, Epi-genomics, Expression, Deep Protein Mutagenesis, and Gene Disruption. Each group can be further divided into specific assay groups (DGE, RNA-Seq, small RNA, etc.) that can be subdivided further into specialized procedures (for example, RNA-Seq with strandedness preserved). When experiments are refined and made reproducible, they become assays with sequence-based readouts.
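
For readers who think in data structures, the classification above can be sketched as a nested tree. This is only an illustrative sketch: the five group names come from the text, but placing DGE, RNA-Seq, and small RNA under Expression is an assumption made here, and the names `FUNCTIONAL_GENOMICS` and `leaf_paths` are hypothetical.

```python
# Illustrative sketch of the Functional Genomics branch as a nested dict.
# Group names come from the text; the placement of DGE, RNA-Seq, and
# small RNA under "Expression" is an assumption made for this example.
FUNCTIONAL_GENOMICS = {
    "Regulation": {},
    "Epi-genomics": {},
    "Expression": {
        "DGE": {},
        "RNA-Seq": {"RNA-Seq with strandedness preserved": {}},
        "small RNA": {},
    },
    "Deep Protein Mutagenesis": {},
    "Gene Disruption": {},
}


def leaf_paths(tree, prefix=()):
    """Yield the path from a top-level group down to each leaf of the tree."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            # Descend into assay groups and specialized procedures.
            yield from leaf_paths(children, path)
        else:
            yield path


if __name__ == "__main__":
    for path in leaf_paths(FUNCTIONAL_GENOMICS):
        print(" > ".join(path))
```

Each printed path reads from group to assay group to specialized procedure, mirroring how the figure is organized.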

In the paper, Shendure and Aiden describe 24 different assays. Citing an analogy to language where "Wilhelm von Humboldt described language as a system that makes ‘infinite use of finite means’: despite a relatively small number of words and combinatorial rules, it is possible to express an infinite range of ideas," the authors presented assay evolution as an assemblage of a small number of experimental designs. This model is not limited to language. In biochemistry a small number of protein domains and effector molecules are combined, and slightly modified, in different ways to create a diverse array of enzymes, receptors, transcription factors, and signaling cascades.

Subway map from [1]*. 
Shendure and Aiden go on to show how the technical domains can be combined to form new kinds of assays using a subway framework, where one enters via a general approach (comparison, perturbation, or variation) and reaches the final sequencing destination. Stations along the way are specific techniques that are organized by experimental motifs including cell extraction, nucleic acid extraction, indirect targeting, exploiting proximity, biochemical transformation, and direct DNA or RNA targeting.

The review focused on the bench and made only brief reference to the informatics issues as part of the "rate limiters" of next-generation sequencing experiments.  It is important to note that each assay will have its own data analysis methodology. That may seem daunting. However, like the assays, the specialized informatics pipelines and other analyses can also be developed from a common set of building blocks. At Geospiza we are very familiar with these building blocks and how they can be assembled to analyze the data from many kinds of assays. As a result, the GeneSifter system is the most comprehensive in terms of its capabilities to support a large matrix of assays, analytical procedures, and species.  If you are considering adding next-generation sequencing to your research or your current informatics is limiting your ability to publish, check out GeneSifter.
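
To make the building-block idea concrete, here is a minimal sketch of how a few shared steps might be recombined into assay-specific pipelines. It is not how GeneSifter is implemented; the function names, the adapter string, and the length cutoff are all hypothetical choices made for illustration, and real pipelines work on files with quality scores rather than plain strings.

```python
from collections import Counter
from functools import reduce


def trim_adapter(reads, adapter="AGATCGGAAGAGC"):
    """Cut each read at the first occurrence of an (illustrative) adapter."""
    return [read.split(adapter)[0] for read in reads]


def drop_short(reads, min_len=5):
    """Discard reads shorter than a hypothetical minimum length."""
    return [read for read in reads if len(read) >= min_len]


def count_sequences(reads):
    """Tally identical reads, a stand-in for a quantitative readout."""
    return Counter(reads)


def compose(*steps):
    """Chain building blocks into one pipeline, applied left to right."""
    def pipeline(data):
        return reduce(lambda acc, step: step(acc), steps, data)
    return pipeline


# Two pipelines assembled from the same blocks (names are hypothetical).
cleanup_pipeline = compose(trim_adapter, drop_short)
counting_pipeline = compose(trim_adapter, drop_short, count_sequences)

if __name__ == "__main__":
    reads = [
        "ACGTACGTAGATCGGAAGAGCTT",
        "ACGTACGT",
        "ACGAGATCGGAAGAGC",
        "TTTTTTTT",
    ]
    print(cleanup_pipeline(reads))
    print(counting_pipeline(reads))
```

The point of the design is that a new assay adds or swaps a step rather than requiring a new pipeline from scratch.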

1. Shendure, J., and Aiden, E. (2012). The expanding scope of DNA sequencing. Nature Biotechnology, 30 (11), 1084-1094. DOI: 10.1038/nbt.2421

2. Kahvejian, A., Quackenbush, J., and Thompson, J.F. (2008). What would you do if you could sequence everything? Nature Biotechnology, 26 (10), 1125-1133. PMID: 18846086

* Rights obtained from Rightslink number 3084971224414
