Sunday, May 12, 2013

Sneak Peek: Elucidating the Effects of the Deepwater Horizon Oil Spill on the Atlantic Oyster Using RNA-Sequencing Data Analysis Methods

Join us Tuesday, May 21st, at 10 AM Pacific Time / 1 PM Eastern Time, for an interesting webinar on the effects of the Deepwater Horizon oil spill.

Natalia G. Reyero, Ph.D. – Mississippi State University
N. Eric Olson, Ph.D. – PerkinElmer, Sr. Leader, Product Development

The Deepwater Horizon oil spill exposed the commercially important Atlantic oyster to over 200 million gallons of spill-related contaminants. To study toxicity effects, we sequenced the RNA of oyster samples from before and after the spill. In this webinar, we will compare and contrast the different data analysis methodologies used to address the challenge of an organism lacking a well-annotated genome assembly. Furthermore, we will discuss how the newly generated information provided insight into the underlying biological effects of oil and dispersants on Atlantic oysters during the Deepwater Horizon oil spill.

REGISTER HERE to attend.

Thursday, February 28, 2013

ABRF 2013

The annual meeting of the Association of Biomolecular Resource Facilities (ABRF) begins this weekend (March 2 - 5). We [PerkinElmer] will be busy at the conference, both as participants and as a vendor supporting this great organization and our many customers. From client presentations to our own work, we will share our latest and greatest.


Saturday: 3/2 "Breaking the Data Analysis Bottleneck: Solutions That Work for RNA and Exome Sequencing." Rebecca Laborde, Mayo Clinic, will present on how the teams she works with use GeneSifter for their NGS data analysis. This is part of Satellite Workshop 1: Applications of NGS.

Monday: 3/4 "Oyster Transcriptome Analysis by Next Gen Sequencing." Natalia Reyero, Genetics and Developmental Biology Center, NHLBI, will give a presentation based on her award-nominated poster (below) during the Genomics Research Group (RG5) session. GeneSifter had a role in the data analysis.

Saturday and Monday Posters:

#7 "Identifying Mutations in Transcriptionally Active Regions on Genomes Using Next Generation Sequencing." Eric Olson, PerkinElmer, presents ways in which RNA-seq can be used to define transcripts to identify functional mutations in organisms that have sparsely annotated reference genomes. 

#11 "What Does It Take to Identify the Signal from the Noise in Molecular Profiling of Tumors?" Eric Olson, PerkinElmer, presents ways to use RNA sequencing and bioinformatic approaches to filter the vast number of variants observed in DNA sequence data obtained from tumors down to a manageable number that are most likely to be the drivers of tumor growth.

Award Nominee
#119 "Elucidating the Effects of the Deepwater Horizon Oil Spill on the Atlantic Oyster Using Global Transcriptome Analysis." Natalia Reyero, Genetics and Developmental Biology Center, NHLBI. If you are interested in learning about the aftermath of the Gulf of Mexico oil spill, you will find Natalia's work interesting.

And that's not all. The booth will be hopping. We will have meet-the-speaker opportunities on Sunday and Tuesday, as well as many demos of our GeneSifter Analysis and LIMS products. PerkinElmer Informatics will have people on hand to show new features in PerkinElmer's Electronic Laboratory Notebook and other products, and Caliper and Chemagen reps will be on hand to talk about the great things we do for sample prep.

Check us out at Booth 522 to get schedules and see what's new.

Saturday, February 9, 2013

Genomics Genealogy Evolves
The ways massively parallel DNA sequencing can be used to measure biological systems are limited only by imagination. In science, imagination is an abundant resource.

The November 2012 edition of Nature Biotechnology (NBT) focused on advances in DNA sequencing. It included a review by Jay Shendure and Erez Lieberman Aiden entitled “The Expanding Scope of DNA Sequencing” [1], in which the authors provided a great overview of current and future sequencing-based assay methods with an interesting technical twist. It also provided an opportunity to update a previous FinchTalk.

As DNA sequencing moved from determining the order of nucleotide bases in single genes to the factory-style efforts of the first genomes, it was limited to measuring ensembles of molecules derived from single clones or PCR amplicons as composite sequences. Massively parallel sequencing changed the game because each molecule in a sample is sequenced independently. This discontinuous advance resulted in a massive increase in throughput that created a brief, yet significant, deviation from the price-performance curve that would be predicted from Moore’s law. It also created a level of resolution that makes it possible to collect data from populations of sequences and see how they vary in a quantitative fashion, making it possible to use DNA sequencing as a powerful assay platform. While this was quickly recognized [2], reducing ideas to practice would take a few more years.

Sequencing applications fall into three main branches: De Novo, Functional Genomics, and Genetics (figure below). The De Novo, or Exploratory, branch contains three sub-branches: new genomes, meta-genomes, and meta-transcriptomes. Genetics, or variation, assays form another main branch of the tree. Genomic sequences are compared within and between populations, individuals, or tissues and cells with the goal of predicting a phenotype from differences between sequences. Genetic assays can focus on single-nucleotide variations, copy-number changes, or structural differences. Determining inherited epigenetic modifications is another form of genetic assay.

Understanding the relationship between genotype and phenotype, however, requires that we understand phenotype in sufficient detail. For this to happen, traditional analog measurements such as height, weight, blood pressure, and disease descriptions need to be replaced with quantitative measurements at the DNA, RNA, protein, metabolic, and other levels. Within each set of “omes” we need to understand molecular interactions and how environmental factors such as diet, chemicals, and microorganisms affect these interactions, positively or negatively, including through modification of the epigenome. Hence, the Functional Genomics branch is the fastest growing.

New assays since 2010 are highlighted in color and underlined text.  See [1] for descriptions.
Functional Genomics experiments can be classified into five groups: Regulation, Epi-genomics, Expression, Deep Protein Mutagenesis, and Gene Disruption. Each group can be further divided into specific assay groups (DGE, RNA-Seq, small RNA, etc.) that can be even further subdivided into specialized procedures (e.g., RNA-Seq with strandedness preserved). When experiments are refined and made reproducible, they become assays with sequence-based readouts.
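
This group / assay group / specialized procedure classification can be sketched as a small nested structure. The group and assay names below come from the text; the nesting shown is deliberately partial and only illustrative.

```python
# Partial sketch of the Functional Genomics assay hierarchy described above.
# Group and assay names are from the text; the nesting is illustrative only.
functional_genomics = {
    "Regulation": {},
    "Epi-genomics": {},
    "Expression": {
        "DGE": [],
        "RNA-Seq": ["RNA-Seq with strandedness preserved"],
        "small RNA": [],
    },
    "Deep Protein Mutagenesis": {},
    "Gene Disruption": {},
}

def list_assays(tree):
    """Yield (group, assay, specialized procedure) tuples from the hierarchy."""
    for group, assay_groups in tree.items():
        for assay, procedures in assay_groups.items():
            if procedures:
                for proc in procedures:
                    yield (group, assay, proc)
            else:
                yield (group, assay, None)

print(sorted(list_assays(functional_genomics)))
```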

In the paper, Shendure and Aiden describe 24 different assays. Citing an analogy to language, where "Wilhelm von Humboldt described language as a system that makes ‘infinite use of finite means’: despite a relatively small number of words and combinatorial rules, it is possible to express an infinite range of ideas," the authors presented assay evolution as an assemblage of a small number of experimental designs. This model is not limited to language. In biochemistry, a small number of protein domains and effector molecules are combined, and slightly modified, in different ways to create a diverse array of enzymes, receptors, transcription factors, and signaling cascades.

Subway map from [1]*. 
Shendure and Aiden go on to show how the technical domains can be combined to form new kinds of assays using a subway framework, where one enters via a general approach (comparison, perturbation, or variation) and travels to the final sequencing destination. Stations along the way are specific techniques, organized by experimental motifs including cell extraction, nucleic acid extraction, indirect targeting, exploiting proximity, biochemical transformation, and direct DNA or RNA targeting.

The review focused on the bench and made only brief reference to the informatics issues as part of the "rate limiters" of next-generation sequencing experiments.  It is important to note that each assay will have its own data analysis methodology. That may seem daunting. However, like the assays, the specialized informatics pipelines and other analyses can also be developed from a common set of building blocks. At Geospiza we are very familiar with these building blocks and how they can be assembled to analyze the data from many kinds of assays. As a result, the GeneSifter system is the most comprehensive in terms of its capabilities to support a large matrix of assays, analytical procedures, and species.  If you are considering adding next-generation sequencing to your research or your current informatics is limiting your ability to publish, check out GeneSifter.

1. Shendure, J., and Aiden, E. (2012). The expanding scope of DNA sequencing. Nature Biotechnology, 30(11), 1084-1094. DOI: 10.1038/nbt.2421

2. Kahvejian, A., Quackenbush, J., and Thompson, J.F. (2008). What would you do if you could sequence everything? Nature Biotechnology, 26(10), 1125-1133. PMID: 18846086

* Rights obtained from Rightslink number 3084971224414

Sunday, January 27, 2013

Sneak Peek: Identifying Mutations in Expressed Regions of Genomes Using NGS

Join us Wednesday, January 30th at 1 PM (EST), 10 AM (PST) to learn how to use NGS to identify mutations in expressed regions of genomes.


The pace at which genome references are being generated for plant and animal species is rapidly increasing with Next Generation Sequencing technologies. While this is a major step forward for researchers studying species that previously did not have sequenced genomes, it is only the beginning of the process of defining the biology underlying the genome. As long as a reference is available, DNA variants can be readily identified on a genome-wide scale, often producing lists of hundreds of thousands or even millions of variants. Frequently, the variants that occur in expressed genes are of the most interest; however, if annotation defining where genes exist within a genome is unavailable or poorly defined, identifying which mutations might affect protein coding may not be possible. To address this challenge, we will describe a method whereby RNA-Seq can be used to identify transcriptionally active regions, creating new transcript annotation for un-annotated organisms, or enhanced annotation for any organism. This annotation can then be used in conjunction with whole-genome sequencing to annotate variants according to whether they fall within transcriptionally active regions, thus facilitating the identification of mutations in a larger repertoire of expressed regions of a genome.
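
The final overlap step can be sketched minimally. This is not the webinar's actual pipeline; it assumes RNA-Seq coverage has already been reduced to sorted, non-overlapping transcriptionally active regions (TARs) per chromosome, and all names and coordinates below are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical TARs derived from RNA-Seq coverage: sorted, non-overlapping
# (start, end) intervals per chromosome (0-based, half-open, BED-style).
tars = {
    "chr1": [(1000, 2500), (5000, 7200)],
    "chr2": [(300, 900)],
}

def in_tar(chrom, pos, tars):
    """Return True if a variant position falls inside a TAR on chrom."""
    intervals = tars.get(chrom, [])
    starts = [s for s, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < intervals[i][1]

# Annotate a toy list of (chromosome, position) variants.
variants = [("chr1", 1500), ("chr1", 4000), ("chr2", 350)]
annotated = [(c, p, in_tar(c, p, tars)) for c, p in variants]
for chrom, pos, hit in annotated:
    print(chrom, pos, "expressed" if hit else "outside known TARs")
```

In practice the interval set would come from RNA-Seq alignments and the variants from whole-genome sequencing, but the lookup itself stays this simple.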

Eric Olson, Ph.D., and Hugh Arnold, Ph.D., from Geospiza will present.

Thursday, January 17, 2013

Bio Databases 2013

I seem to have committed to an annual ritual of summarizing the Nucleic Acids Research (NAR) Database Issue [1]. I do this because it is important to understand and emphasize the increasing role of data analysis in modern biology and to remind us of the challenges that persist in turning data into knowledge.

Sometimes I hear individuals say they are building a database of all knowledge. To them I say: good luck! The reality is that new knowledge is developed from unique insights, which are derived from specialized aggregations of information. Hence, as more data become available through decreasing data collection costs, the number of resources and tools used to organize, analyze, and annotate data and information increases. Interestingly, the decreases in data cost result from production increases driven by technical improvements, which grow exponentially, whereas database growth is linear. Collecting data is the easy part.

How many are there?

Databases live in the wild and thus are hard to count. Reading the introduction to the database issue, one would think 88 new databases were added (cited), but if you compare the number tracked by NAR in 2012 (1380) to 2013 (1512), you get 132. Moreover, the databases tracked by NAR are contributed by their authors, and some authors don't bother. For example, SeattleSNPs, home of the SeattleSeq Annotation and important Genome Variant Servers*, is not listed in NAR. Nevertheless, the NAR registry continues to grow by about 100 databases per year.

What's new?

Last year, I noted that the new databases did not reflect any discernible pattern in terms of how the field of biology was changing. Rather, the new databases reflected increasing specialization and complexity. That trend continues, but this year Fernández-Suárez and Galperin note the emergence of new databases for studying human disease. Altogether, eight such databases were cited in the introduction. Several others are listed in a table highlighting the new databases. While databases specializing in human genetics are not new, the past year saw an increased emphasis on understanding the relationship between genotype and phenotype as we advance our understanding of rare variation and population genetics.

As noted, many databases support human genomics research. If you visit the NAR Database Summary Category List and expand the list of databases under the Human Genes and Diseases category, you find four subcategories (General Human Genetics, General Polymorphism, Cancer Gene, and Gene-, System-, or Disease-specific Databases) listing approximately 174 databases. I say approximately because, as noted above, databases are hard to count. Curiously, just above Human Genes and Diseases is a category called Human and Vertebrate Genomes. Databases are hard to classify too.

What's useful?

It is clear that the growing number of databases reflects an increasing level of specialization. A high degree of redundancy is also likely. Ten microRNA databases (found by virtue of names starting with "miR") cover general and specific topics, including miRNAs that are predicted from sequence or literature, verified by experiment as existing or having a target, possibly pathogenic, or present in different organisms. It would be interesting to see which of these databases hold the same data, but that is hard because some sites make all of their data available while others make their data searchable only. Even in the former case, getting the data requires that it be put into a common format to make comparisons. Hence, access and interoperability issues persist.
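
To make the interoperability point concrete, here is a hypothetical sketch: before two miRNA databases can be compared, their records must be normalized to a common format, here nothing more than a shared identifier convention. The records and field names are invented for illustration.

```python
# Hypothetical exports from two miRNA databases with different formats.
db_a = ["hsa-miR-21", "hsa-miR-155", "HSA-MIR-16"]    # flat list of names
db_b = [{"id": "hsa-mir-21"}, {"id": "hsa-mir-34a"}]  # list of records

def normalize(name):
    """Reduce an identifier to a common, case-insensitive form."""
    return name.strip().lower()

set_a = {normalize(n) for n in db_a}
set_b = {normalize(r["id"]) for r in db_b}

shared = set_a & set_b  # entries both databases hold
only_a = set_a - set_b  # entries unique to the first
print(sorted(shared), sorted(only_a))
```

Real databases differ in far more than capitalization (schemas, coordinate systems, evidence codes), which is why this normalization step dominates the effort of any cross-database comparison.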

Databases also persist. Fernández-Suárez and Galperin commented on efforts to curate the NAR collection. The annual attrition rate is less than 5%, and greater than 90% of the databases are functional as determined by their responses to webbots. Some have merged into other projects. What is not known is the quality of the information. In other words, how are databases verified for accuracy or maintained to reflect our changing state of knowledge? As databases become increasingly used in medical sequencing, caveat emptor changes to caveat venditor, and validation will be a critical component of design and maintenance. Perhaps future issues of the NAR database update will comment on these challenges.

[1] Fernández-Suárez, X.M., and Galperin, M.Y. (2013). The 2013 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection. Nucleic Acids Research, 41(D1). PMID: 23203983

* The SeattleSeq and Genome Variant Server links will break at the next update because the URLs contain the respective database version numbers.