FinchTalk: February 2012

Tuesday, February 14, 2012

Sneak Peek: Poster Presentations at AGBT

The annual Advances in Genome Biology and Technology (AGBT) meeting begins tomorrow, and it would not be complete without a couple of contributions from @finchtalk.

Follow the tweets at #AGBT, and if you are at the conference, visit posters 334 and 335 (abstracts below). Also, visit Lanai 189 to see the latest advances in genome technology and software from the Caliper and Geospiza organizations within PerkinElmer.

Poster Abstracts

Poster 335: Why is the $1000 Genome so Expensive?

Rapid advances in sequencing technology are enabling leading institutions to establish programs for genomics-based medicine. Some estimate that 5,000 genomes were sequenced during 2011 and that an additional 30,000 will be sequenced by the end of 2012. Despite this terrific progress, the infrastructure required to make genomics-based medicine the norm, rather than a specialized application, is lacking. Although DNA sequencing costs are decreasing, sample preparation bottlenecks and data handling costs are increasing. In many instances, the resources (e.g., time, capital investment, experience) required to conduct medical sequencing effectively are prohibitive.

We describe a model system that uses a variety of PerkinElmer products to address three problems that continue to impede the widespread adoption of genomics-based medicine: organizing and tracking sample information, sample preparation, and whole-genome data analysis. Specifically, PerkinElmer’s GeneSifter® LIMS and analysis software, Caliper instrumentation, and DNA sequencing services can provide independent or integrated solutions for generating and processing data from whole-genome sequencing.

Poster 334: Limitations of the Human Reference Genome Sequence

The human genome reference sequence is well characterized and highly annotated, and its development represents a considerable investment of time and money. This sequence is the foundation for genotyping microarrays and DNA sequencing analysis. Yet, in several critical respects, the reference sequence remains incomplete, as do the many research tools that are based on it. When new variation data from the 1000 Genomes Project (1Kg) and Complete Genomics (CG) are used to measure the effectiveness of existing tools and concepts, we find that approximately 50% of probes on commonly used genotyping arrays contain confounding variation, affecting the results of 37% of genome-wide association studies (GWAS) to date. The sources of confounding variation include unknown variants in close proximity to the probed variant, and alleles previously assumed to be di-allelic that are in fact poly-allelic. When mean linkage disequilibrium (LD) lengths from HapMap are compared to 1Kg data, LD decreases from 16.4 kb to 7.0 kb within common samples, and further decreases to 5.4 kb when random samples are compared.

While many of these observations have been understood anecdotally, quantitative assessments of resources based on the reference sequence have been lacking. These findings have implications for the study of human variation and medical genetics, and resolving these discrepancies will be essential for ushering in the era of personalized medicine.