Yesterday (July 11, 2012), PLoS ONE published an article prepared by my colleagues and me entitled "Limitations of the Human Reference Genome for Personalized Genomics."
This work, supported by Geospiza's SBIR targeting ways to improve mutation detection and annotation, explored some of the resources and assumptions that are used to measure and understand sequence variation. As we know, a key deliverable of the human genome project was a high-quality reference sequence that could be used to annotate genes, develop research tools like genotyping and microarray assays, and provide insights to guide software development. Projects like HapMap used these resources to provide additional understanding of genetic linkage in populations.
[Figure: Decreasing sequencing costs]
Since those early projects, DNA sequencing costs have plummeted. As a result, endeavors such as the 1000 Genomes Project (1KGP) and public contributions from Complete Genomics (CG) have dramatically increased the number of known sequence variants. A question worth asking is how these new data contribute to our understanding of the utility of the current resources and assumptions that have guided genomics and genetics for the past six or seven years.
[Figure: Number of variants by dbSNP build]
To address the above question, we evaluated several assays and software tools that were based on the human genome reference sequence in the context of new data contributed by 1KGP and CG. We found a high frequency of confounding issues with microarrays, and many cases where invalid assumptions, encoded in bioinformatics programs, underestimate variability or possibly misidentify the functional effects of mutations. For example, 34% of published array-based GWAS for a variety of diseases utilize probes that contain undocumented variation or map to regions of previously unknown structural variation. Similarly, the assumed size of linkage disequilibrium blocks decreases as the number of known variants increases.
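To make the probe issue concrete, here is a minimal sketch of how one might flag array probes whose target region contains a known variant. It is an illustration only, not the method used in the paper: the function name `probes_with_variants`, the tuple layouts, and all coordinates are hypothetical, and a real analysis would read probe annotations and variant calls from files.

```python
from bisect import bisect_left

def probes_with_variants(probes, variants):
    """Return names of probes whose interval contains at least one variant.

    probes:   list of (name, chrom, start, end), 1-based inclusive (assumed layout).
    variants: list of (chrom, pos) tuples (assumed layout).
    """
    # Group variant positions by chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, pos in variants:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    flagged = []
    for name, chrom, start, end in probes:
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, start)  # first variant at or after probe start
        if i < len(positions) and positions[i] <= end:
            flagged.append(name)
    return flagged

# Made-up coordinates, for illustration only.
probes = [
    ("probe_A", "chr1", 100, 124),
    ("probe_B", "chr1", 200, 224),
    ("probe_C", "chr2", 50, 74),
]
variants = [("chr1", 110), ("chr1", 300), ("chr2", 75)]
flagged = probes_with_variants(probes, variants)
```

Rerunning a check like this against each new variant release would show which probes become suspect as the catalog of known variation grows.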
The significance of this work is that it documents what many are experiencing anecdotally. As we continue to learn about the contributing role of rare variation in human disease, we need to fully understand how current resources can be used, and work to resolve discrepancies, in order to create an era of personalized medicine.
(2012). Limitations of the Human Reference Genome for Personalized Genomics, PLoS ONE, DOI: 10.1371/journal.pone.0040294.t002