It also got me thinking: just how well can you measure things with those free wooden yardsticks you get at hardware stores and home shows?
Background
The conversation started with a question about what kind of quality scoring system could be applied to Helicos data. Could something similar to Phred and AB files be used?
A couple of answers were provided. One referred to the recent Helicos article in Nature Biotechnology and pointed out that Helicos has such a method. This answer also addressed the issue that quality values (QVs) need to be tuned for each kind of instrument.
Another answer, from a core lab director with a Helicos instrument, pointed out many more challenges in comparing data from different applications and noted that software in this area is lacking. He used the metaphor of the yardstick to make the point that researchers need systematic tools and methods to compare data and platforms.
What's in a Yardstick?
I replied to the thread noting that we've been working with data from 454, Illumina GA, SOLiD and Helicos and there are multiple issues that need to be addressed in developing yardsticks to compare data from different instruments for different experiments (or applications).
At one level, there is the instrument and the data it produces, and the question is: can we have a standard quality measure? With Phred, we need to recall that each instrument had to be calibrated so that quality values would be useful and equivalent across chemistries and platforms (primers, terminators, BigDye, gel, capillary, AB models, MegaBACE ...). Remember phredpar.dat? Because the data were of a common type, the electropherogram, we could more or less use a single tool and define a standard. Even then, other tools (LifeTrace, KB basecaller, and LongTrace) emerged and computed standardized quality values differently. So I would argue that we think we have a measure, but it is not the standard we think it is.
By analogy, each NGS instrument uses a very different method to generate sequences, so each platform will have a unique error profile. The good news is that quality values, as transformed error probabilities, make it possible to compare output from different instruments in terms of confidence. The bad news is that if you do not know how the error probability is computed, or you do not have enough data (control and test) to calibrate the system, the error probabilities are not useful. Add to that the fact that the platforms are changing rapidly as vendors improve chemistry and change hardware and software to increase throughput and accuracy. So, for the time being, we might have yardsticks, but they have variable lengths.
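The calibration problem can be made concrete. Here is a minimal sketch, assuming you have control data where the true sequence is known, so each base call can be labeled right or wrong against the reference. It groups calls by the QV the instrument reported, measures the observed error rate in each group, and converts that rate back to a QV. The function names `qv_from_error_prob` and `calibrate`, the Q60 cap, and the add-one smoothing are illustrative choices, not part of Phred or any vendor's method.

```python
import math
from collections import defaultdict


def qv_from_error_prob(p: float, cap: float = 60.0) -> float:
    """Phred-like transform QV = -10 * log10(P), capped so P = 0 does not give infinity."""
    if p <= 0.0:
        return cap
    return min(cap, -10.0 * math.log10(p))


def calibrate(observations):
    """Compare reported QVs with the error rate actually observed on control data.

    observations: iterable of (reported_qv, is_error) pairs, where is_error is True
    when the base call disagrees with the known control reference.
    Returns {reported_qv: (n_bases, n_errors, empirical_qv)}, sorted by reported QV.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_qv -> [bases, errors]
    for reported_qv, is_error in observations:
        counts[reported_qv][0] += 1
        counts[reported_qv][1] += int(is_error)

    table = {}
    for reported_qv, (n, errors) in sorted(counts.items()):
        # Add-one (Laplace) smoothing: a bin with zero observed errors
        # still gets a finite error estimate.
        p_hat = (errors + 1) / (n + 2)
        table[reported_qv] = (n, errors, qv_from_error_prob(p_hat))
    return table


if __name__ == "__main__":
    # Hypothetical control data: 1,000 bases reported at Q20 and 1,000 at Q30,
    # each bin containing 10 mismatches against the reference.
    obs = [(20, i < 10) for i in range(1000)] + [(30, i < 10) for i in range(1000)]
    for qv, (n, err, emp) in calibrate(obs).items():
        print(f"reported Q{qv}: {n} bases, {err} errors, empirical Q{emp:.1f}")
```

In this made-up example both bins come out near Q19.6, so the Q30 labels overstate confidence. A table like this has to be rebuilt whenever chemistry, hardware, or software changes, which is one way to see why the yardstick's length keeps moving.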
The next levels deal with experiments. As noted, ChIP-Seq, RNA-Seq, Me-Seq, Re-Seq, and your favorite-Seq all measure different things, and we are just learning how errors and other artifacts interfere with how well the data actually measure what the experiment intended to measure. Experiment-level methods need to be developed so that ChIP-Seq from one platform can be compared to ChIP-Seq from another platform, and so on. However, the situation is not dire because, in the end, DNA sequences are the final output, and for many purposes the data produced are much better now than they have been in the past. As we push sensitivity, though, the issues already discussed become very relevant.
As a last point, the goal many researchers will have is to layer data from one experiment on another, for example to correlate ChIP-Seq with RNA-Seq. To do that, you not only need quality measures for data, sample, and experiment; you also need ways to integrate all of this experimental information with already published data. There is a significant software challenge ahead and, as pointed out, cobbling solutions together is not a feasible long-term answer. The datasets are getting too big and complex, and at the same time the archives are bursting with data generated by others.
So what does this have to do with yardsticks? Those cheap wooden yardsticks expand and contract with temperature and humidity, so the same yardstick gives different measurements at different times. This variation is the uncertainty of the measurement (see Additional Reading below), and it defines the precision of our measuring device. If I want a quick estimate of how tall my dog stands, I would happily use the wooden yardstick. However, if I want to measure something to within a 32nd of an inch or a millimeter, I would use a different tool. The same rules apply to DNA sequencing: for many purposes the reads are good enough and data redundancy overcomes errors, but as we push sensitivity and want to measure changes in fewer molecules, discussions about how to compute QVs and annotate data, so that we know which measuring device was used, become very important.
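The claim that redundancy overcomes errors can be checked with a simple binomial model. This sketch assumes each read covering a position errs independently with the same probability, that all wrong reads agree with each other (a worst case), and that the base is called by majority vote with ties counted as failures. Real errors are often correlated and platform-specific, so treat the numbers as illustrative; `consensus_error_prob` is a hypothetical helper name.

```python
from math import comb


def consensus_error_prob(n_reads: int, per_read_error: float) -> float:
    """Probability that a majority vote over n_reads independent reads calls the wrong base.

    Assumptions (illustrative only): errors are independent, all wrong reads
    agree with each other (worst case), and ties count as failures.
    """
    return sum(
        comb(n_reads, k) * per_read_error**k * (1 - per_read_error) ** (n_reads - k)
        for k in range(n_reads + 1)
        if 2 * k >= n_reads  # k wrong reads match or outnumber the correct ones
    )


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, consensus_error_prob(n, 0.01))
```

With a 1% per-read error (Q20), one read fails 1% of the time, three reads about 0.03% of the time, and five reads about 0.001%. Majority voting only helps when the true signal is the majority, though. When the goal is to detect a change carried by a few molecules, say a variant present in 1% of reads, the signal is the same size as the per-read error rate and extra coverage alone does not separate them; that is where well-calibrated QVs matter.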
Finally, I often see QVs referred to as Phred scores in the literature and in company brochures, and I hear them called that in conversation. Remember: Only Phred makes Phred QVs. Everything else is Phred-like, and only if it is a -10log10(P) transformation of an error probability.
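For reference, the Phred-like transformation and its inverse take only a few lines. This sketch shows just the arithmetic relating a QV to an error probability P. It does not reproduce how Phred, or any instrument's software, estimates P in the first place, which is the part that needs calibration. The function names are hypothetical.

```python
import math


def phred_like_qv(error_prob: float) -> float:
    """Phred-like quality value: QV = -10 * log10(P), P = probability the base call is wrong."""
    if not 0.0 < error_prob <= 1.0:
        raise ValueError("error_prob must be in (0, 1]")
    return -10.0 * math.log10(error_prob)


def error_prob_from_qv(qv: float) -> float:
    """Inverse transform: P = 10 ** (-QV / 10)."""
    return 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    for qv in (10, 20, 30):
        print(qv, error_prob_from_qv(qv))
```

Q10, Q20, and Q30 correspond to error probabilities of 1 in 10, 1 in 100, and 1 in 1,000.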
Additional Reading:
Color Space, Flow Space, Sequence Space, or Outer Space: Part I, Uncertainty in DNA Sequencing
Color Space, Flow Space, Sequence Space, or Outer Space: Part II, Uncertainty in Next Gen Data