Monday, February 11, 2008

Using the Finch Q >20 plots to evaluate your data


All of the Finch systems (Solutions Finch, FinchLab, and iFinch) have a folder report with visual snapshots that summarize the quality of the data in that folder. The Q20 histogram plot is one of those tools, and in these next two posts, I'll describe what we can learn from these plots.


First, we'll talk about the values on the x axis. When we use the term "Q > 20 bases," we're referring to the number of bases in a read that have a quality value greater than 20. A base with a quality value of 20 has a 1 in 100 chance of having been misidentified, so we use Q20 as the threshold at which a base's quality is considered acceptable.
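To make the "1 in 100" figure concrete: quality values on this scale are Phred scores, where the error probability is P = 10^(-Q/10), so Q20 gives 0.01. The short Python sketch below shows that relation and counts the good bases in one read using the "greater than 20" rule described above. The function names and the quality values in the example read are hypothetical, made up for illustration; this is not how Finch itself computes the numbers.

```python
def error_probability(q: int) -> float:
    """Phred relation: probability that a base call is wrong, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def count_good_bases(quals, threshold: int = 20) -> int:
    """Count bases whose quality value is strictly greater than the threshold."""
    return sum(1 for q in quals if q > threshold)


# Hypothetical per-base quality values for one short read
read_quals = [12, 18, 20, 21, 35, 40, 40, 38, 19, 25]

print(error_probability(20))         # about 0.01, i.e. 1 in 100
print(count_good_bases(read_quals))  # bases with Q > 20 in this read
```

Note that a base with a quality value of exactly 20 is not counted here, matching the "greater than 20" wording; some tools use "20 or higher" instead.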

Histogram plots work by consolidating data that fall into a certain range. In the graph above, the x axis shows groups of reads. The first group contains reads that have fewer than 50 good (Q > 20) bases. The next group contains reads that have between 50 and 99 good bases, the next 100 to 149, and so on.
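The grouping can be sketched in a few lines of Python. This assumes each read has already been reduced to its count of Q > 20 bases, and that the groups keep stepping by 50 as described (0 to 49, 50 to 99, 100 to 149, ...). The function name and the example counts are hypothetical, and whether the real plot caps its last group is not covered here. Each group is keyed by its lower edge, and its value is the number of reads in it, which is what the y axis shows.

```python
from collections import Counter

BIN_WIDTH = 50  # each x-axis group spans 50 good-base counts


def q20_histogram(good_counts, width: int = BIN_WIDTH) -> dict:
    """Map each group's lower edge to the number of reads that fall into it."""
    hist = Counter((n // width) * width for n in good_counts)
    return dict(sorted(hist.items()))


# Hypothetical Q > 20 base counts for eight reads
reads = [12, 48, 73, 99, 100, 410, 960, 975]

for low, n_reads in q20_histogram(reads).items():
    print(f"{low}-{low + BIN_WIDTH - 1} good bases: {n_reads} reads")
```

Integer division by the group width is what puts 99 in the 50 to 99 group and 100 in the 100 to 149 group.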

On the y axis, we show the number of reads that fall into each group. In this graph, almost 30 reads have more than 950 good-quality bases.

Uhmm, uhmm, uhhmmm, good sequence data, just the stuff I like to see.
