Genetic analysis workflows involve both complex laboratory and data analysis and manipulation procedures. A good workflow management system not only tracks processes, but simplifies the work.
In my last post , I introduced the concept of workflows in describing the issues one needs to think about as they prepare their lab for Next Gen sequencing. To better understand these challenges, we can learn from previous experience with Sanger sequencing in particular and genetic assays in general.
As we know, DNA sequencing serves many purposes. New genomes and genes in the environment are characterized and identified by De Novo sequencing. Gene expression can be assessed by measuring Expressed Sequence Tags (ESTs), and DNA variation and structure can be investigated by resequencing regions of known genomes. We also know that gene expression and genetic variation can also be studied with multiple technologies such as hybridization, fragment analysis, and direct genotyping and it is desirable to use multiple methods to confirm results. Within each of these general applications and technology platforms, specific laboratory and bioinformatics workflows are used to prepare samples, determine data quality, study biology, and predict biological outcomes.
The process begins in the laboratory.
Recently I came across a Wikipedia article on DNA sequencing that had a simple diagram showing the flow of materials from samples to data. I liked this diagram, so I reproduced it, with modifications. We begin with the sample. A sample is a general term that describes a biological material. Sometimes, like when you are at the doctor, these are called specimens. Since biology is all around and in us, samples come from anything that we can extract DNA or RNA from. Blood, organ tissue, hair, leaves, bananas, oysters, cultured cells, feces, you-can-image-what-else, can all be samples for genetic analysis. I know a guy who uses a 22 to collect the apical meristems from trees to study poplar genetics. Samples come from anywhere.
With our samples in hand, we can perform genetic analyses. What we do next depends on what we want to learn. If we want to sequence a genome we're going to prepare a DNA library by randomly shearing the genomic DNA and cloning the fragments into sequencing vectors. The purified cloned DNA templates are sequenced and the data we obtain are assembled into larger sequences (contigs) until, hopefully, we have a complete genome. In resequencing and other genetic assays, DNA templates are prepared from sample DNA by amplifying specific regions of a genome with PCR. The PCR products, amplicons, are sequenced and the resulting data are compared to a reference sequence to identify differences. Gene expression (EST and hybridization) analysis follows similar patterns except that RNA is purified from samples and then converted to cDNA using RT-PCR (Reverse Transcriptase PCR, not Real Time PCR - that's a genetic assay).
From a workflow point of view, we can see how the physical materials change throughout the process. Sample material is converted to DNA or RNA (nucleic acids), and the nucleic acids are further manipulated to create templates that are used for the analytical reaction (DNA sequencing, fragment analysis, RealTime-PCR, ...). As the materials flow through the lab, they're manipulated in a variety of containers. A process may begin with a sample in a tube, use a petri plate to isolate bacterial colonies, 96-well plates to purify DNA and perform reactions, and 384-well plates to collect sequence data. The movement of the materials must be tracked, along with their hierarchical relationships. A sample may have many templates that are analyzed, or a template may have multiple analyses. When we do this a lot we need a way to see where our samples are in their particular processes. We need a workflow management system, like FinchLab.