In recent posts, we stressed the point that unlike Sanger sequencing, Next Gen sequencing demands that data collection and analysis be tightly coupled, and presented our initial approach of analyzing Next Gen data with the Maq program. We also discussed how the different steps (basecalling, alignment, statistical analysis) provide a framework for analyzing Next Gen data and described how these steps belong to three phases: primary, secondary, and tertiary data analysis. Last, we gave an example of how FinchLab can be used to characterize data sets for Tag Profiling experiments. This post expands the discussion to include characterization of data sets for ChIP-Seq.
ChiP (Chromosome Immunoprecipitation) is a technique where DNA binding proteins, like transcription factors, can be localized to regions of a DNA molecule. We can use this method to identify which DNA sequences control expression and regulation for diverse genes. In the ChIP procedure, cells are treated with a reversible cross-linking agent to "fix" proteins to other proteins that are nearby, as well as the chromosomal DNA where they're bound. The DNA is then purified and broken into smaller chunks by digestion or shearing and antibodies are used to precipitate any protein-DNA complexes that contain their target antigen. After the immunoprecipitation step, unbound DNA fragments are washed away, the bound DNA fragments are released, and their sequences are analyzed to determine the DNA sequences that the proteins were bound to. Only few years ago, this procedure was much more complicated than it is today, for example, the fragments had to be cloned before they could be sequenced. When microarrays became available, a microarray-based technique called ChIP-on-chip made this assay more efficient by allowing a large number of precipitated DNA fragments to be tested in fewer steps.
Now, Next Gen sequencing takes ChIP assays to a new level . In ChIP-seq the same cross linking, isolation, immunoprecipitation, and DNA purification steps are carried out. However, instead of hybridizing the resulting DNA fragments to a DNA array, the last step involves adding adaptors and sequencing the individual DNA fragments in parallel. When compared to microarrays, ChiP-seq experiments are less expensive, require fewer hands-on steps and benefit from the lack of hybridization artifacts that plague microarrays. Further, because ChIP-seq experiments produce sequence data, they allow researchers to interrogate the entire chromosome. The experimental results are no longer to the probes on the micoarray. ChIP-Seq data are better at distinguishing similar sites and collecting information about point mutations that may give insights into gene expression. No wonder ChIP-Seq is growing in popularity.
To perform a ChIP-seq experiment, you need to have a Next Gen sequencing instrument. You will also need to have the ability to run an alignment program and work with the resulting data to get your results. This is easier said than done. Once the alignment program runs, you might have to also run additional programs and scripts to translate raw output files to meaningful information. The FinchLab ChIP-seq pipeline, for example, runs Maq to generate the initial output, then runs Maq pileup to convert the data to a pileup file. The pileup file is then read by a script to create the HTML report, thumbnail images to see what is happening and "wig" files that can be viewed in the UCSC Genome Browser. If you do this yourself, you have to learn the nuances of the alignment program, how to run it different ways to create the data sets, and write the scripts to create the HTML reports, graphs, and wig files.
With FinchLab, you can skip those steps. You get the same results by clicking a few links to sort the data, and a few more to select the files, run the pipeline, and view the summarized results. You can also click a single link to send the data to the UCSC genome browser for further exploration.
ChIP-seq: welcome to the new frontier Nature Methods - 4, 613 - 614 (2007)