Friday, May 21, 2010

I Want My FinchTV*

Without a doubt, FinchTV is a wildly successful sequence trace viewer. Since its launch, close to 150,000 researchers and students have enjoyed its easy-to-use interface, cross-platform capabilities, and unique features. But where does it go from here?

Time for Version 2

FinchTV is Geospiza's free DNA sequence trace viewer that is used on Macintosh, Windows, and Linux computers to open and view DNA sequence data from Sanger-based instruments. It reads both AB1 and SCF files and displays the DNA sequence and four-color traces of the corresponding electropherogram. When quality values are present, they are displayed with the data. Files can be opened with a simple drag-and-drop action, and once opened, traces can be viewed in either a single-pane or a multi-pane full-sequence format. Sequences can be searched using regular expressions and edited, and selected regions can be used to launch NCBI BLAST searches.
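To give a feel for what a regular expression search over a sequence looks like, here is a minimal Python sketch. It is not FinchTV's code; the function name `find_motif` and the example sequence are made up for illustration, and it only reports non-overlapping matches on the given strand.

```python
import re

def find_motif(sequence, pattern):
    """Return (start, end, match) for each non-overlapping regex hit.

    Positions are 1-based and inclusive, the way base positions are
    usually shown in a trace viewer. Matching ignores letter case.
    """
    return [
        (m.start() + 1, m.end(), m.group())
        for m in re.finditer(pattern, sequence, flags=re.IGNORECASE)
    ]

# Hypothetical example sequence containing EcoRI (GAATTC) and BamHI (GGATCC) sites.
seq = "ACGTGAATTCTTAGGATCCAAGAATTCG"
hits = find_motif(seq, r"GAATTC|GGATCC")
for start, end, text in hits:
    print(start, end, text)
```

A pattern such as `GAATTC|GGATCC` finds either restriction site in one pass, which is the kind of query a plain text search cannot express.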

Over the past years we have learned that FinchTV is used in many kinds of environments, such as research labs, biotechnology and pharmaceutical companies, and educational settings. In some labs, it is the only tool people use to work with their data. We’ve also collected a large number of feature requests that include viewing protein translations, performing simple alignments, working with multiple sequences, changing the colors of the electropherogram tracings, and many others.

Free software is not built from free development 

FinchTV was originally developed under an SBIR grant as a prototype for cross-platform software development. Until then, commercial-quality trace viewers ran on either Windows or Macintosh, never both. Cross-platform viewers were crippled versions of commercial programs, and none of the programs incorporated modern GUI (graphical user interface) features; all were cumbersome to use.

FinchTV is a high-quality, full-featured, free program; we want to improve the current version and keep it free. So the question becomes: how do we keep a free product up to date?

One way is through grant funding. Geospiza believes a strong case can be made to develop a new version of FinchTV under an SBIR grant because we know Sanger sequencing is still very active. From the press coverage, one would think next generation DNA sequencing (NGS) is going to be the way all sequencing will soon be done. True, there are many projects where Sanger is no longer appropriate, but NGS cannot do small things, like confirm clones. Sanger sequencing also continues to grow in the classroom; hence, tools like FinchTV are great educational resources.

We think there are more uses too, so we’d like to hear your stories.

How do you use FinchTV?
What would you like FinchTV to do?

Send us a note (info at or, even better, add a comment below. We plan to submit the proposal in early August and look forward to hearing your ideas.

* apologies to Dire Straits, "Money for Nothing"

Monday, May 17, 2010

Journal Club: GeneSifter Aids Stem Cell Research

Last week’s Nature featured an article entitled “Aberrant silencing of imprinted genes on chromosome 12qF1 in mouse induced pluripotent stem cells [1]” in which GeneSifter Analysis Edition (GSAE) was used to compare gene expression between genetically identical mouse embryonic stem (ES) cells and induced pluripotent stem cells (iPSCs).

Stem Cells 

Stem cells are undifferentiated, pluripotent cells that later develop into the specialized cells of tissues and organs. Pluripotent cells can divide essentially without limit, become any kind of cell, and have been found to naturally repair certain tissues. They are the focus of research because of their potential for treating diseases that damage tissues. Initially, stem cells were isolated from embryonic tissues. However, with human cells, this approach is controversial. In 2006, researchers developed ways to “reprogram” somatic cells to become pluripotent cells [2]. In addition to being less controversial, iPSCs have other advantages, but there are open questions as to their therapeutic safety due to potential artifacts introduced during the reprogramming process.

Reprogramming cells to become iPSCs involves the overexpression of a select set of transcription factors by viral transfection, DNA transformation, and other methods. To better understand what happens during reprogramming, researchers have examined gene expression and DNA methylation patterns between ES cells and iPSCs and have noted major differences in mRNA and microRNA expression as well as DNA methylation patterns. As noted in the paper, a problem with previous studies is that they compared cells with different genetic backgrounds. That is, the iPSCs harbor viral transgenes that are not present in the ES cells, and the observed differences could likely be due to factors unrelated to reprogramming. Thus, a goal of this paper's research was to compare genetically identical cells to pinpoint the exact mechanisms of reprogramming. 

GeneSifter in Action 

Comparing genetically similar cells requires that both ES cells and iPSCs have the same transgenes. To accomplish this goal, Stadtfeld and coworkers devised a clever strategy whereby they created a novel inducible transgene cassette and introduced it into mouse ES cells. The modified ES cells were then used to generate cloned mice containing the inducible gene cassette in all of their cells. Somatic cells could be converted to iPSCs by adding the appropriate inducing agents to the tissue culture media.

Even though ES cells and iPSCs were genetically identical, ES cells were able to generate live mice whereas iPSCs could not. To understand why, the team looked at gene expression using microarrays. The mRNA profiles for six iPSC and four ES cell replicates were analyzed in GeneSifter. Unsupervised clustering showed that global gene expression was similar for all cells. When the iPSC and ES cell data were compared using correlation analysis, the scatter plot identified two differentially expressed transcripts corresponding to a non-coding RNA (Gtl2) and small nucleolar RNA (Rian). The transcripts’ genes map to the imprinted Dlk1-Dio3 gene cluster on mouse chromosome 12qF1. While these genes were strongly repressed in iPSC clones, the expression of housekeeping and pluripotency genes was unaffected, as demonstrated using GeneSifter’s expression heat maps.
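The idea behind spotting outliers on such a scatter plot can be sketched in a few lines of Python. This is not GeneSifter's algorithm and the numbers are not from the paper: the gene names, the log2 intensity values, and the cutoff `min_log2_diff` are all hypothetical placeholders chosen to show the logic. Genes whose group means fall far from the diagonal are flagged.

```python
from statistics import mean

def flag_outliers(es, ipsc, min_log2_diff=1.0):
    """Flag genes whose mean log2 expression differs between groups.

    es and ipsc map gene name -> list of log2 intensities per replicate.
    Returns gene -> (iPSC mean - ES mean) for genes at or beyond the cutoff.
    """
    flagged = {}
    for gene in es:
        diff = mean(ipsc[gene]) - mean(es[gene])
        if abs(diff) >= min_log2_diff:
            flagged[gene] = diff
    return flagged

# Hypothetical toy values, NOT data from the study.
es_log2 = {"GeneA": [10.0, 10.0], "GeneB": [8.0, 9.0], "GeneC": [7.0, 7.0]}
ipsc_log2 = {"GeneA": [10.0, 11.0], "GeneB": [4.0, 5.0], "GeneC": [7.0, 6.0]}

flagged = flag_outliers(es_log2, ipsc_log2)
print(flagged)
```

In this toy set only GeneB is flagged, because it is the only gene whose mean differs by at least one log2 unit (a twofold change) between the two groups. A real analysis would also apply a statistical test across replicates.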

Subsequent experiments that looked at gene expression in over 60 iPSC lines produced from different types of cells, and at chimeric mice produced from mixtures of iPSCs and stem cells, showed that the gene-silenced iPSCs had limited developmental potential. Because Dlk1-Dio3 cluster imprinting is regulated by methylation, the authors examined methylation patterns, which revealed that the Gtl2 allele had acquired an aberrant silent state in the iPSC clones. Finally, knowing that Dlk1-Dio3 cluster imprinting is also regulated by histone acetylation, the authors were able to treat their iPSCs with a histone deacetylase inhibitor and produce live animals from the iPSCs. Producing live animals from iPSCs is a significant milestone for the field.

While histone deacetylase inhibitors have multiple effects, and more work will need to be done, the authors have completed a tour de force of work in this exciting field, and we are thrilled that our software could assist in this important study. 

Further Reading

1. Stadtfeld M., Apostolou E., Akutsu H., Fukuda A., Follett P., Natesan S., Kono T., Shioda T., Hochedlinger K., 2010. "Aberrant silencing of imprinted genes on chromosome 12qF1 in mouse induced pluripotent stem cells." Nature 465, 175-181.

2. Takahashi K., Yamanaka S., 2006. "Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors." Cell 126, 663-676.

Tuesday, May 11, 2010

Journal Club: Decoding Biology

DNA sequences hold the information needed to create proteins and regulate their abundance. Genomics research focuses on deciphering the codes that control these processes by combining DNA sequences with data from assays that measure gene expression and protein interactions. The codes are deciphered when specific sequence elements (motifs) are identified and can later be used to predict outcomes. The recent Nature article “Deciphering the Splicing Code” begins to reveal the codes of alternative splicing.

The genetic codes 

Since the discovery that DNA is a duplex molecule [1] that stores and replicates the information of living systems, the goal of modern biology has been to understand how the blueprint of a living system is encoded in its DNA. The first quest was to learn how DNA's four-letter nucleotide code was translated into the 20-letter amino acid code of proteins. Experiments conducted in the 1960s revealed that different combinations of triplet DNA bases encoded specific amino acids to produce the “universal” genetic code, which is nearly identical in all species that have been examined to date [2].
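The triplet code is compact enough to write down in a few lines of Python. The sketch below builds the standard codon table from its usual 64-letter summary string and translates a coding-strand DNA sequence. The function name `translate` and the example sequence are illustrative, and the sketch assumes the reading frame starts at the first base.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate coding-strand DNA, one triplet at a time, up to the first stop codon."""
    protein = []
    dna = dna.upper()
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)

peptide = translate("ATGGCCAAATGA")
print(peptide)  # MAK
```

Here ATG, GCC, and AAA code for methionine, alanine, and lysine, and TGA is a stop codon, so the four triplets yield the three-residue peptide MAK.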

Translating mRNA into protein is a complex process, however, that involves many proteins and ribosomal RNA (rRNA) collectively organized in ribosomes. As the ribosomes read the mRNA sequence, transfer RNA (tRNA) molecules bring individual amino acids to the ribosome where they are added to a growing polypeptide chain. The universal genetic code explained how tri-nucleotide sequences specified amino acids. It could also be used to elucidate the anti-codon portion of tRNA [3], but it could not explain how the correct amino acid was added to the tRNA. For that another genetic code needed to be cracked. In this code, first proposed in 1988 [4], multiple sequences, including the anti-codon loop, within each tRNA molecule are recognized by a matched enzyme that combines an amino acid with its appropriate tRNA.
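How a codon implies its anticodon can be shown with a short Python sketch: the anticodon is the reverse complement of the codon, with both written 5' to 3'. This is a simplification that assumes strict Watson-Crick pairing and ignores wobble pairing and modified bases; the function name `anticodon` is illustrative.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def anticodon(codon):
    """Return the anticodon (5'->3') that pairs with an mRNA codon (5'->3')."""
    return codon.upper().translate(COMPLEMENT)[::-1]

print(anticodon("AUG"))  # CAU
```

The start codon AUG pairs with the anticodon CAU. Note that this says nothing about which amino acid is attached to the tRNA, which is exactly the gap the second code fills.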

Codes to create diversity

The above codes are involved with the process of translating genetic sequences into protein. Most eukaryotic genes, and a few prokaryotic genes, cannot be translated in a continuous way because the protein coding regions (exons) are interrupted by non-coding regions (introns). When DNA is first transcribed into RNA, all regions are included and the introns must be excised to form the final messenger RNA (mRNA). This process makes it possible to create many different proteins from a single gene through alternative splicing in which exons are either differentially removed or portions of exons are joined together. Alternative splicing occurs in development and tissue specific ways; many disease causing mutations disrupt splicing patterns. So, understanding the codes that control splicing is an important research topic.
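A toy Python sketch makes the exon-skipping form of alternative splicing concrete. The transcript, the exon coordinates, and the function name `splice` are all hypothetical; the two introns are written to begin with GU and end with AG, the familiar boundary signals, but real splice-site recognition is far more involved than slicing a string.

```python
def splice(pre_mrna, exons, skip=()):
    """Join exon segments in order, leaving out any exon whose index is in skip.

    exons is a list of (start, end) pairs using 0-based, end-exclusive
    coordinates on the pre-mRNA.
    """
    return "".join(
        pre_mrna[start:end]
        for index, (start, end) in enumerate(exons)
        if index not in skip
    )

# Hypothetical pre-mRNA: exon 1, intron, exon 2, intron, exon 3.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "AAAGGG" + "GUAUGUCCCCAG" + "UUUUAA"
exons = [(0, 6), (18, 24), (36, 42)]

full_length = splice(pre_mrna, exons)            # all three exons
exon2_skipped = splice(pre_mrna, exons, skip={1})  # middle exon left out

print(full_length)     # AUGGCCAAAGGGUUUUAA
print(exon2_skipped)   # AUGGCCUUUUAA
```

One primary transcript gives two different mature mRNAs, depending only on whether the middle exon is included.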

Some of the splicing codes, such as the exon boundaries, are well known, and others are not. In “Deciphering the Splicing Code,” Barash and colleagues looked at thousands of alternatively spliced exons - and surrounding intron sequences - from 27 mouse tissues to unravel over 1000 sequence features that could define a new genetic code. Their goal is to build catalogs of motifs that could be used to predict splicing patterns of uncharacterized exons and determine how mutations might affect splicing.

Using data from existing microarray experiments, RNA sequence features compiled from the literature, and other known attributes of RNA structure, Barash and co-workers developed computer models to determine which combinations of features best correlated with experimental observations. The resulting computer program predicted, with reasonable success, whether an exon would be included or excluded in a given tissue based on its surrounding motif sequences and the tissue type. More importantly, the program could be used to build interaction networks that identified pairs of motifs frequently observed together.

Predicting alternative splicing is at an early stage, but as pointed out by the editorial summary, the approach of Barash and co-workers will be improved by the massive amounts of data being generated by new sequencing technologies and applications like RNA-Seq and various protein binding assays. The real test will be expanding the models to new tissues and human genomics. In the meantime, if you want to test their models on some of your data or explore new regulatory elements, the Frey lab has developed a web tool that can be accessed at

I’m done with seconds, can I have a third? 

As an aside, the authors of the editorial summary dubbed the work the second genetic code. I find this amusing, because this would be the third second genetic code. The aminoacyl tRNA code was also called the second genetic code, but people must have forgotten that, because another second genetic code was proposed in 2001. That code describes how methylated DNA sequences regulate chromatin structure and gene expression. Rather than have a third second genetic code, maybe we should refer to this as the third genetic code or the next generation code.

Further Reading

1. Watson JD, Crick F (1953) "A structure for deoxyribose nucleic acid." Nature 171:737-8.



4. Hou YM, Schimmel P (1988) "A simple structural feature is a major determinant of the identity of a transfer RNA." Nature 333:140-5.