Wednesday, December 31, 2008

Closing 2008

As we bring 2008 to a close, it is a good time to reflect on our progress and think about the new year ahead. Despite the world economic news, both the genomics field in general and Geospiza specifically have many positive accomplishments to show for the year.

In February, we introduced FinchLab for Next Gen Sequencing at the AGBT and ABRF conferences. At these shows, it was clear that Next Gen Sequencing was going to change the ways we think about applying DNA sequencing to interrogate a multitude of genetic and functional genomics problems. Over the course of 2008, many papers were published demonstrating the value of the massively parallel sequencing technology. MassGenomics dubbed 2008: Year of the Cancer Genome. Other blogs are following suit with articles on personal genomics and other advancements provided largely through Next Gen Sequencing.

Throughout the year, we also learned that while you can do a lot with a huge amount of data, working with the data is extremely challenging. Conference presentations and editorials in journals frequently made this point. While many of these articles focused on the data management challenge, groups acquiring the technology were also learning that the challenges go beyond data management. Comprehensive software systems are needed to manage all facets of the process, from tracking how samples are prepared for specific experiments to how the data are stored and organized, to analyzing and presenting the data according to the experiment being performed. In short, we learned that Next Gen technologies produce sequence data in different ways and require that we think about DNA sequencing in new ways.

Geospiza’s Version 3 Software Platform and GeneSifter

To address these new challenges, and expand support for existing technologies, Geospiza accomplished two significant milestones in 2008. First, we released the third version of our software platform that supports both laboratory workflows and data analysis automation. Through this system, laboratories are able to set up different interfaces to collect experimental information, assign specific workflows to experiments, track the workflow steps in the laboratory, prepare samples for data collection runs, link data back to the original samples and process data according to the needs of the experiment - without any programming. More importantly, for those who want to develop data analysis pipelines, the system provides a deployable environment that lets you add new pipelines and make them easily accessible.

The second major milestone was our acquisition of GeneSifter. GeneSifter is an award-winning microarray data analysis product. With GeneSifter, Geospiza can deliver complete end-to-end systems for data-intensive genetic analysis applications like microarrays and Next Gen sequencing-based transcription analysis. Also, GeneSifter, like Geospiza’s other products, is web-based and can be delivered as a Software as a Service (SaaS) product.

SaaS was one of the important themes for 2008. Geospiza understands well that data-intensive science requires a significant IT (Information Technology) investment. Throughout 2008, we saw first-hand that groups building their own IT infrastructures were not only challenged by investing heavily in quickly depreciating hardware assets, but also faced basic infrastructure challenges like having enough space, power, and cooling for the equipment. Even when those problems were solved, there were further challenges: getting systems set up and running, installing software, and having experienced people - and time - to maintain the infrastructure. SaaS solves those problems and offloads the burden of maintaining expensive infrastructure. For a number of groups, locally run systems are the right choice. However, it is a choice that should be carefully thought out and well-planned. In our experience, customers choosing the SaaS option were up and running more quickly and at a lower cost than customers who chose to build their own systems.

As we close 2008 and look forward to 2009, we want to especially thank our customers for their support and the interesting problems they have invited us to help solve.

Friday, December 12, 2008

Papers, Papers, and more Papers

Next Gen Sequencing is hot, hot, hot! You can tell by the number of papers being published and how frequently they appear.

A few posts ago, I wrote about a couple of grant proposals that we were preparing on methods to detect rare variants in cancer and to improve the tools and methods for validating datasets from quantitative assays that utilize Next Gen data, like RNA-Seq, ChIP-Seq, or Other-Seq experiments. Besides the normal challenges of getting two proposals written and uploaded to the NIH, there was an additional challenge. Nearly every day, we opened the tables of contents in our e-mail and found new papers highlighting Next Gen Sequencing techniques, applications, or biological discoveries made through Next Gen techniques. To date, over 200 Next Gen publications have been produced. During the last two months alone, more than 30 papers have been published. Some of these (listed in the figure below) were relevant to the proposals we were drafting.

The papers highlighted many of the themes we've touched on here, including the advantages of Next Gen sequencing and the challenges of dealing with the data. As we are learning, these technologies allow us to explore the genome and genomics of systems biology at significantly higher resolutions than previously imagined. In one of the higher profile efforts, teams at the Washington University School of Medicine and Genome Center compared a leukemia genome to a normal genome using cells from the same patient. This first intra-person whole genome analysis identified acquired mutations in ten genes, eight of which were new. Interestingly, the eight genes have unknown functions and might be important some day for new therapies.

Next Gen technologies are also confirming that molecular biology is more complicated than we thought. For example, the four most recent papers in Science show us that not only is 90% of the genome actively transcribed, but many genes have both sense and anti-sense RNA expressed. It is speculated that the anti-sense transcripts have a role in regulating gene expression. Also, we are seeing that nearly every gene produces alternatively spliced transcripts. The most recent papers indicate that between 92% and 97% of transcripts are alternatively spliced. My guess is that the only genes not alternatively spliced are those lacking introns, like olfactory receptors. However, when alternative transcription starts and alternative polyadenylation sites are considered, we may see that all genes are processed in multiple ways. It will be interesting to see how the products of alternative splicing and anti-sense transcription might interact.

This work has a number of take home messages.
  1. Like astronomy, when we can see deeper we see more. Next Gen technologies are giving us the means to interrogate large collections of individual RNA or DNA molecules and speculate more on functional consequences.
  2. Our limits are our imaginations. The reported experiments have used a variety of creative approaches to study genomic variation, sample expressed molecules from different strands of DNA, and measure protein DNA/RNA interaction.
  3. Good hands do good science. As pointed out in the paper from the Sanger Center on their implementation of Next Gen sequencing, the processes are complex and technically demanding. You need to have good laboratory practices with strong informatics support for all phases (laboratory, data management, and data analysis) of the Next Gen sequencing processes.
The final point is very important. Geospiza’s lab management and data analysis products simplify the work of getting Next Gen systems running, so that your major investment pays off and you can publish results quickly.

To see how, join us next Wednesday, Dec. 17, at 10 am PST for our webinar, RNA Expression Analysis with Geospiza.



Wednesday, December 10, 2008

Sneak Peek: RNA Expression Analysis with Geospiza

Next Generation DNA sequencing is revolutionizing transcriptome analysis and giving us much deeper insights into the ways in which genes are expressed. Next Wednesday, December 17th, Geospiza will host a webinar on how FinchLab and GeneSifter simplify complex data analyses to turn millions of reads into informative datasets that can yield scientific insights.

Next Gen sequencing is quickly becoming an attractive option for gene expression analysis because the vast numbers of sequences that can be obtained provide a highly sensitive way to evaluate the RNA population inside a cell. In addition to rRNA, tRNA, and mRNA, new assays are quickly emerging to measure non-coding RNA and multiple classes of small RNAs as well. Moreover, as we obtain deeper information, largely through Next Gen, we learn that even mRNA is more complicated than previously thought. In yeast, 85% of the genome might be transcribed, and new reports indicate that 92-97% of human genes undergo alternative splicing.

Next Gen sequencing applications such as RNA-Seq, Tag Profiling, and Small RNA Analysis allow whole genome analysis of coding as well as non-coding RNA at an unprecedented level. Current technologies can generate 200 million data points in a single instrument and can completely characterize all known RNAs in a sample, and identify novel RNAs and novel splicing events for known RNAs.
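As a rough illustration of how read counts become expression values, here is a minimal sketch of the RPKM measure (reads per kilobase of transcript per million mapped reads) described by Mortazavi et al. in the further reading below. It is not how FinchLab or GeneSifter compute expression. The function name and the input format (one gene name per aligned read, plus a table of transcript lengths) are assumptions made for the example.

```python
from collections import Counter


def rpkm(read_genes, gene_lengths_bp):
    """Compute RPKM for each gene in gene_lengths_bp.

    read_genes: iterable of gene names, one entry per mapped read
                (hypothetical input; real pipelines start from alignments).
    gene_lengths_bp: dict mapping gene name -> transcript length in base pairs.
    """
    counts = Counter(read_genes)
    total_mapped = sum(counts.values())
    if total_mapped == 0:
        # No mapped reads: every gene has zero expression.
        return {gene: 0.0 for gene in gene_lengths_bp}
    # RPKM = 10^9 * reads_for_gene / (total_mapped_reads * length_in_bp)
    return {
        gene: counts.get(gene, 0) * 1e9 / (total_mapped * length)
        for gene, length in gene_lengths_bp.items()
    }


if __name__ == "__main__":
    # Toy data: four mapped reads across two of three genes.
    reads = ["geneA", "geneA", "geneA", "geneB"]
    lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
    for gene, value in sorted(rpkm(reads, lengths).items()):
        print(gene, value)
```

Dividing by transcript length and by the total number of mapped reads makes values comparable between long and short genes and between runs of different depth.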

Join us next Wed., Dec. 17, at 10:00 am (PST) as we provide an overview of two applications, RNA-Seq and miRNA-Seq, using examples from publicly available datasets. The presentation will include a discussion of the challenges and solutions for how sequence data from the transcriptome can be analyzed in routine ways with Geospiza’s products.

Register Now!

Further reading

Castle J.C., Zhang C., Shah J.K., Kulkarni A.V., Kalsotra A., Cooper T.A., Johnson J.M., 2008. Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines. Nat Genet 40, 1416-1425.

David L., Huber W., Granovskaia M., Toedling J., Palm C.J., Bofkin L., Jones T., Davis R.W., Steinmetz L.M., 2006. A high-resolution map of transcription in the yeast genome. Proc Natl Acad Sci U S A 103, 5320-5325.

Mortazavi A., Williams B.A., McCue K., Schaeffer L., Wold B., 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5, 621-628.

Seila A.C., Calabrese J.M., Levine S.S., Yeo G.W., Rahl P.B., Flynn R.A., Young R.A., Sharp P.A., 2008. Divergent Transcription from Active Promoters. Science.

Wang E.T., Sandberg R., Luo S., Khrebtukova I., Zhang L., Mayr C., Kingsmore S.F., Schroth G.P., Burge C.B., 2008. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470-476.

Wold B., Myers R.M., 2008. Sequence census methods for functional genomics. Nat Methods 5, 19-21.

Zamore P.D., Haley B., 2005. Ribo-gnome: the big world of small RNAs. Science 309, 1519-1524.

Tuesday, December 2, 2008

ABRF 2009 is just around the corner

Karen Jonscher said it best, “Reminder - register now for the [ABRF] Satellite Educational Workshops!”

In her email to the ABRF email forum, Karen reminded us that the ABRF Education Committee is excited to present five new Satellite Educational Workshops on genomics and proteomics technologies at ABRF 2009 in Memphis, Tennessee. Of course, I think the most exciting topic is Next Generation DNA Sequencing, subtitled "Massively Parallel Sequencers in the Core Facility: Applications and Computation."

The workshop will have a full day of presentations and discussions. The first part, “Platforms and Applications,” will focus on the laboratory perspective of running these systems. We will have three presentations on what it is like to prepare and run samples, troubleshoot equipment, and review quality control data.

The second part, “Computation and Analysis,” will tackle heady issues of what to do with the massive amounts of data being produced. Presenters will provide information ranging from an overview of data analysis, to data management infrastructures, to discovery-based analysis for SNP and biomarker discovery.

During the day there will be time to meet and speak with the presenters as well as representatives from sponsoring companies. It will be good.

Both general information about all of the workshops and specific information about the next generation sequencing workshop are posted at the ABRF site. Don't wait, or you might miss a great opportunity.

Hope to see you in Memphis.