Monday, February 23, 2009

Three Themes from AGBT and ABRF Part III: The IT Problem

The power of Next Generation DNA Sequencing (NGS) technology comes from the fact that a massive amount of data, sampling millions of individual molecules, is collected in a massively parallel format. This power also limits the potential widespread adoption of the technology because of the IT (Information Technology) challenges that result from the massive amount of data created with each sequencer run.

IT challenges form the third technical theme from the AGBT and ABRF conferences. The previous two posts underscored the need for good laboratory practices and rich bioinformatics support to make NGS experiments successful. This post discusses the experiences communicated by the early adopters of NGS technology with respect to the computing infrastructure.

Surprises

Throughout the literature and NGS presentations, the data management issues created by NGS play a central role. Recent editorials in Nature Methods [1] and Nature Biotechnology [2] speak to the problem and express researchers' frustrations in dealing with the lack of IT infrastructures. At the ABRF workshop, we had two presentations specifically focused on the IT challenges, describing two different experiences.

In the first case, the group implementing NGS had a number of surprises after the NGS system was installed and running. They learned that these systems not only require a lot of storage and computing support, they also use up a lot of bandwidth when data are transferred. The bandwidth problem led to the need for a revised network architecture to isolate the NGS data flow from other network activity.
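To get a feel for why bandwidth becomes a surprise, here is a rough back-of-the-envelope sketch in Python. The run size, link speeds, and efficiency factor are illustrative assumptions, not figures from the talks, and the function name is made up for this example.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to move data_gb gigabytes over a network link.

    link_mbps is the nominal link speed in megabits per second. efficiency is
    an assumed fraction of that speed actually achieved (hypothetical default).
    """
    if data_gb < 0 or link_mbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("need data_gb >= 0, link_mbps > 0, 0 < efficiency <= 1")
    total_bits = data_gb * 8e9  # decimal gigabytes to bits
    effective_bps = link_mbps * 1e6 * efficiency  # usable bits per second
    return total_bits / effective_bps / 3600  # seconds to hours


if __name__ == "__main__":
    run_size_gb = 1000.0  # hypothetical size of one run's raw data
    for link_mbps in (100, 1000, 10000):
        hours = transfer_hours(run_size_gb, link_mbps)
        print(f"{link_mbps:>6} Mbit/s: {hours:.1f} hours")
```

A transfer that ties up a shared link for many hours competes with everything else on the network, which is one way to see why a lab might end up isolating the NGS data flow.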

This talk brought similar surprises to mind. In other labs, NGS “surprises” have led to groups needing to upgrade server rooms by installing backup power, air conditioning, and other equipment. Of course, these surprises are manageable if you have an IT group and a server room in the first place. In some cases, groups start with even less and find that the IT costs make the NGS endeavor very expensive. Even with support and space, the IT costs for bringing in NGS can quickly grow into six figures (above $100,000) for infrastructure alone.

The second presentation was given by a group that was well prepared for NGS. Their university had made a previous commitment to building an IT infrastructure to support data-intensive genomics research, so adding NGS was a step up in their view. Their experience allowed them to develop a strong implementation plan that called for a number of systems upgrades, including new network hardware. While total costs were less than the six-figure surprises others experienced, they did spend many tens of thousands of dollars on new file servers, CPUs, network switches, and server room upgrades.

The conclusion from both of the presentations was that if you are going to set up an NGS infrastructure, three things are important: planning, planning, planning. Institutional support is also critically important, since renovations and new construction may be needed too. Personnel with network, systems administration, and Unix experience are essential as well. Finally, as the second speaker put it, you need to encourage researchers to invest in the infrastructure. If they are not involved in the process and contributing time and money, the endeavor can quickly fail.

These talks bring me to my favorite marketing slogan, the one about an Illumina customer who put an NGS instrument in their mail room. Whenever I hear that, or see the ad, it makes me think, “yes, you can turn a mail room into a genome center, but where will you put the data center?”

There is a solution


For those thinking about adopting NGS technology, or running an NGS experiment in which samples are submitted to a lab and the data returned, even contemplating the IT requirements can be discouraging. But it does not have to be this way. Over the past ten years, an immense infrastructure of data centers has emerged. Today, there are many options and price points available for storage, computing, and backup systems. Groups can save significant time and money using online services because costs scale with need. Moreover, online services eliminate the need for dedicated systems and data administrators, putting more money in the budget for experiments. You have a choice: jump in and do some interesting science, or work hard to have your campus facilities remodeled.

Geospiza is taking advantage of the Internet’s infrastructure to offer our clients cost-effective ways to get NGS running in their lab. GeneSifter Laboratory Edition can be delivered through a SaaS (Software as a Service) model to get labs up and running quickly. Just sign up, get access, and you are ready to go. GeneSifter Analysis Edition solves the IT problem for research groups who get their sequencing done through core labs or other service providers. In these cases, you upload your data and, with a few clicks, process it and analyze the results. Because the infrastructure is already built, overall costs for IT and bioinformatics are much lower, and you do not have to experience a remodeling project.

References
1. 2008. Byte-ing off more than you can chew. Nat Methods 5, 577.
2. 2008. Prepare for the deluge. Nat Biotechnol 26, 1099.
