FinchTalk: February 2010

Tuesday, February 23, 2010

GeneSifter Lab Edition v3.14 - Release Notes

GeneSifter Laboratory Edition (GSLE) 3.14.0 introduces a host of new features and capabilities that make daily laboratory data management work even easier. Read below to learn why GSLE is a leading LIMS product for all forms of DNA sequencing, microarrays, and other genetic analysis applications.

Orders and Invoices

Multi-plate submissions: Order forms have been extended in several ways to further simplify how labs collect sample and project information. A new order form template lets core facilities that manage larger sequencing projects easily receive samples and their information in a multiple-plate format. New order fields specific to the plate format are included to support sample tracking and lab work.

Add data to fields: Order forms have been further improved with the ability to add new values (or terms) to dropdown fields that already exist on published order forms.

Project field: Additionally, labs can add an optional project field to forms. With these improvements, labs can create forms that are easier to use and modify, as well as enable project tracking for their customers.

Sample location and sample selection: Two new features help labs that provide sample storage (biobanking) services to their clients. First, order forms can include sample location information. This is particularly useful when samples are delivered in 96-well plates that are stored for later use. Second, samples already stored by the lab as purified DNA, RNA or other material (templates) can be selected from specialized search interfaces within order forms. Like all GSLE sample entry forms, these features can be included or not on a case-by-case basis depending on your specific needs.

Invoice formatting: For labs that have the dreaded chore of sending billing data to accounting departments, we have added the ability to modify the invoice number format to include additional characters that distinguish which lab is sending the information.

Laboratory Operations

GSLE provides the ability to create, list and follow steps in sample protocols (also called workflows). In 3.14, new features not only expand these capabilities but also make it possible to further standardize procedures.

Multiplexing: In Next Generation Sequencing (NGS) several libraries are often combined into a single lane or region of a slide to increase the number of individual samples analyzed in a sequencing run. As each library is prepared, a specific adaptor sequence is added so sequence reads corresponding to different samples can be identified by their adaptor tag. This procedure, called multiplexing or barcoding, is supported in 3.14 and allows the lab to combine samples and adaptor sequences and group the combination of libraries together (Worksets) for sample processing and instrument runs. Once data are collected, sample naming conventions, combined with adaptor sequence (Multiplex Identifier, MID) stored in sample sheets, are used to separate individual reads into files corresponding to the samples that were in the original workset.
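The read-separation step described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not GSLE's implementation: the function name, the tag sequences and the sample names are all hypothetical, and the sketch assumes each tag sits at the very start of the read, matches exactly, and is not a prefix of another tag.

```python
from collections import defaultdict

def demultiplex(reads, mid_to_sample):
    """Group reads by the MID tag found at the start of each read.

    reads: iterable of (read_id, sequence) pairs.
    mid_to_sample: dict mapping a MID tag sequence to a sample name
                   (hypothetical stand-in for a sample sheet).
    Returns a dict of sample name -> list of (read_id, sequence with tag removed).
    Reads that match no tag are collected under "unassigned".
    """
    bins = defaultdict(list)
    for read_id, seq in reads:
        for mid, sample in mid_to_sample.items():
            if seq.startswith(mid):
                # Strip the tag so only the sample-derived sequence remains.
                bins[sample].append((read_id, seq[len(mid):]))
                break
        else:
            # No tag matched: keep the read untouched for later inspection.
            bins["unassigned"].append((read_id, seq))
    return dict(bins)

# Made-up tags and reads, for illustration only.
mids = {"ACGT": "liver_rna", "TGCA": "kidney_rna"}
reads = [("r1", "ACGTTTGACC"), ("r2", "TGCAGGATCC"), ("r3", "GGGGAAAACC")]
result = demultiplex(reads, mids)
```

A production tool would normally also tolerate sequencing errors in the tag and write each sample's reads to its own file; both are left out here to keep the sketch short.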

Batch data entry: Some lab processes require that samples are manipulated in groups (batches), but laboratory data are collected for individual samples within the batch. For example, the concentrations of individual DNA samples may need to be measured in a 96-well plate. To improve how OD values, comments, or other information are entered, workflow steps have been updated to include batch data entry forms that provide spreadsheet-like data entry capabilities. Like all GSLE batch data entry forms, data can be entered easily using the form’s column highlight and easy fill controls, or uploaded from an Excel spreadsheet.

Subsample processing: GSLE 3.14 also increases sample processing flexibility. As noted above, order forms can now support selecting samples that are already stored in the system. This feature is extended into the laboratory with tools that allow many new samples to be created from a “parent” or stock sample. When new samples (templates) are created, options are provided so that each new sample can be entered into a different process. For example, suppose you receive a tissue sample that needs several experiments performed: RNA-Seq, ChIP-Seq and resequencing. Now you can pick the sample and, with just a few clicks, create three new subsamples and define which process will be performed on each.

Selecting samples based on custom data: Some labs need to use custom data entered into order forms to sort and filter samples in the lab. For example, an order form may ask a researcher to enter read lengths for their NGS run. A 36 base run is much faster than a 100 base run, and on some platforms costs less. Thus, the lab will sort samples based on read length prior to the data collection event. While it has always been possible to get this information in many GSLE displays, 3.14 adds new capabilities to use any custom data in its specialized sample picker tools.
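As a rough illustration of the read-length example, the following Python sketch groups samples by a custom order-form field so that each group could go into its own run. The sample records, the field name `read_length` and the helper `group_by_field` are assumptions made for this example and do not reflect GSLE's data model or interface.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample records carrying one custom order-form field.
samples = [
    {"name": "S1", "read_length": 100},
    {"name": "S2", "read_length": 36},
    {"name": "S3", "read_length": 100},
    {"name": "S4", "read_length": 36},
]

def group_by_field(samples, field):
    """Sort samples on a custom field, then collect sample names per value."""
    # groupby only merges adjacent items, so the list is sorted first.
    ordered = sorted(samples, key=itemgetter(field))
    return {
        value: [s["name"] for s in group]
        for value, group in groupby(ordered, key=itemgetter(field))
    }

runs = group_by_field(samples, "read_length")
```

Here `runs` maps each requested read length to the names of the samples that asked for it, which is the grouping a lab would want before scheduling separate 36 base and 100 base runs.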

Other Features

Customer data management: GSLE v3.14 gives labs’ customers increased ability to organize their chromatograms, fragment analysis files and microarray files as needed. Data files can be edited, relabeled, moved or deleted. Projects and folders can be created, modified or deleted to aid in data organization.

Application Programming Interface (Onsite Installations Only)

SQL-API: As automation and system integration needs increase, support for programmatic data entry becomes more important. GSLE has continued to expand the self-documenting Application Programming Interface (API). We have also added an SQL API that can be used to create custom reports that are accessed via a wget-style Unix command.

Input API enhancements: The Input API now returns success IDs, and CGI parameter names have been eliminated. For full documentation, contact support@geospiza.com to request the GSLE SQL API Manual or the GSLE Input API Manual.

Next Generation Analysis Transfer Tool (Hosted Partners Only)

Simplified data transfers: A data transfer interface has been added to connect GSLE and GeneSifter Analysis Edition (GSAE). Partner Program administrators use the interface to select data files in GSLE and transfer them to their customers’ accounts in GSAE.

Schema Table update note

An existing schema table was updated: the column "Plate_Label" is now in table om_sample_plate instead of om_order.

Wednesday, February 17, 2010

Standardizing the Next Generation of Bioinformatics Software Development With BioHDF (HDF5)

AGBT is next week, and we'll be there presenting a poster on our latest and greatest work with HDF5 and BioHDF tools. For those of you attending, check out the poster. For those unable to attend, check back later for the "Bloginar."

Abstract

Next Generation Sequencing technologies are powerful tools for rapidly sequencing genomes and studying functional genomics. However, the lack of scalable data analysis capabilities limits their potential. Future bioinformatics applications need to be developed on common, standard, self-describing infrastructures that can reduce overall data storage, increase data processing performance, and integrate information from multiple sources. HDF technologies meet all of these requirements, have a long history, and are widely used in data-intensive science communities. They consist of general data file formats, software libraries and tools for manipulating the data. Compared to emerging standards such as the SAM/BAM formats, HDF5-based systems demonstrate improved I/O performance and methods to reduce data storage. HDF5 is also more extensible and can support multiple data indexes and store multiple data types. For these reasons, HDF5 and its BioHDF implementation are well qualified as standards for implementing data models in binary formats to support the next generation of bioinformatics applications.

In the poster we will present:

An overview of NGS data analysis and workflows
A prototype data model for working with NGS data
Practical examples of data analysis and viewing information using the underlying framework
Performance benchmarks comparing HDF5 to other file formats

Wednesday, February 3, 2010

Sneak Peek: Data Analysis Methods for Whole Transcriptome Sequencing Applications – Challenges and Solutions

RNA sequencing is one of the most popular Next Generation Sequencing (NGS) applications. Next Thursday, February 11 at 10:00 A.M. PST (1:00 P.M. EST), we kick off our 2010 webinar series with a presentation designed to help you understand whole transcriptome data analysis and what can be learned in these experiments. In addition, we will show off some of our latest tools and interfaces that can be used to discover new RNAs, new splice forms of transcripts, and alleles of expressed genes.

Summary

RNA sequencing applications such as Whole Transcriptome Analysis, Tag Profiling and Small RNA Analysis allow whole genome analysis of coding as well as non-coding RNA at an unprecedented level. Current technologies allow for the generation of 500 million data points in a single instrument run. In addition to allowing for the complete characterization of all known RNAs in a sample (gene level expression summaries, exon usage, splice junctions, single nucleotide variants, insertions and deletions), these applications are also ideal for the identification of novel RNAs as well as novel splicing events.

This presentation will provide an overview of Whole Transcriptome data analysis workflows with emphasis on calculating gene and exon level expression values as well as identifying splice junctions and variants from short read data. Comparisons of multiple groups to identify differential gene expression as well as differential splicing will also be discussed. Using data drawn from the GEO data repository and Short Read Archive (SRA), analysis examples will be presented for both Illumina’s GA and Life Technologies’ SOLiD instruments.

Register Today!