FinchTalk: Microarray

Thursday, July 12, 2012

Resources for Personalized Medicine Need Work

Yesterday (July 11, 2012), PLoS ONE published an article prepared by my colleagues and me entitled "Limitations of the Human Reference Genome for Personalized Genomics."

This work, supported by Geospiza's SBIR targeting ways to improve mutation detection and annotation, explored some of the resources and assumptions that are used to measure and understand sequence variation. As we know, a key deliverable of the human genome project was to produce a high quality reference sequence that could be used to annotate genes, develop research tools like genotyping and microarray assays, and provide insights to guide software development. Projects like HapMap used these resources to provide additional understanding of genetic linkage in populations.

Decreasing sequencing costs

Since those early projects, DNA sequencing costs have plummeted. As a result, endeavors such as the 1000 Genomes Project (1KGP) and public contributions from Complete Genomics (CG) have dramatically increased the number of known sequence variants. A question worth asking is how do these new data contribute to an understanding of the utility of current resources and assumptions that have guided genomics and genetics for the past six or seven years?

Number of variants by dbSNP build

To address the above question, we evaluated several assays and software tools that were based on the human genome reference sequence in the context of new data contributed by 1KGP and CG. We found a high frequency of confounding issues with microarrays, and many cases where invalid assumptions, encoded in bioinformatics programs, underestimate variability or possibly misidentify the functional effects of mutations. For example, 34% of published array-based GWAS for a variety of diseases utilize probes that contain undocumented variation or map to regions of previously unknown structural variation. Similarly, the estimated size of linkage disequilibrium blocks decreases as the number of known variants increases.
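The probe check mentioned above (does an array probe sit on top of a variant that was unknown when the probe was designed?) can be sketched as a simple interval lookup. This is only an illustration, not the method used in the paper: the function name, probe names, coordinates, and variant positions are all made up for the example.

```python
from bisect import bisect_left

def probes_with_variants(probes, variant_positions):
    """Return names of probes whose [start, end] interval contains a variant.

    probes: list of (name, chrom, start, end), 1-based inclusive coordinates.
    variant_positions: dict mapping chrom -> list of variant positions.
    """
    # Sort positions once per chromosome so each probe needs one binary search.
    sorted_positions = {c: sorted(p) for c, p in variant_positions.items()}
    hits = []
    for name, chrom, start, end in probes:
        positions = sorted_positions.get(chrom, [])
        i = bisect_left(positions, start)  # first variant at or after start
        if i < len(positions) and positions[i] <= end:
            hits.append(name)
    return hits

# Hypothetical data for illustration only.
probes = [
    ("probe_A", "chr1", 100, 124),
    ("probe_B", "chr1", 200, 224),
    ("probe_C", "chr2", 50, 74),
]
variants = {"chr1": [110, 300], "chr2": [75]}

affected = probes_with_variants(probes, variants)
print(affected)
```

With real data, the variant lists would come from resources such as dbSNP, 1KGP, or CG, and the probe coordinates from the array manufacturer's annotation files.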

The significance of this work is that it documents what many are anecdotally experiencing. As we continue to learn about the contributing role of rare variation in human disease we need to fully understand how current resources can be used and work to resolve discrepancies in order to create an era of personalized medicine.

(2012). Limitations of the Human Reference Genome for Personalized Genomics, PLoS ONE, DOI: 10.1371/journal.pone.0040294

Wednesday, November 3, 2010

Samples to Knowledge

Today Geospiza and Ingenuity announced a collaboration to integrate our respective GeneSifter Analysis Edition (GSAE) and Ingenuity Pathway Analysis (IPA) software systems.

Why is this important?

Geospiza has always been committed to providing our customers the most complete software systems for genetic analysis. Our LIMS, GeneSifter Laboratory Edition (GSLE), and GSAE have worked together to form a comprehensive samples-to-results platform. From core labs, to individual research groups, to large scale sequencing centers, GSLE is used for collecting sample information, tracking sample processing, and organizing the resulting DNA sequences, microarray files, and other data. Advanced quality reports keep projects on track and within budget.

For many years, GSAE has provided a robust and scalable way to scientifically analyze the data collected for many samples. Complex datasets are reduced and normalized to produce quantitative values that can be compared between samples and within groups of samples. Additionally, GSAE has integrated open-source tools like Gene Ontologies and KEGG pathways to explore the biology associated with lists of differentially expressed genes. In the case of Next Generation Sequencing, GSAE has had the most comprehensive and integrated support for the entire data analysis workflow from basic quality assessment to sequence alignment and comparative analysis.

With Ingenuity we will be able to take data-driven biology exploration to a whole new level. The IPA system is a leading platform for discovering pathways and finding the relevant literature associated with genes and lists of genes that show differential expression in microarray analysis. Ingenuity's approach focuses on combining software curation with expert review to create a state-of-the-art system that gets scientists to actionable information more quickly than conventional methods.

Through this collaboration two leading companies will be working together to extend their support for NGS applications. GeneSifter's pathway analysis capabilities will increase and IPA's support will extend to NGS. Our customers will benefit by having access to the most advanced tools for turning vast amounts of data into biologically meaningful results to derive new knowledge.

Samples to Results™ becomes Samples to Knowledge™

Monday, May 17, 2010

Journal Club: GeneSifter Aids Stem Cell Research

Last week’s Nature featured an article entitled “Aberrant silencing of imprinted genes on chromosome 12qF1 in mouse induced pluripotent stem cells” [1], in which GeneSifter Analysis Edition (GSAE) was used to compare gene expression between genetically identical mouse embryonic stem (ES) cells and induced pluripotent stem cells (iPSCs).

Stem Cells

Stem cells are undifferentiated, pluripotent cells that later develop into the specialized cells of tissues and organs. Pluripotent cells can divide essentially without limit, become any kind of cell, and have been found to naturally repair certain tissues. They are the focus of research because of their potential for treating diseases that damage tissues. Initially, stem cells were isolated from embryonic tissues. However, with human cells, this approach is controversial. In 2006, researchers developed ways to “reprogram” somatic cells to become pluripotent cells [2]. In addition to being less controversial, iPSCs have other advantages, but there are open questions as to their therapeutic safety due to potential artifacts introduced during the reprogramming process.

Reprogramming cells to become iPSCs involves the overexpression of a select set of transcription factors by viral transfection, DNA transformation, and other methods. To better understand what happens during reprogramming, researchers have examined gene expression and DNA methylation patterns between ES cells and iPSCs and have noted major differences in mRNA and microRNA expression as well as DNA methylation patterns. As noted in the paper, a problem with previous studies is that they compared cells with different genetic backgrounds. That is, the iPSCs harbor viral transgenes that are not present in the ES cells, and the observed differences could likely be due to factors unrelated to reprogramming. Thus, a goal of this paper's research was to compare genetically identical cells to pinpoint the exact mechanisms of reprogramming.

GeneSifter in Action

Comparing genetically similar cells requires that both ES cells and iPSCs have the same transgenes. To accomplish this goal, Stadtfeld and coworkers devised a clever strategy whereby they created a novel inducible transgene cassette and introduced it into mouse ES cells. The modified ES cells were then used to generate cloned mice containing the inducible gene cassette in all of their cells. Somatic cells could be converted to iPSCs by adding the appropriate inducing agents to the tissue culture media.

Even though ES cells and iPSCs were genetically identical, ES cells were able to generate live mice whereas iPSCs could not. To understand why, the team looked at gene expression using microarrays. The mRNA profiles for six iPSC and four ES cell replicates were analyzed in GeneSifter. Unsupervised clustering showed that global gene expression was similar for all cells. When the iPSC and ES cell data were compared using correlation analysis, the scatter plot identified two differentially expressed transcripts corresponding to a non-coding RNA (Gtl2) and a small nucleolar RNA (Rian). The transcripts’ genes map to the imprinted Dlk1-Dio3 gene cluster on mouse chromosome 12qF1. While these genes were strongly repressed in iPSC clones, the expression of housekeeping and pluripotency genes was unaffected, as demonstrated using GeneSifter’s expression heat maps.
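The idea behind the scatter plot comparison can be sketched in a few lines: average the replicates in each group and flag transcripts that fall far from the diagonal. This is a simplified illustration, not GeneSifter's implementation or the statistics used in the paper; the function name, the threshold, and all expression values are made up.

```python
from statistics import mean

def flag_outliers(es, ipsc, min_log2_diff=2.0):
    """Flag transcripts whose mean log2 expression differs between groups.

    es, ipsc: dict mapping transcript -> list of log2 expression values,
    one value per replicate.
    Returns dict transcript -> (iPSC mean - ES mean), rounded to 2 places.
    """
    flagged = {}
    for transcript in es:
        diff = mean(ipsc[transcript]) - mean(es[transcript])
        # Points far from the diagonal of an ES-vs-iPSC scatter plot
        # correspond to large absolute log2 differences.
        if abs(diff) >= min_log2_diff:
            flagged[transcript] = round(diff, 2)
    return flagged

# Made-up log2 values: four ES replicates and six iPSC replicates.
es = {
    "Gtl2": [10.1, 9.9, 10.0, 10.0],
    "Rian": [9.0, 9.2, 8.8, 9.0],
    "Actb": [13.0, 13.1, 12.9, 13.0],
}
ipsc = {
    "Gtl2": [4.0, 4.2, 3.8, 4.0, 4.1, 3.9],
    "Rian": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],
    "Actb": [13.0, 13.0, 13.1, 12.9, 13.0, 13.0],
}

flagged = flag_outliers(es, ipsc)
print(sorted(flagged))
```

A real analysis would also apply normalization and a statistical test across replicates rather than a fixed cutoff.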

Subsequent experiments that looked at gene expression from over 60 iPSC lines produced from different types of cells, and at chimeric mice that were produced from mixtures of iPSCs and stem cells, showed that the gene-silenced iPSCs had limited developmental potential. Because Dlk1-Dio3 cluster imprinting is regulated by methylation, the authors examined methylation patterns, which revealed that the Gtl2 allele had acquired an aberrant silent state in the iPSC clones. Finally, knowing that Dlk1-Dio3 cluster imprinting is also regulated by histone acetylation, the authors were able to treat their iPSCs with a histone deacetylase inhibitor and produce live animals from the iPSCs. Producing live animals from iPSCs is a significant milestone for the field.

While histone deacetylase inhibitors have multiple effects, and more work will need to be done, the authors have completed a tour de force of work in this exciting field, and we are thrilled that our software could assist in this important study.

Further Reading

1. Stadtfeld M., Apostolou E., Akutsu H., Fukuda A., Follett P., Natesan S., Kono T., Shioda T., Hochedlinger K., 2010. "Aberrant silencing of imprinted genes on chromosome 12qF1 in mouse induced pluripotent stem cells." Nature 465, 175-181.

2. Takahashi K., Yamanaka S., 2006. "Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors." Cell 126, 663-676.

Stem Cell Basics: http://stemcells.nih.gov/info/basics

iPSCs: http://en.wikipedia.org/wiki/IPS_cells

Tuesday, February 23, 2010

GeneSifter Lab Edition v3.14 - Release Notes

GeneSifter Laboratory Edition (GSLE) 3.14.0 introduces a host of new features and capabilities that make daily laboratory data management work even easier. Read below to learn why GSLE is a leading LIMS product for all forms of DNA sequencing, microarrays, and other genetic analysis applications.

Orders and Invoices

Multi-plate submissions: Order forms have been extended in several ways to further simplify how labs collect sample and project information. A new order form template lets core facilities that manage larger sequencing projects easily receive samples and their information in a multiple plate format. New order fields specific to the plate format are included to support sample tracking and lab work.

Add data to fields: Order forms have been further improved by adding the ability to add new values (or terms) to dropdown fields that already exist on published order forms.

Project field: Additionally, labs can add an optional project field to forms. With these improvements, labs can create forms that are easier to use and modify, as well as enable project tracking for their customers.

Sample location and sample selection: Two new features deliver help for labs that provide sample storage (biobanking) services to their clients. First, order forms can include sample location information. This is particularly useful in situations where samples are delivered in 96-well plates that are stored for later use. Second, samples already stored by the lab as purified DNA, RNA or other material (templates) can be selected from specialized search interfaces within order forms. Like all GSLE sample entry forms, these features can be included or not on a case-by-case basis depending on your specific needs.

Invoice formatting: For labs that have the dreaded chore of sending billing data to accounting departments, we have added the ability to modify the invoice number format to include additional characters that distinguish which lab is sending the information.

Laboratory Operations

GSLE provides the ability to create, list and follow steps in sample protocols (also called workflows). In 3.14, new features not only expand these capabilities but also make it possible to further standardize procedures.

Multiplexing: In Next Generation Sequencing (NGS) several libraries are often combined into a single lane or region of a slide to increase the number of individual samples analyzed in a sequencing run. As each library is prepared, a specific adaptor sequence is added so sequence reads corresponding to different samples can be identified by their adaptor tag. This procedure, called multiplexing or barcoding, is supported in 3.14 and allows the lab to combine samples and adaptor sequences and group the combination of libraries together (Worksets) for sample processing and instrument runs. Once data are collected, sample naming conventions, combined with adaptor sequence (Multiplex Identifier, MID) stored in sample sheets, are used to separate individual reads into files corresponding to the samples that were in the original workset.
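The read separation step described above can be sketched as follows. This is a minimal illustration and not GSLE's implementation: it assumes the MID is the first few bases of each read, that all MIDs have the same length, and that only exact matches count. The function name, sample names, and sequences are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, mid_to_sample):
    """Group reads by the MID found at the start of each read.

    reads: iterable of (read_id, sequence).
    mid_to_sample: dict mapping MID sequence -> sample name
                   (all MIDs are assumed to have the same length).
    Returns dict sample -> list of (read_id, sequence with the MID trimmed).
    """
    mid_length = len(next(iter(mid_to_sample)))
    bins = defaultdict(list)
    for read_id, seq in reads:
        mid, insert = seq[:mid_length], seq[mid_length:]
        sample = mid_to_sample.get(mid)
        if sample is None:
            # Unknown tag: keep the read untrimmed in a separate bin.
            bins["unassigned"].append((read_id, seq))
        else:
            bins[sample].append((read_id, insert))
    return dict(bins)

# Hypothetical sample sheet and reads for illustration only.
sample_sheet = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = [
    ("r1", "ACGTGGGTTT"),
    ("r2", "TGCAAAACCC"),
    ("r3", "NNNNGGGGGG"),
    ("r4", "ACGTCCCC"),
]

by_sample = demultiplex(reads, sample_sheet)
for sample, sample_reads in sorted(by_sample.items()):
    print(sample, len(sample_reads))
```

In practice, tags are usually chosen so that a sequencing error in one tag does not turn it into another, and production tools often tolerate a mismatch; the exact-match lookup here keeps the sketch short.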

Batch data entry: Some lab processes require that samples are manipulated in groups (batches), but laboratory data are collected for individual samples within the batch. For example, the concentrations of individual DNA samples may need to be measured in a 96-well plate. To improve how the OD values, comments, or other information are entered, workflow steps have been updated to include batch data entry forms that provide spreadsheet-like data entry capabilities. Like all GSLE batch data entry forms, data can be entered easily using the form’s column highlight and easy fill controls, or uploaded from an Excel spreadsheet.

Subsample processing: GSLE 3.14 also increases sample processing flexibility. As noted above, order forms can now support the ability to select samples that are already stored in the system. This feature is further extended into the laboratory by creating tools that allow many new samples to be created from a “parent” or stock sample. When new samples (templates) are created, options are provided so that each new sample can be entered into a different process. For example, you receive a tissue sample that needs several experiments performed: RNA-Seq, ChIP-Seq and resequencing. Now you can easily pick the sample and, with just a few clicks, create three new subsamples and define which process will be performed on each.

Selecting samples based on custom data: Some labs need to use custom data entered into order forms to sort and filter samples in the lab. For example, an order form may ask a researcher to enter read lengths for their NGS run. A 36 base run is much faster than a 100 base run, and on some platforms costs less. Thus, the lab will sort samples based on read length prior to the data collection event. While it has always been possible to get this information in many GSLE displays, 3.14 adds new capabilities to use any custom data in its specialized sample picker tools.
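The read length example above amounts to grouping samples by a custom field before planning runs. A minimal sketch, assuming each sample is a record with a hypothetical "read_length" field taken from the order form (the field name, function name, and sample names are illustrative, not GSLE's):

```python
from itertools import groupby

def group_by_read_length(samples):
    """Return dict read_length -> list of sample names, shortest runs first."""
    # groupby only merges adjacent items, so sort by the same key first.
    ordered = sorted(samples, key=lambda s: s["read_length"])
    return {
        length: [s["name"] for s in group]
        for length, group in groupby(ordered, key=lambda s: s["read_length"])
    }

# Hypothetical samples for illustration only.
samples = [
    {"name": "lib_1", "read_length": 36},
    {"name": "lib_2", "read_length": 100},
    {"name": "lib_3", "read_length": 36},
]

runs = group_by_read_length(samples)
print(runs)
```

Each group can then be placed on its own instrument run, so short runs are not held up by long ones.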

Other Features

Customer data management: GSLE v3.14 gives labs’ customers increased ability to organize their chromatograms, fragment analysis files and microarray files as needed. Data files can be edited, relabeled, moved or deleted. Projects and folders can be created, modified or deleted to aid in data organization.

Application Programming Interface (Onsite Installations Only)

SQL-API: As automation and system integration needs increase, requirements for supporting programmatic data entry become more important. GSLE has continued to expand the self-documenting Application Programming Interface (API). We have also added an SQL API that can be used to create custom reports that are accessed via a wget-style Unix command.

Input API enhancements: The Input API now returns success IDs, and CGI parameter names have been eliminated. The full documentation can be reviewed by contacting support@geospiza.com for the GSLE SQL API Manual or the GSLE Input API Manual.

Next Generation Analysis Transfer Tool (Hosted Partners Only)

Simplified data transfers: A data transfer interface has been added to connect GSLE and GeneSifter Analysis Edition (GSAE). Partner Program administrators use the interface to select data files in GSLE and transfer them to their customer’s account in GSAE.

Schema Table update note

There was an update to an existing schema table; the column "Plate_Label" is now in table om_sample_plate instead of om_order.

Sunday, November 8, 2009

Expeditiously Exponential: Data Sharing and Standardization

We can all agree that our ability to produce genomics and other kinds of data is increasing at exponential rates. Less clear are the consequences for how these data will be shared and ultimately used. These topics were explored in last month's (Oct. 9, 2009) policy forum feature in the journal Science.

The first article, listed under the category "megascience," dealt with issues about sharing 'omics data. The challenge is that systems biology research demands that data from many kinds of instrument platforms (DNA sequencing, mass spectrometry, flow cytometry, microscopy, and others) be combined in different ways to produce a complete picture of a biological system. Today, each platform generates its own kind of "big" data that, to be useful, must be computationally processed and transformed into standard outputs. Moreover, the data are often collected by different research groups focused on particular aspects of a common problem. Hence, the full utility of the data being produced can only be realized when the data are made open and shared throughout the scientific community. The article listed past efforts in developing sharing policies, and the central table included 12 data sharing policies that are already in effect.

Sharing data solves half of the problem; the other half is being able to use the data once shared. This requires that data be structured and annotated in ways that make them understandable by a wide range of research groups. Such standards typically include minimum information checklists that define specific annotations and specify which data should be kept from different platforms. The data and metadata are stored in structured documents that reflect a community's view about what is important to know with respect to how data were collected and the samples the data were collected from. The problem is that annotation standards are developed by diverse groups and, like the data, are expanding. This expansion creates new challenges with making data interoperable, the very problem standards try to address.

The article closed with high-level recommendations for enforcing policy through funding and publication requirements and acknowledged that full compliance requires that general concerns with pre-publication data use and patient information be addressed. More importantly, the article acknowledged that meeting data sharing and formatting standards has economic implications. That is, researchers need time-efficient data management systems, the right kinds of tools, and informatics expertise to meet standards. We also need to develop the right kind of global infrastructure to support data sharing.

Fortunately, complying with data standards is an area where Geospiza can help. First, our software systems rely on open, scientifically valid tools and technologies. In DNA sequencing we support community developed alignment algorithms. The statistical analysis tools in GeneSifter Analysis Edition utilize R and BioConductor to compare gene expression data from both microarrays and DNA sequencing. Further, we participate in the community by contributing additional open-source tools and standards through efforts like the BioHDF project. Second, the GeneSifter Analysis and Laboratory platforms provide the time-efficient data management solutions needed to move data through its complete life cycle from collection, to intermediate analysis, to publishing files in standard formats.

GeneSifter lowers researchers' economic barriers to meeting data sharing and annotation standards, keeping the focus on doing good science with the data.

Sunday, November 1, 2009

GeneSifter Laboratory Edition Update

GeneSifter Laboratory Edition has been updated to version 3.13. This release has many new features and improvements that further enhance its ability to support all forms of DNA sequencing and microarray sample processing and data collection.

Geospiza Products

Geospiza's two primary products, GeneSifter Laboratory Edition (GSLE) and GeneSifter Analysis Edition (GSAE), form a complete software system that supports many kinds of genomics and genetic analysis applications. GSLE is the LIMS (Laboratory Information Management System) that is used by core labs and service companies worldwide that offer DNA sequencing (Sanger and Next Generation), microarray analysis, fragment analysis and other forms of genotyping. GSAE is the analysis system researchers use to analyze their data and make discoveries. Both products are actively updated to keep current with the latest scientific and technological advances.

The new release of GSLE helps labs share workflows, perform barcode-based searching, view new data reports, simplify invoicing, and automate data entry through a new API (application programming interface).

Sharing Workflows

GSLE laboratory workflows make it possible for labs to define and track their protocols and data that are collected when samples are processed. Each step in a protocol can be configured to collect any kind of data, like OD values, bead counts, gel images and comments, that are used to record sample quality. In earlier versions, protocols could be downloaded as PDF files that list the steps and their data. With 3.13, a complete workflow (steps, rules, custom data) can be downloaded as an XML file that can be uploaded into another GSLE system to recreate the entire protocol with just a few clicks. This feature simplifies protocol sharing and makes it possible for labs to test procedures in one system and add them to another when they are ready for production.

Barcode Searching and Sample Organization

Sometimes a lab needs to organize separate tubes in 96-well racks for sample preparation. Assigning each tube's rack location can be an arduous process. However, if the tubes are labeled with barcode identifiers, a bed scanner can be used to make the assignments. GSLE 3.13 provides an interface to upload bed scanner data and assign tube locations in a single step. Also, new search capabilities have been added to find orders in the system using sample or primer identifiers. For example, orders can be retrieved by scanning a barcode from a tube in the search interface.
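Assigning rack positions from a scanner file is essentially a lookup between scanned barcodes and the tubes the lab expects. The sketch below is illustrative only and does not reflect GSLE's actual upload format: it assumes the scanner exports a CSV with "well" and "barcode" columns, and the function name and barcodes are hypothetical.

```python
import csv
import io

def assign_rack_locations(scan_csv, expected_barcodes):
    """Map scanned tube barcodes to rack wells and report problems.

    scan_csv: text with a header row "well,barcode" (assumed format).
    expected_barcodes: set of tube barcodes that belong to the order.
    Returns (locations, unknown, missing):
      locations: dict barcode -> well for expected tubes found in the rack
      unknown:   scanned barcodes that are not part of the order
      missing:   expected barcodes that were not scanned
    """
    locations = {}
    unknown = []
    for row in csv.DictReader(io.StringIO(scan_csv)):
        barcode = row["barcode"].strip()
        if not barcode:
            continue  # empty rack position
        if barcode in expected_barcodes:
            locations[barcode] = row["well"].strip()
        else:
            unknown.append(barcode)
    missing = sorted(expected_barcodes - set(locations))
    return locations, unknown, missing

# Hypothetical scanner output for illustration only.
scan = "well,barcode\nA1,TB0001\nA2,\nA3,TB0003\nA4,TB9999\n"
expected = {"TB0001", "TB0002", "TB0003"}

locations, unknown, missing = assign_rack_locations(scan, expected)
print(locations, unknown, missing)
```

Reporting unknown and missing tubes alongside the assignments makes it easy to catch a misplaced or mislabeled tube before sample preparation begins.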

Reports and Data

Throughout GSLE, many details about data can be reviewed using predefined reports. In some cases, pages can be quite long, but only a portion of the report is interesting. GSLE now lets you collapse sections of report pages to focus on specific details. New download features have also been added to better support access to those very large NGS data files.

GSLE has always been good at identifying duplicate data in the system, but not always as good at letting you decide how duplicate data are managed. Managing duplicate data is now more flexible to better support situations where data need to be reanalyzed and reloaded.

The GSLE data model makes it possible to query the database using SQL. In 3.13, the view tables interface has been expanded so that the data stored in each table can be reviewed with a single click.

Invoices

Core labs that send invoices will benefit from changes that make it possible to download many PDF-formatted orders and invoices into a single zipped folder. Configurable automation capabilities have also been added to set invoice due dates and generate multiple invoices from a set of completed orders.

API Tools

As automation and system integration needs increase, external programs are used to enter data from other systems. GSLE 3.13 supports automated data entry through a novel self-documenting API. The API takes advantage of GSLE's built-in data validation features that are used by the system's web-based forms. At each site, the API can be turned on and off by on-site administrators and its access can be limited to specific users. This way, all system transactions are easily tracked using existing GSLE logging capabilities. In addition to data validation and access control, the API is self-documenting. Each API-containing form has a header that includes key codes, example documentation, and features to view and manually upload formatted data to test automation programs and help system integrators get their work done. GSLE 3.13 further supports enterprise environments with an improved API that is used to query external password authentication servers.

Wednesday, September 23, 2009

GeneSifter in Current Protocols

This month we are pleased to report Geospiza's publication of the first standard protocols for analyzing Next Generation Sequencing (NGS) data. The publication, appearing in the September issue of Current Protocols, addresses how to analyze data from both microarray and NGS experiments. The abstract and links to the paper and our press release are provided below.

Abstract

Transcription profiling with microarrays has become a standard procedure for comparing the levels of gene expression between pairs of samples, or multiple samples following different experimental treatments. New technologies, collectively known as next-generation DNA sequencing methods, are also starting to be used for transcriptome analysis. These technologies, with their low background, large capacity for data collection, and dynamic range, provide a powerful and complementary tool to the assays that formerly relied on microarrays. In this chapter, we describe two protocols for working with microarray data from pairs of samples and samples treated with multiple conditions, and discuss alternative protocols for carrying out similar analyses with next-generation DNA sequencing data from two different instrument platforms (Illumina GA and Applied Biosystems SOLiD).

In the chapter we cover the following protocols:

Basic Protocol 1: Comparing Gene Expression from Paired Sample Data Obtained from Microarray Experiments
Alternate Protocol 1: Compare Gene Expression from Paired Samples Obtained from Transcriptome Profiling Assays by Next-Generation DNA Sequencing
Basic Protocol 2: Comparing Gene Expression from Microarray Experiments with Multiple Conditions
Alternate Protocol 2: Compare Gene Expression from Next-Generation DNA Sequencing Data Obtained from Multiple Conditions

Links

To view the abstract, contents, figures, and literature cited online visit: Curr. Protoc. Bioinform. 27:7.14.1-7.14.34

To view the press release visit: Geospiza Team Publishes First Standard Protocol for Next Gen Data Analysis