The pace at which reference genomes are being generated for plant and animal species is rapidly increasing with Next Generation Sequencing technologies. While this is a major step forward for researchers studying species that previously lacked sequenced genomes, it is only the beginning of the process of defining the biology underlying the genome. As long as a reference is available, DNA variants can be readily identified on a genome-wide scale, often producing lists of hundreds of thousands or even millions of variants. Frequently, the variants that occur in expressed genes are of the most interest; however, if annotation defining where genes lie within a genome is unavailable or poorly defined, identifying which mutations might affect protein coding may not be possible. To address this challenge, we will describe a method whereby RNA-Seq can be readily used to identify transcriptionally active regions, creating transcript annotation for unannotated organisms or enhanced annotation for any organism. This annotation can then be used in conjunction with whole genome sequencing to annotate variants according to whether they fall within transcriptionally active regions, thus facilitating the identification of mutations in a larger repertoire of expressed regions of a genome.
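The idea can be illustrated with a minimal Python sketch. It is not the presenters' method: it assumes transcriptionally active regions are simply runs of positions whose RNA-Seq read depth meets a threshold, and the function names (`call_tars`, `in_tar`), the thresholds, and the toy coverage values are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def call_tars(coverage, min_depth=5, min_length=3):
    """Return half-open (start, end) runs where depth >= min_depth.

    coverage is a list of per-base RNA-Seq read depths for one contig.
    Runs shorter than min_length are discarded as likely noise.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    regions, start = [], None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth:
            if start is None:
                start = pos  # a new covered run begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the contig.
    if start is not None and len(coverage) - start >= min_length:
        regions.append((start, len(coverage)))
    return regions


def in_tar(position, regions):
    """True if a 0-based position lies inside any sorted, non-overlapping region."""
    starts = [s for s, _ in regions]
    i = bisect_right(starts, position) - 1  # last region starting at or before position
    return i >= 0 and position < regions[i][1]


# Toy per-base depths for a 15 bp contig (made-up numbers).
coverage = [0, 0, 6, 8, 9, 7, 1, 0, 5, 5, 0, 12, 14, 13, 10]
tars = call_tars(coverage)

# Hypothetical variant positions from whole genome sequencing of the same contig.
variants = [1, 3, 8, 12]
annotated = {v: in_tar(v, tars) for v in variants}
print(tars)
print(annotated)
```

In this toy case the two-base run at positions 8 and 9 is too short to count as a region, so the variant at position 8 is not flagged as expressed. A real analysis would work from aligned reads and would also need to handle spliced alignments, multiple contigs, and strand.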
Eric Olson, Ph.D., and Hugh Arnold, Ph.D., from Geospiza will present.