Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm.

Author(s): Lomsadze Alexandre; Burns Paul D; Borodovsky Mark

Abstract: We present a new approach to automatic training of a eukaryotic ab initio gene finding algorithm. With the advent of Next-Generation Sequencing, automatic training has become paramount, allowing genome annotation pipelines to keep pace with the speed of genome sequencing. Earlier we developed GeneMark-ES, currently the only gene finding algorithm for eukaryotic genomes that performs automatic training in unsupervised ab initio mode. The new algorithm, GeneMark-ET, augments GeneMark-ES with a novel method that integrates RNA-Seq read alignments into the self-training procedure. Use of 'assembled' RNA-Seq transcripts is far from trivial; a significant error rate of assembly was revealed in recent assessments. We demonstrated in computational experiments that the proposed method of incorporating 'unassembled' RNA-Seq reads improves the accuracy of gene prediction; in particular, for the 1.3 GB genome of Aedes aegypti the mean value of prediction Sensitivity and Specificity at the gene level increased over GeneMark-ES by 24.5%. In the current surge of genomic data, when the need for accurate sequence annotation is higher than ever, GeneMark-ET will be a valuable addition to the narrow arsenal of automatic gene prediction tools.

Keywords: Computational methods; Massively parallel (Deep) sequencing; Genomics

Imprint: Nucleic Acids Research, v. 42, n. 15, p. e119, 2014

Digital object identifier: 10.1093/nar/gku557

Descriptors: Aedes aegypti - Genome ; Aedes aegypti - Pathogenesis ; Aedes aegypti - RNA

Publication date: 2014