Record Details

Coding sequence density estimation via topological pressure

ScholarsArchive at Oregon State University

Field Value
Title Coding sequence density estimation via topological pressure
Names Koslicki, David (creator)
Thompson, Daniel J. (creator)
Date Issued 2015-01 (iso8601)
Note This is an author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by Springer and can be found at: http://link.springer.com/journal/285
Abstract We give a new approach to coding sequence (CDS) density estimation in genomic analysis based on the topological pressure, which we develop from a well-known concept in ergodic theory. Topological pressure measures the ‘weighted information content’ of a finite word, and incorporates 64 parameters which can be interpreted as a choice of weight for each nucleotide triplet. We train the parameters so that the topological pressure fits the observed coding sequence density on the human genome, and use this to give ab initio predictions of CDS density over windows of size around 66,000 bp on the genomes of Mus musculus, rhesus macaque and Drosophila melanogaster. While the differences between these genomes are too great to expect that training on the human genome could predict, for example, the exact locations of genes, we demonstrate that our method gives reasonable estimates for the ‘coarse scale’ problem of predicting CDS density.
Inspired again by ergodic theory, the weightings of the nucleotide triplets obtained from our training procedure are used to define a probability distribution on finite sequences, which can be used to distinguish between intron and exon sequences from the human genome of lengths between 750 bp and 5,000 bp. At the end of the paper, we explain the theoretical underpinning for our approach, which is the theory of Thermodynamic Formalism from the dynamical systems literature. Mathematica and MATLAB implementations of our method are available at http://sourceforge.net/projects/topologicalpres/.
Genre Article
Topic DNA sequence analysis
Identifier Koslicki, D., & Thompson, D. J. (2015). Coding sequence density estimation via topological pressure. Journal of Mathematical Biology, 70(1-2), 45-69. doi:10.1007/s00285-014-0754-2
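
Illustrative sketch (not part of the record): the abstract describes topological pressure as a 'weighted information content' of a finite word, with one weight per nucleotide triplet. The Python below is a minimal sketch of one plausible form of such a quantity. The exact formula, the normalization by log base 4, the subword length `n`, and the names `TRIPLETS` and `weighted_pressure` are all assumptions made for illustration; they are not taken from the article or from the authors' Mathematica and MATLAB code.

```python
import math
from itertools import product

# The 64 nucleotide triplets; one positive weight per triplet (assumed layout).
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]


def weighted_pressure(word, weights, n):
    """Assumed form of a triplet-weighted pressure of a finite DNA word.

    For every distinct length-n subword u of `word`, multiply the weights of
    the n - 2 overlapping triplets in u, sum these products over all distinct
    subwords, then take log base 4 and divide by n.
    This is a hypothetical definition chosen for illustration only.
    """
    if n < 3:
        raise ValueError("n must be at least 3 so each subword contains a triplet")
    if len(word) < n:
        raise ValueError("word must be at least n letters long")

    # Distinct subwords of length n (a set removes repeated occurrences).
    subwords = {word[i:i + n] for i in range(len(word) - n + 1)}

    total = 0.0
    for u in subwords:
        # Sum of log-weights over the overlapping triplets of u.
        log_weight = sum(math.log(weights[u[i:i + 3]]) for i in range(n - 2))
        total += math.exp(log_weight)

    return math.log(total, 4) / n


# Usage: uniform weights of 1 reduce the sum to a count of distinct subwords.
uniform = {t: 1.0 for t in TRIPLETS}
value = weighted_pressure("ACGTACGT", uniform, 3)
print(value)
```

With every weight set to 1, this sketch only counts distinct subwords, so it behaves like an entropy-style complexity measure; non-uniform weights let some triplets contribute more than others, which is the role the abstract assigns to the 64 trained parameters.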