UCSD MATHEMATICS DEPARTMENT: APPLICATIONS SEMINAR

Noon, Monday, October 2, 2006, AP&M 6402
Dr. M. Vidyasagar, Executive Vice President, Tata Consultancy Services Limited, Hyderabad, India
Stochastic Modelling Methods for Gene Finding
In this talk, the problem of finding genes in the genome (DNA sequence) is formulated as a problem in stochastic modelling and classification. No prior knowledge of biology is assumed, and the talk will be completely self-contained in this respect. A new classification algorithm, called the Mixed Memory Markov Model (4M) algorithm, is presented, and its significance (the probability of generating an incorrect classification) is analyzed using sound statistical principles. It is also shown that, on nearly 70 bacterial genomes, the 4M algorithm performs as well as or better than the currently most popular algorithm, known as Glimmer-2. (The emphasis of the talk, however, is on the statistical aspects.)