##### Department of Mathematics,

University of California San Diego

****************************

### Applications Seminar

## M. Vidyasagar

#### Executive Vice President, Tata Consultancy Services Limited, Hyderabad, India

## Stochastic modelling methods for gene finding

##### Abstract:

In this talk, the problem of finding genes from the genome (DNA sequence) is formulated as a problem in stochastic modelling and classification. No prior knowledge of biology is assumed, and the talk will be completely self-contained in this respect. A new classification algorithm, called the Mixed Memory Markov Model (4M) algorithm, is presented, and its significance (the probability of generating an incorrect classification) is analyzed using sound statistical principles. It is also shown that, on nearly 70 bacterial genomes, the 4M algorithm performs as well as or better than the currently most popular algorithm, known as Glimmer-2. (The emphasis of the talk, however, is on the statistical aspects.)

Hosts: Bill Helton and Ruth Williams

### October 2, 2006

### 12:00 PM

### AP&M 6402

****************************