##### Department of Mathematics, University of California San Diego

****************************

### Bioinformatics

## Haiyan Huang

#### Harvard University

## Statistical methods for identifying transcription factor binding sites

##### Abstract:

The completion of the genomes of model organisms represents just the beginning of a long march toward in-depth understanding of biological systems. One challenge in post-genomic research is the detection of functional patterns from full-length genomic sequences. This talk focuses on statistical methods for finding patterns of functional or structural importance in biological sequences, in particular the identification of transcription factor binding sites (TFBSs). Some of the underlying mathematical theory will be discussed as well.

TFBSs are often short and degenerate in sequence. They are therefore commonly described by position-specific score matrices (PSSMs), which are used to score candidate TFBSs for their similarity to known binding sites. The similarity scores generated by PSSMs are essential to the computational prediction of single TFBSs or regulatory modules. We develop the Local Markov Method (LMM), which provides local p-values as a more reliable and rigorous alternative to raw similarity scores. Applying LMM to a large set of known human binding site sequences in situ, we show that, compared to current popular methods, LMM can reduce false positive errors by more than 50% without compromising sensitivity.

Host: Ian Abramson

### March 11, 2003

### 12:00 PM

### AP&M 6438

****************************