##### Department of Mathematics,

University of California San Diego

****************************

### Special Colloquium

## Michael G. Schimek

#### Medical University of Graz, Institute for Medical Informatics, Statistics & Documentation

## The support vector machine as a statistical learning tool

##### Abstract:

The main objective of statistical learning is the characterization of an unknown dependency between observations (measurements) on observational units and certain properties of these units. All measurements are assumed to be observable, whereas the dependent properties are available only for a subset of the observational units (the learning set). The Support Vector Machine (SVM), a machine learning concept, has recently attracted increasing attention in the statistics community. One reason is that it can handle situations with far more variables than observational units, as is now common in bioinformatics applications. The general problem of estimating unknown dependencies and using them for prediction arises in many other applications, such as the environmental sciences, quantitative economics, and finance. The SVM (V. N. Vapnik: The Nature of Statistical Learning Theory. Springer, NY, 2000) is a classification method especially suitable for overlapping classes. It produces nonlinear boundaries by constructing a linear boundary in a transformed version of the feature space. We discuss the SVM in the context of statistics under unspecified stochastic assumptions. For instance, complexity control can be achieved in a way similar to penalized (regularized) binary regression. Apart from the classical (machine learning) algorithms, statistical estimation concepts are considered.
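As a minimal illustration of the connection the abstract draws between the SVM and penalized regression, the following sketch (not the speaker's method; all names and parameter values are illustrative assumptions) trains a soft-margin *linear* SVM by subgradient descent on the regularized hinge-loss objective. The regularization weight `lam` plays the role of the complexity-control parameter; the kernelized SVM discussed in the talk would replace the inner products with a kernel to obtain nonlinear boundaries.

```python
import random

# Hypothetical toy data: two overlapping 2D Gaussian classes, labels in {-1, +1}.
random.seed(0)
X = [(random.gauss(-1, 1), random.gauss(-1, 1)) for _ in range(40)] + \
    [(random.gauss(+1, 1), random.gauss(+1, 1)) for _ in range(40)]
y = [-1] * 40 + [+1] * 40

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimize  lam/2 * ||w||^2 + mean_i max(0, 1 - y_i (w.x_i + b))
    by full-batch subgradient descent (illustrative sketch)."""
    w, b, n = [0.0, 0.0], 0.0, len(X)
    for _ in range(epochs):
        # Subgradient of the penalty term (this is the complexity control).
        gw, gb = [lam * w[0], lam * w[1]], 0.0
        for (x1, x2), yi in zip(X, y):
            # Points inside the margin contribute to the hinge-loss subgradient.
            if yi * (w[0] * x1 + w[1] * x2 + b) < 1:
                gw[0] -= yi * x1 / n
                gw[1] -= yi * x2 / n
                gb -= yi / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b

w, b = train_linear_svm(X, y)
train_acc = sum(yi * (w[0] * x1 + w[1] * x2 + b) > 0
                for (x1, x2), yi in zip(X, y)) / len(X)
```

Because the two classes overlap by construction, no separating boundary is perfect; increasing `lam` widens the margin at the cost of more training errors, which is exactly the bias-variance trade-off controlled by the penalty in regularized binary regression.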

Host: Aurore Delaigle

### May 22, 2006

### 4:00 PM

### AP&M 6218

****************************