COMPUTING WITH TREES

Susan Holmes, Stanford University

(joint work with Persi Diaconis)

This work presents a natural coordinate system for `trees' using a correspondence with the set of perfect matchings in the complete graph. This correspondence produces a distance between trees, and a way of enumerating all trees in a minimal step order. It is useful in randomized algorithms as it enables moves on the space of trees that make random optimization strategies `mix' quickly. It also promises a generalization to intermediary trees when data are not decisive as to their choice of tree, and a new way of constructing Bayesian priors on tree space.

This representation provides a nice way of implementing genetic algorithms for approximately optimizing criteria such as parsimony and maximum likelihood, known to be NP complete.

This work applies to the space of all phylogenetic trees for a given set of species, it also extends to the space of all decision trees of a certain complexity measured in terms of the number of observations and to the analysis of DNA micro-arrays.

Susan Holmes (Stanford University)

The main idea behind this project is to unify the presentation of probability to a heterogeneous audience through the interest we have in things that surprise us. Some examples we use in our probability classes include: 'the birthday problem', 'say red', 'Russia roulette', 'de Mere's problem', 'Monty Hall'.

The tools developed are based on discoveries by cognitive psychologists (in particular Tversky and Kanneman) over the last 20 years, that have not, as yet, been used in teaching probability in this country.

The programs include simulation programs to give students a feel for probability, animated scenarios to help motivate and amuse them, as well as to make the material more memorable.

The audience of high school teachers with no training in probability is especially addressed, with all the class notes complementary to the applets available online.

Susan Holmes (Stanford University)

This talk presents a geometric model of a space which parameterizes phylogenetic trees using both combinatorics and geometry. The geometry of the space gives a way of measuring a distance between phylogenetic trees as well as a way of `averaging' or `combining' several trees whose leaves are identical. We can also provide the convex hull for a set of trees, and thus give trees that have highest depth, this is also an equivalent for finding the most 'central tree' to a set of trees in some sense.

This geometric model of tree space provides answers to questions on the number of `neighbors' to a given tree and the `curvature' of the boundaries between regions defining different trees that have been posed by biologists and statisticians over the last decade. It also provides a justification for disregarding portions of the trees that agree, and thus simplifying the space in which comparisons are to be made.