##### Department of Mathematics, University of California San Diego

****************************

### Math 288 - Stochastic Systems Seminar

## Angela Yu

#### UCSD

## Three wrongs make a right: reward underestimation mitigates idiosyncrasies in human bandit behavior

##### Abstract:

Using a multi-armed bandit task together with Bayesian computational modeling, we find that humans systematically underestimate reward availability in the environment. This apparent pessimism turns out to be an optimism bias in disguise. It compensates for other idiosyncrasies in human learning and decision-making under uncertainty, such as a default tendency to assume that environmental statistics are non-stationary and the adoption of a simplistic decision policy. In particular, underestimating the reward rate discourages the decision-maker from switching away from a "good" option, which brings behavior close to optimal (the optimal policy never switches away after a win). Furthermore, we show that the Bayesian model that best predicts human behavior is equivalent to a particular class of reinforcement learning models, which gives statistical, normative grounding to phenomenological models of human behavior.

Host: Ruth Williams

### January 23, 2020

### 2:00 PM

### AP&M 7218

****************************