Abstract
The paper surveys the literature on the bandit problem, focusing on its recent development in the presence of switching costs. Switching costs between arms makes not only the Gittins index policy suboptimal, but also renders the search for the optimal policy computationally infeasible. This survey will first discuss the decomposability properties of the arms that make the Gittins index policy optimal, and show how these properties break down upon the introduction of costs on switching arms. Having established the failure of the simple index policy, the survey focus on the recent efforts to overcome the difficulty of finding the optimal policy in the bandit problem with switching costs: characterization of the optimal policy, exact derivation of the optimal policy in the restricted environments, and lastly approximation of optimal policy. The advantages and disadvantages of the above approaches are discussed.
Original language | English |
---|---|
Pages (from-to) | 513-541 |
Number of pages | 29 |
Journal | De Economist |
Volume | 152 |
Issue number | 4 |
DOIs | |
Publication status | Published - Dec 2004 |
Bibliographical note
Funding Information:∗∗ I am indebted to Prajit K. Dutta for his guidance throughout this research. Comments from Richard Ericson, Rajiv Sethi, Lalith Munasinghe, and Levent Kockesen greatly helped me refine this paper. I also would like to thank participants of Micro Theory workshop at Columbia University. Financial support from the Kyung Hee Alumni is greatly acknowledged. All remaining errors are mine.
Keywords
- Decomposability
- Multi-armed bandits
- Switching costs