UCB1
1985
Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1), pp. 4-22, 1985

1995
> Proc. Natl. Acad. Sci. USA
> Vol. 92, pp. 8584-8585, September 1995
> Statistics
> Sequential choice from several populations
> MICHAEL N. KATEHAKIS AND HERBERT ROBBINS
> Rutgers University, New Brunswick, NJ 08903
> Contributed by Herbert Robbins, May 4, 1995
> ABSTRACT We consider the problem of sampling sequentially
> from two or more populations in such a way as to
> maximize the expected sum of outcomes in the long run.

> Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem
> Rajeev Agrawal
> Advances in Applied Probability
> Vol. 27, No. 4 (Dec., 1995), pp. 1054-1078

2010
> Jouini, W., Ernst, D., Moy, C. and Palicot, J., 2010, May. Upper confidence bound based decision making strategies and dynamic spectrum access. In 2010 IEEE International Conference on Communications (pp. 1-5). IEEE.
> We suggest that Upper Confidence
> Bound (UCB) algorithms could be useful to design decision
> making strategies for SUs [secondary users] to exploit intelligently the spectrum
> resources based on their past observations.