Tze L. Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules, 1985

 Sequential choice from several populations

 MICHAEL N. KATEHAKIS AND HERBERT ROBBINS

 Rutgers University, New Brunswick, NJ 08903

 Contributed by Herbert Robbins, May 4, 1995

 ABSTRACT We consider the problem of sampling sequentially from two or more populations in such a way as to maximize the expected sum of outcomes in the long run.

Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem

 Vol. 27, No. 4 (Dec., 1995), pp. 1054-1078

Jouini, W., Ernst, D., Moy, C. and Palicot, J., 2010, May. Upper confidence bound based decision making strategies and dynamic spectrum access. In 2010 IEEE International Conference on Communications (pp. 1-5). IEEE.

 Bound (UCB) algorithms could be useful to design decision making strategies for SUs to exploit intelligently the spectrum resources based on their past observations.