HCT Algorithm¶

Introduction¶

Title: Online Stochastic Optimization under Correlated Bandit Feedback

Authors: Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

Abstract: In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime X-armed bandit algorithm, and derive regret bounds matching the performance of state- of-the-art algorithms in terms of the dependency on number of steps and the near-optimality dimension. The main advantage of HCT is that it handles the challenging case of correlated bandit feedback (reward), whereas existing methods require rewards to be conditionally independent. HCT also improves on the state-of-the-art in terms of the memory requirement, as well as requiring a weaker smoothness assumption on the mean-reward function in comparison with the existing anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and we report preliminary empirical results.

Algorithm Parameters¶

nu (float) – parameter nu of the HCT algorithm

rho (float) – parameter rho of the HCT algorithm

c (float) – parameter c of the HCT algorithm

delta (float) – confidence parameter delta of the HCT algorithm

domain (list(list)) – The domain of the objective to be optimized

partition – The partition choice of the algorithm. Default: BinaryPartition.

Usage Example¶

from PyXAB.synthetic_obj.Garland import Garland
from PyXAB.algos.HCT import HCT

domain = [[0, 1]]               # Parameter is 1-D and between 0 and 1
target = Garland()
algo = HCT(domain=domain)

for t in range(1000):
    point = algo.pull(t)
    reward = target(point)
    algo.receive_reward(t, reward)

HCT Algorithm¶

Introduction¶

Algorithm Parameters¶

Usage Example¶

PyXAB

Navigation

Related Topics