Source: R/policy_cmab_lin_ts_disjoint.R
ContextualLinTSPolicy.Rd

ContextualLinTSPolicy implements Thompson Sampling with Linear Payoffs, following Agrawal and Goyal (2013).
Thompson Sampling with Linear Payoffs is a contextual Thompson Sampling multi-armed bandit Policy which assumes that the underlying relationship between rewards and contexts is linear. Check the reference for more details.
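In the notation of the reference, and using per-arm ("disjoint") parameter estimates as the source file name suggests, the selection step can be summarised as follows (a sketch, not a statement about the package's exact internals): each arm \(a\) keeps a posterior over its coefficient vector, a sample is drawn as \(\tilde{\theta}_a \sim \mathcal{N}\left(\hat{\mu}_a,\; v^2 B_a^{-1}\right)\), and the arm maximising the sampled payoff \(x_{t,a}^\top \tilde{\theta}_a\) is played.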
Usage

policy <- ContextualLinTSPolicy$new(v = 0.2)
Arguments

v
  double, a positive real value in R+; hyper-parameter for adjusting the variance of the posterior Gaussian distribution.
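As an illustration of the role of v (a sketch reusing the bandit settings from the Examples section below; the agent labels are made up), a larger v widens the posterior that get_action samples from and therefore increases exploration:

library(contextual)

bandit <- ContextualLinearBandit$new(k = 4, d = 3, sigma = 0.3)

# Two agents differing only in v: the second explores more aggressively.
agents <- list(Agent$new(ContextualLinTSPolicy$new(v = 0.1), bandit, "LinTS v=0.1"),
               Agent$new(ContextualLinTSPolicy$new(v = 0.5), bandit, "LinTS v=0.5"))

These agents can then be run and compared with Simulator$new() exactly as in the Examples section below.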
Methods

new(v)
  Instantiates a new ContextualLinTSPolicy instance. Arguments are defined in the Arguments section above.
set_parameters(context_params)
  Initialization of policy parameters, utilising context_params$k (number of arms) and context_params$d (number of context features).
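As a minimal sketch of the kind of state a disjoint linear TS policy initialises here (the field names A and b are illustrative assumptions, not the package's documented internals): one precision matrix and one reward-weighted context sum per arm.

# Illustrative stand-in for set_parameters(): per-arm priors for k arms
# over d features, starting from an identity precision and a zero vector.
set_parameters_sketch <- function(context_params) {
  d <- context_params$d                                # number of context features
  k <- context_params$k                                # number of arms
  list(A = replicate(k, diag(d),   simplify = FALSE),  # d x d precision per arm
       b = replicate(k, rep(0, d), simplify = FALSE))  # d-vector per arm
}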
get_action(t, context)
  Selects an arm based on self$theta and context, returning the index of the selected arm in action$choice. The context argument consists of a list with context$k (number of arms), context$d (number of features), and the feature matrix context$X with dimensions \(d \times k\).
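The sampling itself might look like the following standalone sketch (function and field names are hypothetical, and MASS::mvrnorm is used here for the multivariate normal draw as an implementation assumption):

# Illustrative stand-in for get_action(): sample a coefficient vector per arm
# from N(A^{-1} b, v^2 A^{-1}) and choose the arm with the largest sampled payoff.
get_action_sketch <- function(v, theta, context) {
  sampled_payoff <- vapply(seq_len(context$k), function(arm) {
    x       <- context$X[, arm]                     # feature d-vector of this arm
    A_inv   <- solve(theta$A[[arm]])
    mu_hat  <- as.vector(A_inv %*% theta$b[[arm]])  # posterior mean
    theta_s <- MASS::mvrnorm(1, mu = mu_hat, Sigma = v^2 * A_inv)
    sum(x * theta_s)                                # sampled expected reward
  }, numeric(1))
  list(choice = which.max(sampled_payoff))
}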
set_reward(t, context, action, reward)
  Updates the parameter list theta in accordance with the current reward$reward, action$choice, and the feature matrix context$X with dimensions \(d \times k\). Returns the updated theta.
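For the update, the standard disjoint rank-one recursion is sketched below (again with the hypothetical A and b fields from the earlier sketches):

# Illustrative stand-in for set_reward(): rank-one update of the chosen
# arm's precision matrix and of its reward-weighted context sum.
set_reward_sketch <- function(theta, context, action, reward) {
  arm <- action$choice
  x   <- context$X[, arm]
  theta$A[[arm]] <- theta$A[[arm]] + x %*% t(x)        # A <- A + x x'
  theta$b[[arm]] <- theta$b[[arm]] + reward$reward * x
  theta                                                # return updated theta
}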
References

Shipra Agrawal and Navin Goyal. "Thompson Sampling for Contextual Bandits with Linear Payoffs." Proceedings of the 30th International Conference on Machine Learning (ICML). 2013.
See Also

Core contextual classes: Bandit, Policy, Simulator, Agent, History, Plot

Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit, OfflineReplayEvaluatorBandit

Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy
Examples

if (FALSE) {

horizon     <- 100L
simulations <- 100L

bandit <- ContextualLinearBandit$new(k = 4, d = 3, sigma = 0.3)

agents <- list(Agent$new(EpsilonGreedyPolicy$new(0.1), bandit, "EGreedy"),
               Agent$new(ContextualLinTSPolicy$new(0.1), bandit, "LinTSPolicy"))

simulation <- Simulator$new(agents, horizon, simulations, do_parallel = TRUE)
history    <- simulation$run()

plot(history, type = "cumulative", rate = FALSE, legend_position = "topleft")
}