R/policy_cmab_lin_ts_disjoint.R
ContextualLinTSPolicy.Rd
ContextualLinTSPolicy implements Thompson Sampling with Linear Payoffs, following Agrawal and Goyal.
Thompson Sampling with Linear Payoffs is a contextual Thompson Sampling multi-armed bandit policy which assumes that the underlying relationship between rewards and contexts is linear. See the reference below for details.
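To make the linear-payoff assumption concrete, the sketch below shows the core of disjoint Linear Thompson Sampling: each arm keeps its own Gaussian posterior over a coefficient vector, one sample is drawn per arm, and the arm with the highest sampled expected reward is played. Function and variable names here are invented for exposition and do not mirror the package internals.

```r
# Illustrative sketch of disjoint Linear Thompson Sampling arm selection.
# A: list of k (d x d) precision matrices, b: list of k d-vectors,
# X: d x k feature matrix (one column per arm), v: variance hyperparameter.
lin_ts_choose <- function(A, b, X, v) {
  k <- ncol(X)
  scores <- vapply(seq_len(k), function(arm) {
    A_inv <- solve(A[[arm]])
    mu    <- A_inv %*% b[[arm]]          # posterior mean for this arm
    L     <- chol(v^2 * A_inv)           # Cholesky factor of the covariance
    theta <- mu + t(L) %*% rnorm(nrow(X))  # one posterior sample
    sum(X[, arm] * theta)                # sampled expected reward
  }, numeric(1))
  which.max(scores)
}

d <- 3; k <- 4
A <- replicate(k, diag(d), simplify = FALSE)   # uninformative priors
b <- replicate(k, rep(0, d), simplify = FALSE)
X <- matrix(rnorm(d * k), d, k)
choice <- lin_ts_choose(A, b, X, v = 0.2)
```

A larger `v` widens the sampled posterior and so increases exploration; a smaller `v` makes the policy act closer to its posterior mean.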
policy <- ContextualLinTSPolicy$new(v = 0.2)
v
double; a positive real value in R+. Hyperparameter for adjusting the variance of the posterior Gaussian distribution.
new(v)
Generates and instantiates a new ContextualLinTSPolicy instance. Arguments as defined in the Arguments section above.
set_parameters(context_params)
Initializes the policy's parameters, using context_params$k (number of arms) and context_params$d (number of context features).
get_action(t, context)
Selects an arm based on self$theta and context, returning the index of the selected arm in action$choice. The context argument is a list with context$k (number of arms), context$d (number of features), and the feature matrix context$X with dimensions \(d \times k\).
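For orientation, the list below illustrates the shape of the context argument that get_action() receives; the values are invented for the example.

```r
# Hypothetical context list as passed to get_action() at each step t.
d <- 3; k <- 4
context <- list(
  k = k,                          # number of arms
  d = d,                          # number of context features
  X = matrix(rnorm(d * k), d, k)  # d x k feature matrix, one column per arm
)
```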
set_reward(t, context, action, reward)
Updates the parameter list theta in accordance with the current reward$reward, action$choice, and the feature matrix context$X with dimensions \(d \times k\). Returns the updated theta.
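The update performed here follows the standard disjoint linear-bandit posterior update: a rank-one update of the chosen arm's precision matrix and a reward-weighted update of its response vector. The sketch below uses assumed variable names, not the package's internal representation.

```r
# Sketch of the standard disjoint LinTS posterior update for one arm.
# A: (d x d) precision matrix, b: d-vector, x: feature column of the
# chosen arm, reward: observed scalar reward.
lin_ts_update <- function(A, b, x, reward) {
  A <- A + x %*% t(x)    # rank-one precision update
  b <- b + reward * x    # accumulate reward-weighted features
  list(A = A, b = b)
}

d <- 3
state <- lin_ts_update(diag(d), rep(0, d), x = c(1, 0, 0.5), reward = 1)
```

The posterior mean for the arm is then `solve(A, b)`, which is what the next call to get_action() samples around.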
Shipra Agrawal and Navin Goyal. "Thompson Sampling for Contextual Bandits with Linear Payoffs." Proceedings of the 30th International Conference on Machine Learning. 2013.
Core contextual classes: Bandit, Policy, Simulator, Agent, History, Plot

Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit, OfflineReplayEvaluatorBandit

Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy
if (FALSE) {

horizon     <- 100L
simulations <- 100L

bandit      <- ContextualLinearBandit$new(k = 4, d = 3, sigma = 0.3)

agents      <- list(Agent$new(EpsilonGreedyPolicy$new(0.1), bandit, "EGreedy"),
                    Agent$new(ContextualLinTSPolicy$new(0.1), bandit, "LinTSPolicy"))

simulation  <- Simulator$new(agents, horizon, simulations, do_parallel = TRUE)
history     <- simulation$run()

plot(history, type = "cumulative", rate = FALSE, legend_position = "topleft")

}