ContextualLinTSPolicy implements Thompson Sampling with Linear Payoffs, following Agrawal and Goyal (2011). It is a contextual multi-armed bandit policy that assumes the expected reward is a linear function of the context features. See the reference for details.

Usage

policy <- ContextualLinTSPolicy$new(v = 0.2)

Arguments

v

double, a positive real number; hyperparameter that scales the variance of the Gaussian posterior distribution.
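As a reading aid, the role of v can be written in the notation of the reference, where \(\hat{\mu}(t)\) is the current estimate of the linear weights and \(B(t)\) is the accumulated \(d \times d\) design matrix. These symbols come from the paper rather than from this page, and the package's internal names may differ:

\[ \tilde{\mu}(t) \sim \mathcal{N}\left(\hat{\mu}(t),\; v^{2} B(t)^{-1}\right) \]

Under this reading, a larger v widens the distribution from which the weights are sampled and so encourages more exploration, while a smaller v keeps the samples close to the current estimate.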

Methods

new(v)

instantiates a new ContextualLinTSPolicy instance. Its arguments are described in the Arguments section above.

set_parameters(context_params)

initializes the policy parameters, using context_params$k (number of arms) and context_params$d (number of context features).

get_action(t, context)

selects an arm based on self$theta and context, returning the index of the selected arm in action$choice. The context argument consists of a list with context$k (number of arms), context$d (number of features), and the feature matrix context$X with dimensions \(d \times k\).
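The selection step can be sketched in base R as follows. This is a minimal illustration of the sampling rule in the reference, not the package's actual implementation: the helper name sketch_select_arm and the arguments mu_hat and B are hypothetical and follow the paper's notation.

```r
# Hypothetical helper (not part of the package API): Thompson Sampling
# arm selection with linear payoffs.
#   X      : d x k feature matrix, one column per arm
#   mu_hat : current estimate of the weight vector (length d)
#   B      : d x d design matrix accumulated so far
#   v      : positive scale of the posterior variance
sketch_select_arm <- function(X, mu_hat, B, v) {
  d <- nrow(X)
  # Posterior covariance v^2 * B^{-1}
  Sigma <- v^2 * solve(B)
  # Draw mu_tilde ~ N(mu_hat, Sigma) using the Cholesky factor of Sigma
  L <- t(chol(Sigma))
  mu_tilde <- as.vector(mu_hat + L %*% rnorm(d))
  # Score each arm under the sampled weights and pick the best one
  scores <- as.vector(t(X) %*% mu_tilde)
  which.max(scores)
}

set.seed(1)
X <- matrix(rnorm(3 * 4), nrow = 3, ncol = 4)  # d = 3 features, k = 4 arms
choice <- sketch_select_arm(X, mu_hat = rep(0, 3), B = diag(3), v = 0.2)
```

Because the weights are sampled rather than fixed, repeated calls with the same inputs can return different arms; that randomness is what drives exploration.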

set_reward(t, context, action, reward)

updates parameter list theta in accordance with the current reward$reward, action$choice and the feature matrix context$X with dimensions \(d \times k\). Returns the updated theta.
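The update described in the reference can be sketched in base R as below. The function name sketch_update and the list fields B, f and mu_hat are assumptions that mirror the paper's notation; the fields the package actually stores in theta may be named or organised differently.

```r
# Hypothetical helper (not part of the package API): update step after
# observing `reward` for the chosen arm with feature vector `x`.
sketch_update <- function(theta, x, reward) {
  theta$B <- theta$B + outer(x, x)      # B <- B + x x^T
  theta$f <- theta$f + reward * x       # f <- f + r x
  # mu_hat <- B^{-1} f, solved without forming the inverse explicitly
  theta$mu_hat <- as.vector(solve(theta$B, theta$f))
  theta
}

d <- 3
theta <- list(B = diag(d), f = rep(0, d), mu_hat = rep(0, d))
X <- cbind(c(1, 0, 0), c(0, 1, 0))      # d x k feature matrix, k = 2 arms
theta <- sketch_update(theta, X[, 1], reward = 1)
theta$mu_hat                            # c(0.5, 0, 0)
```

Starting from the identity matrix for B makes the first solve well defined and acts as a ridge-style regulariser on the estimate.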

References

Shipra Agrawal, and Navin Goyal. "Thompson Sampling for Contextual Bandits with Linear Payoffs." Advances in Neural Information Processing Systems 24. 2011.

See also

Core contextual classes: Bandit, Policy, Simulator, Agent, History, Plot

Bandit subclass examples: BasicBernoulliBandit, ContextualLogitBandit, OfflineReplayEvaluatorBandit

Policy subclass examples: EpsilonGreedyPolicy, ContextualLinTSPolicy

Examples

if (FALSE) {
  library(contextual)

  horizon     <- 100L
  simulations <- 100L

  # Linear contextual bandit with 4 arms and 3 context features
  bandit <- ContextualLinearBandit$new(k = 4, d = 3, sigma = 0.3)

  # Compare epsilon-greedy against linear Thompson Sampling
  agents <- list(
    Agent$new(EpsilonGreedyPolicy$new(0.1), bandit, "EGreedy"),
    Agent$new(ContextualLinTSPolicy$new(0.1), bandit, "LinTSPolicy")
  )

  simulation <- Simulator$new(agents, horizon, simulations, do_parallel = TRUE)
  history    <- simulation$run()

  plot(history, type = "cumulative", rate = FALSE, legend_position = "topleft")
}