Policy: Linear Thompson Sampling with unique linear models

ContextualLinTSPolicy implements Thompson Sampling with Linear Payoffs, following Agrawal and Goyal (2011). It is a contextual multi-armed bandit policy that assumes the underlying relationship between contexts and rewards is linear. See the reference for details.

Usage

policy <- ContextualLinTSPolicy$new(v = 0.2)

Arguments

v

double, a positive real value (R+); hyper-parameter that scales the variance of the posterior Gaussian distribution.
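To give a feel for what v does, the sketch below assumes, as in Agrawal and Goyal, that parameters are sampled from a Gaussian whose covariance is v^2 times an inverse design matrix. The names sample_mvn, theta_hat, A_inv and spread are hypothetical and exist only for this illustration; they are not part of the package.

```r
# Hypothetical helper: n draws from N(mu, Sigma) using a Cholesky factor (base R only).
sample_mvn <- function(n, mu, Sigma) {
  d <- length(mu)
  R <- chol(Sigma)                      # upper triangular, t(R) %*% R equals Sigma
  z <- matrix(rnorm(n * d), nrow = n)   # n x d standard normal draws
  sweep(z %*% R, 2, mu, "+")            # add mu[j] to column j; each row is one draw
}

theta_hat <- c(0.5, -0.2, 0.1)   # assumed posterior mean, for illustration only
A_inv     <- diag(3)             # assumed inverse design matrix

# Standard deviation of the first sampled coefficient for a given v.
spread <- function(v) {
  set.seed(42)                   # same random draws for every v, so only the scale differs
  draws <- sample_mvn(1000, theta_hat, v^2 * A_inv)
  sd(draws[, 1])
}

spread_small <- spread(0.1)
spread_large <- spread(1.0)
```

Under these assumptions, a larger v widens the sampled parameters around their mean, which tends to produce more exploration; a smaller v keeps the samples close to the current estimate.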

Methods

new(v)

instantiates a new ContextualLinTSPolicy instance. Arguments defined in the Arguments section above.

set_parameters(context_params)

initialization of policy parameters, utilising context_params$k (number of arms) and context_params$d (number of context features).

get_action(t, context)

selects an arm based on self$theta and context, returning the index of the selected arm in action$choice. The context argument consists of a list with context$k (number of arms), context$d (number of features), and the feature matrix context$X with dimensions $d \times k$.
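The selection step can be sketched as a plain function. This is a minimal illustration, not the package's implementation: it assumes one linear model per arm, stored as theta$A_inv and theta$b, and a sampling covariance of v^2 times A_inv. The names draw_mvn and choose_arm are hypothetical, and the sketch returns the arm index directly rather than filling action$choice.

```r
# Hypothetical helper: one draw from N(mu, Sigma) via a Cholesky factor.
draw_mvn <- function(mu, Sigma) {
  as.vector(mu + t(chol(Sigma)) %*% rnorm(length(mu)))
}

# Sample a parameter vector per arm and pick the arm with the highest sampled payoff.
choose_arm <- function(theta, context, v) {
  scores <- numeric(context$k)
  for (arm in seq_len(context$k)) {
    x_arm       <- context$X[, arm]                        # d features of this arm
    A_inv       <- theta$A_inv[[arm]]
    theta_hat   <- as.vector(A_inv %*% theta$b[[arm]])     # current point estimate
    theta_tilde <- draw_mvn(theta_hat, v^2 * A_inv)        # sampled parameters
    scores[arm] <- sum(x_arm * theta_tilde)                # sampled expected reward
  }
  which.max(scores)   # first maximum wins on ties
}

set.seed(1)
k <- 4
d <- 3
# Assumed initial state: identity A_inv and zero b for every arm.
theta   <- list(A_inv = replicate(k, diag(d), simplify = FALSE),
                b     = replicate(k, numeric(d), simplify = FALSE))
context <- list(k = k, d = d, X = matrix(runif(d * k), nrow = d, ncol = k))
choice  <- choose_arm(theta, context, v = 0.2)
```

Because the choice depends on a random draw, repeated calls with the same context can return different arms; that randomness is what drives exploration in Thompson Sampling.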

set_reward(t, context, action, reward)

updates parameter list theta in accordance with the current reward$reward, action$choice and the feature matrix context$X with dimensions $d \times k$. Returns the updated theta.
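A rough sketch of such an update follows, again assuming one model per arm held in theta$A_inv and theta$b. The Sherman-Morrison rank-one update used here is one common way to keep the inverse current without a full matrix inversion; whether the package does exactly this is an assumption. The function name update_arm and the toy values are hypothetical.

```r
# Update the chosen arm's statistics with the observed reward.
update_arm <- function(theta, context, choice, reward) {
  x     <- context$X[, choice]               # features of the chosen arm
  A_inv <- theta$A_inv[[choice]]
  Ax    <- A_inv %*% x                       # d x 1; A_inv is symmetric
  # Sherman-Morrison: inverse of (A + x x^T) computed from the inverse of A.
  theta$A_inv[[choice]] <- A_inv - (Ax %*% t(Ax)) / as.numeric(1 + t(x) %*% Ax)
  theta$b[[choice]]     <- theta$b[[choice]] + reward * x
  theta
}

# Toy example with k = 2 arms and d = 2 features.
context <- list(k = 2, d = 2, X = matrix(c(1, 0, 0, 2), nrow = 2))
theta   <- list(A_inv = list(diag(2), diag(2)),
                b     = list(numeric(2), numeric(2)))
updated <- update_arm(theta, context, choice = 1, reward = 1)
```

Only the chosen arm's entries change; the other arms keep their previous A_inv and b.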

References

Shipra Agrawal, and Navin Goyal. "Thompson Sampling for Contextual Bandits with Linear Payoffs." Advances in Neural Information Processing Systems 24. 2011.

Examples

if (FALSE) {

horizon       <- 100L
simulations   <- 100L

bandit        <- ContextualLinearBandit$new(k = 4, d = 3, sigma = 0.3)

agents        <- list(Agent$new(EpsilonGreedyPolicy$new(0.1), bandit, "EGreedy"),
                      Agent$new(ContextualLinTSPolicy$new(0.1), bandit, "LinTSPolicy"))

simulation     <- Simulator$new(agents, horizon, simulations, do_parallel = TRUE)

history        <- simulation$run()

plot(history, type = "cumulative", rate = FALSE, legend_position = "topleft")

}
