We provide theoretical lower bounds for these settings and regret upper bounds
for the listed algorithms as functions of the
following key quantities: $K$ - number of arms, $T$ - number of rounds (time steps),
$d$ - dimension (where appropriate), and sometimes additional
parameters pertinent to the assumptions on the bandits' structure. We write
$R_T$, $r_T$, and $g_T$ for the cumulative regret, simple regret, and gain at time $T$, respectively.
## Small number of arms

### Cumulative Regret $E(R_T)$

#### Stochastic

Lower bound [Lai and Robbins, 1985]:
under mild assumptions on the reward distributions, the expected cumulative
regret satisfies the asymptotic lower bound
$\liminf_{t\rightarrow\infty}E(R_t)/\log(t)
\geq \sum_{i:\Delta_i > 0} \frac{\Delta_i}{KL(P_i||P^*)}$,
where $\Delta_i = \mu^* - \mu_i$ is the gap of arm $i$ and $P^*$ is the reward
distribution of the optimal arm.
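
For concreteness, the constant in this bound can be computed numerically. A minimal sketch for Bernoulli arms (the arm means below are illustrative, not from any particular example):

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence KL(Ber(p) || Ber(q)) in nats, with clipping
    to avoid log(0) at the boundary."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def lai_robbins_constant(means):
    """Sum of Delta_i / KL(P_i || P*) over the suboptimal arms,
    i.e. the constant in the asymptotic lower bound."""
    mu_star = max(means)
    return sum((mu_star - mu) / bernoulli_kl(mu, mu_star)
               for mu in means if mu < mu_star)

# Illustrative 3-armed instance: the bound says E(R_t) must grow
# at least like lai_robbins_constant(...) * log(t) asymptotically.
c = lai_robbins_constant([0.9, 0.8, 0.5])
```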

#### Adversarial $\Omega(\sqrt{TK})$

### Bound $R_T$ with prob. $1-\delta$

### Simple Regret $E(r_T)$

### Expected Discounted Reward

## Large set of arms

### Unstructured

Lower bound: $\forall \beta > 0, \mu^* \le 1:$ $\Omega(T^{\beta/(1+\beta)})$

### Linear: $\Omega(D\sqrt{T})$

#### Stochastic:

#### Adversarial:

### Convex $\Omega(D\sqrt{T})$


### Lipschitz

#### Stochastic:

### Convex + Lipschitz

#### Stochastic: $\Omega(\sqrt{DT})$

#### Adversarial: $\Omega(D\sqrt{T})$

### Convex + Lipschitz + Lipschitz-continuous gradient

### Submodular

| Algorithm | Reference | Regret bound | Comments |
|---|---|---|---|
| ε-greedy | [Auer et al, 2002] | $E(R_T) = O(\log T)$ for a suitably decreasing schedule $\epsilon_t$ | Outperformed by UCB strategies in practice. |
| UCB1 | [Auer et al, 2002] | $E(R_T)/\log T \le \sum_{i:\mu_i \ne \mu^*} \frac{8}{\Delta_i} + o(1)$ | Over-explores. Does not exploit observed reward variance. |
| UCB-V | [Audibert et al, 2009] | $E(R_T)/\log T \le 10\sum_{i:\mu_i \ne \mu^*} \left(\frac{\sigma_i^2}{\Delta_i} + 2b\right)$ | The gain over UCB1 is small, but it allows for some control of the risk. |
| KL-UCB | [Garivier et al, 2011] | $E(R_T)/\log T \le (1+\epsilon)\sum_{i:\mu_i \ne \mu^*} \frac{\Delta_i}{KL(P_i\Vert P^*)}+o(1)$ | Matches the Lai-Robbins lower bound; outperforms UCB1 and UCB-V. |
| Thompson sampling | [Agrawal et al, 2011] | | Empirical performance is better than the UCB strategies and hints that it might provide better control over risk. |
| FH-Gittins | | $E(R_T)/\log T \le \sum_{i:\mu_i \ne \mu^*} \frac{32}{\Delta_i}+o(1)$ | Designed for the Bayesian setting and requires knowledge of the time horizon, but performs well in comparison with the anytime strategies above. |
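
As an illustration, the UCB1 index policy can be sketched in a few lines of Python (the Bernoulli arm means are made up for the demo):

```python
import math, random

def ucb1(pull, n_arms, horizon):
    """UCB1 (Auer et al., 2002): play the arm maximizing
    empirical mean + sqrt(2 ln t / n_i)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:                      # pull each arm once first
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Illustrative Bernoulli bandit; the best arm should end up pulled most.
random.seed(0)
means = [0.2, 0.5, 0.8]
counts = ucb1(lambda i: 1.0 if random.random() < means[i] else 0.0,
              n_arms=3, horizon=2000)
```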

| Algorithm | Reference | Regret bound | Comments |
|---|---|---|---|
| Exp3 | [Auer et al, 2002] | $O\left(\sqrt{TK \log K}\right)$ | One can drop the standard stationarity assumptions on the arms, but at a significant cost in performance. This simple algorithm requires knowing $T$ in order to set its parameters. |
| Exp3.1 | [Auer et al, 2002] | $O\left(\sqrt{G_{\max}K \log K}\right)$ | A version of Exp3 that does not require knowing $T$ in advance. Other modifications for other scenarios exist. |
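
A sketch of Exp3, assuming rewards in $[0,1]$ and using the horizon to tune the exploration rate $\gamma$ (the stochastic arms below are illustrative; Exp3 itself makes no stochastic assumption):

```python
import math, random

def exp3(pull, n_arms, horizon):
    """Exp3 (Auer et al., 2002): exponential weights over arms with
    importance-weighted reward estimates; rewards assumed in [0, 1]."""
    gamma = min(1.0, math.sqrt(n_arms * math.log(n_arms)
                               / ((math.e - 1) * horizon)))
    weights = [1.0] * n_arms
    counts = [0] * n_arms
    for _ in range(horizon):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = pull(arm)
        # importance-weighted estimate: only the played arm is updated
        weights[arm] *= math.exp(gamma * reward / (probs[arm] * n_arms))
        counts[arm] += 1
    return counts

random.seed(1)
means = [0.3, 0.7]          # made-up Bernoulli arms for the demo
counts = exp3(lambda i: 1.0 if random.random() < means[i] else 0.0,
              n_arms=2, horizon=3000)
```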

| Algorithm | Reference | Regret bound | Comments |
|---|---|---|---|
| PAC-UCB | [Audibert et al, 2007] | With probability $1-\delta$, $R_t\leq C$, where $C>0$ is a constant depending on the distributions of the arms and the parameters of the algorithm. | |

| Algorithm | Reference | Simple regret bound | Comments |
|---|---|---|---|
| Uniform | [Bubeck et al, 2009] | $r_t\leq Ce^{-Dt}$, where $C,D>0$ are distribution dependent. | Empirically three regimes emerge; the uniform strategy does better in the long run. |
| UCB(p) | [Bubeck et al, 2009] | $r_t\leq Ct^{-D}$, where $C,D>0$ are distribution dependent. | Over the mid-term, UCB strategies can be more efficient; over very short time scales it is difficult to choose between them. |
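
The pure-exploration setting is easy to reproduce: the uniform strategy spends its entire budget round-robin and then recommends the empirically best arm; the simple regret is the gap of that recommendation. A minimal sketch with made-up arm means:

```python
import random

def uniform_then_recommend(pull, n_arms, budget):
    """Uniform allocation for pure exploration: pull arms round-robin,
    then recommend the arm with the highest empirical mean."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(budget):
        arm = t % n_arms
        sums[arm] += pull(arm)
        counts[arm] += 1
    return max(range(n_arms), key=lambda i: sums[i] / counts[i])

# Illustrative Bernoulli arms; with 1000 pulls per arm the 0.1 gap
# is resolved with high probability.
random.seed(2)
means = [0.4, 0.6, 0.5]
best = uniform_then_recommend(
    lambda i: 1.0 if random.random() < means[i] else 0.0,
    n_arms=3, budget=3000)
simple_regret = max(means) - means[best]
```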

| Algorithm | Reference | Optimality | Comments |
|---|---|---|---|
| Gittins index | [Gittins, 1979] | Optimal | Achieves the maximal expected discounted reward; robust to many types of stochastic arms. Implementation is often difficult, so its complexity is hard to specify. The strategy is not consistent. |
| DP solution | | Optimal | Always possible, but computationally very inefficient. |
| FH-Gittins | [Nino-Mora, 2010] | Suboptimal | Computationally efficient in comparison with DP, but not optimal. |
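
The DP row can be made concrete for Bernoulli arms with Beta priors: the Bayes-optimal policy is computed by backward induction over posterior counts, which is exactly why it scales so badly (the state space grows with the horizon and the number of arms). A sketch for two arms with uniform priors, for a finite horizon rather than discounting:

```python
from functools import lru_cache

def bayes_optimal_value(horizon):
    """Value of the Bayes-optimal policy for a 2-armed Bernoulli bandit
    with independent uniform (Beta(1,1)) priors, by backward induction.
    The state is the posterior (successes, failures) of each arm."""
    @lru_cache(maxsize=None)
    def value(t, s0, f0, s1, f1):
        if t == horizon:
            return 0.0
        best = 0.0
        for arm, (s, f) in enumerate([(s0, f0), (s1, f1)]):
            p = (s + 1) / (s + f + 2)        # posterior mean under Beta(1,1)
            if arm == 0:
                v = p * (1 + value(t + 1, s0 + 1, f0, s1, f1)) \
                    + (1 - p) * value(t + 1, s0, f0 + 1, s1, f1)
            else:
                v = p * (1 + value(t + 1, s0, f0, s1 + 1, f1)) \
                    + (1 - p) * value(t + 1, s0, f0, s1, f1 + 1)
            best = max(best, v)
        return best
    return value(0, 0, 0, 0, 0)

# Learning pushes the value above the myopic horizon/2 baseline.
v = bayes_optimal_value(horizon=6)
```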

| Algorithm | Reference | Regret bound | Comments |
|---|---|---|---|
| UCB-AIR | [Wang et al, 2008] | $\tilde{O}(\sqrt{T})$ if $\beta < 1$ and $\mu^* < 1$; $\tilde{O}(T^{\beta/(1+\beta)})$ if $\beta \ge 1$ or $\mu^* = 1$ | |

| Algorithm | Reference | Regret bound | Comments |
|---|---|---|---|
| Confidence ellipsoid | [Dani et al, 2008] | $\tilde{O}(D\sqrt{T})$ | Complexity $O(2^D)$ in general, but efficient for some special cases. |
| Self-concordant barrier | [Abernethy et al, 2008] | $O(D^{3/2}\sqrt{T}\log T)$ | Computationally efficient for convex action sets. |
| Geometric Hedge | [Dani et al, 2008] | $O(D^{3/2}\sqrt{T})$ | An efficient algorithm exists given access to a certain optimization oracle (such as path planning). |

| Algorithm | Reference | Regret bound | Comments |
|---|---|---|---|
| Bandit Gradient Descent | [Flaxman et al, 2005] | $O(\sqrt{D}T^{5/6})$ | |
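
The key trick in Bandit Gradient Descent is the one-point gradient estimate: a single function evaluation at a randomly perturbed point gives an unbiased estimate of the gradient of a smoothed version of the loss. A minimal sketch with projection onto a ball; the quadratic loss, target, and step sizes are made up for illustration:

```python
import math, random

def bandit_gradient_descent(f, dim, horizon, delta=0.1, eta=0.01, radius=2.0):
    """One-point estimator (Flaxman et al., 2005): (dim/delta) * f(x + delta*u) * u,
    with u uniform on the unit sphere, estimates the gradient of a smoothed f.
    After each gradient step, project back onto the ball of radius `radius`."""
    x = [0.0] * dim
    for _ in range(horizon):
        g = [random.gauss(0, 1) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in g))
        u = [c / norm for c in g]                         # uniform unit vector
        y = f([xi + delta * ui for xi, ui in zip(x, u)])  # single bandit query
        x = [xi - eta * (dim / delta) * y * ui for xi, ui in zip(x, u)]
        r = math.sqrt(sum(xi * xi for xi in x))
        if r > radius:                                    # projection step
            x = [xi * radius / r for xi in x]
    return x

# Illustrative convex loss f(x) = ||x - target||^2 (target is made up).
random.seed(3)
target = [1.0, -0.5]
loss = lambda x: sum((xi - ci) ** 2 for xi, ci in zip(x, target))
x_final = bandit_gradient_descent(loss, dim=2, horizon=5000)
```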

| Algorithm | Reference | Regret bound | Comments |
|---|---|---|---|
| $\mathcal{X}$-armed bandits (HOO) | [Bubeck et al, 2011] | $\tilde{O}(T^{\frac{d+1}{d+2}})$ | Here $d$ is the near-optimality dimension of the mean-reward function. |

| Algorithm | Reference | Regret bound | Comments |
|---|---|---|---|
| Agarwal et al's algorithm | [Agarwal et al, 2011] | $\tilde O(D^{16}\sqrt{T})$ | |
| Bandit Gradient Descent | [Flaxman et al, 2005] | $O(\sqrt{D}T^{3/4})$ | |

| Algorithm | Reference | Regret bound | Comments |
|---|---|---|---|
| Saha et al's algorithm | [Saha et al, 2011] | $O(D^{2/3}T^{2/3})$ | $O(\cdot)$ hides the dependence on the Lipschitz constant and the diameter of the space. |
| Hazan's algorithm | [Hazan et al, 2009] | $O(DT^{2/3})$ | |