From 01f9d3d43b1775751fb9874182cb0452785ded27 Mon Sep 17 00:00:00 2001
From: Mustapha Unubi Momoh <123378149+MustaphaU@users.noreply.github.com>
Date: Fri, 4 Oct 2024 17:00:13 -0400
Subject: [PATCH] Update `beta` distribution parameters in
 3.4-Multi-Armed-Bandit-Experiment.ipynb (#643)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Update the beta distribution parameters in the `simulate_experiment` function to avoid bias towards lower success probability.

The current specification of the beta distribution:

```
theta = np.random.beta(conversions + 1, exposures + 1)
```
treats every exposure as a failure, including the exposures that converted. This overstates the failure count and therefore undervalues the success probability of every variation. The effect is pronounced for variations with high baseline conversion rates and much less severe for variations with very low conversion rates.
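
The size of the bias can be checked from the mean of a Beta(α, β) distribution, which is α / (α + β). The sketch below is illustrative only: `beta_mean` and `compare` are hypothetical helpers, and the counts (90 or 1 conversions out of 100 exposures) are made-up inputs, not data from the notebook.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def compare(conversions, exposures):
    """Return (old, new) posterior means for one variation."""
    # Old parameters: every exposure is counted as a failure.
    old = beta_mean(conversions + 1, exposures + 1)
    # New parameters: only non-conversions are counted as failures.
    new = beta_mean(conversions + 1, exposures - conversions + 1)
    return old, new


# Hypothetical variation with a high conversion rate (90 of 100).
high_old, high_new = compare(90, 100)
# Hypothetical variation with a low conversion rate (1 of 100).
low_old, low_new = compare(1, 100)

print(f"high rate: old={high_old:.3f} new={high_new:.3f}")
print(f"low rate:  old={low_old:.4f} new={low_new:.4f}")
```

For the high-rate variation the old parameters give a mean of 91/192 (about 0.47) against 91/102 (about 0.89) with the fix, while for the low-rate variation the two means differ by less than 0.001.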

Traditionally, the Thompson Sampling Algorithm for the Bernoulli Bandit is:


```math
\begin{align*}
1: & \text{for } t = 1, 2, \ldots \text{ do:} \\
2: & \quad \quad \text{Sample model:} \\
3: & \quad \quad \text{for } k = 1 \text{ to } K \text{ do:} \\
4: & \quad \quad \quad \text{Sample } \theta_k \sim \text{beta}(\alpha_k, \beta_k) \\
5: & \quad \quad \text{end for} \\
6: \\
7: & \quad \quad \text{Select and apply action:} \\
8: & \quad \quad x_t \leftarrow \operatorname{argmax}_k \, \theta_k \\
9: & \quad \quad \text{Apply } x_t \text{ and observe } r_t \\
10: \\
11: & \quad \quad \text{Update distribution:} \\
12: & \quad \quad (\alpha_{x_t}, \beta_{x_t}) \leftarrow (\alpha_{x_t} + r_t, \beta_{x_t} + 1 - r_t) \\
13: & \text{end for}
\end{align*}
```
Here α and β are the parameters of each arm's beta distribution. They track the arm's success and failure counts, i.e. the number of `conversions` and `non-conversions`, respectively (each offset by the `+ 1` in the code).

```
non-conversions (or beta)  = exposures - conversions
```
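
The algorithm above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the notebook's `simulate_experiment` code: the function name `thompson_sampling`, the `true_rates` argument, and the simulated Bernoulli rewards are all hypothetical, and the standard library's `random.betavariate` stands in for `np.random.beta` to keep the example dependency-free. It assumes a Beta(1, 1) prior for each arm, which matches the `+ 1` terms in the patched line.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Run Bernoulli Thompson Sampling against simulated arms.

    true_rates: hypothetical conversion probability of each arm.
    Returns the final (alpha, beta, pulls) lists.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    alpha = [1] * k  # 1 + conversions (assumed Beta(1, 1) prior)
    beta = [1] * k   # 1 + non-conversions
    pulls = [0] * k  # exposures per arm

    for _ in range(n_rounds):
        # Sample model: one theta per arm from its current posterior.
        thetas = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Select and apply action: play the arm with the largest sample.
        arm = max(range(k), key=thetas.__getitem__)
        reward = 1 if rng.random() < true_rates[arm] else 0
        # Update distribution: a success increments alpha, a failure beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return alpha, beta, pulls


alpha, beta, pulls = thompson_sampling([0.1, 0.5, 0.9], n_rounds=2000)
print("pulls per arm:", pulls)
```

After any number of rounds, `beta[i] - 1` equals `pulls[i] - (alpha[i] - 1)`, which is the `exposures - conversions` relationship the patch restores.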

Co-authored-by: James Jory <james@jamesjory.com>
---
 .../3-Experimentation/3.4-Multi-Armed-Bandit-Experiment.ipynb   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/workshop/3-Experimentation/3.4-Multi-Armed-Bandit-Experiment.ipynb b/workshop/3-Experimentation/3.4-Multi-Armed-Bandit-Experiment.ipynb
index e045fec39..46ed756be 100644
--- a/workshop/3-Experimentation/3.4-Multi-Armed-Bandit-Experiment.ipynb
+++ b/workshop/3-Experimentation/3.4-Multi-Armed-Bandit-Experiment.ipynb
@@ -509,7 +509,7 @@
     "            \n",
     "        data.append(row)\n",
     "        \n",
-    "        theta = np.random.beta(conversions + 1, exposures + 1)\n",
+    "        theta = np.random.beta(conversions + 1, exposures - conversions + 1)\n",
     "        thetas[idx] = theta[variation]\n",
     "        thetaregret[idx] = np.max(thetas) - theta[variation]\n",
     "\n",