Update beta distribution parameters in 3.4-Multi-Armed-Bandit-Experiment.ipynb (#643)

Update the beta distribution parameters in the `simulate_experiment` function to avoid bias towards lower success probability.

The current specification of the beta distribution:

```
theta = np.random.beta(conversions + 1, exposures + 1)
``` 
treats every exposure as a failure. This overstates the failure count and therefore understates the success probability of each variation. The effect is pronounced for variations with high baseline conversion rates and less severe for variations with very low conversion rates.
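As a rough illustration of the size of this bias, the sketch below compares the posterior mean implied by each parameterization, using the fact that a Beta(α, β) distribution has mean α / (α + β). The helper names (`beta_mean`, `old_posterior_mean`, `new_posterior_mean`) and the sample counts are hypothetical and are not part of the notebook.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def old_posterior_mean(conversions, exposures):
    # Previous parameterization: every exposure is counted as a failure.
    return beta_mean(conversions + 1, exposures + 1)


def new_posterior_mean(conversions, exposures):
    # Updated parameterization: only non-conversions are counted as failures.
    return beta_mean(conversions + 1, exposures - conversions + 1)


exposures = 1000  # illustrative value, not taken from the notebook
for true_rate in (0.01, 0.5, 0.9):
    conversions = round(true_rate * exposures)
    old = old_posterior_mean(conversions, exposures)
    new = new_posterior_mean(conversions, exposures)
    print(f"rate={true_rate:.2f}  old mean={old:.4f}  new mean={new:.4f}")
```

With a 90% conversion rate the old parameterization gives a posterior mean near 0.47, while at 1% the two means are almost identical, which matches the asymmetry described above.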

Traditionally, the Thompson Sampling Algorithm for the Bernoulli Bandit is:


```math
\begin{align*}
1: & \text{for } t = 1, 2, \ldots \text{ do:} \\
2: & \quad \quad \text{Sample model:} \\
3: & \quad \quad \text{for } k = 1 \text{ to } K \text{ do:} \\
4: & \quad \quad \quad \text{Sample } \theta_k \sim \text{beta}(\alpha_k, \beta_k) \\
5: & \quad \quad \text{end for} \\
6: \\
7: & \quad \quad \text{Select and apply action:} \\
8: & \quad \quad x_t \leftarrow \operatorname{argmax}_k \theta_k \\
9: & \quad \quad \text{Apply } x_t \text{ and observe } r_t \\
10: \\
11: & \quad \quad \text{Update distribution:} \\
12: & \quad \quad (\alpha_{x_t}, \beta_{x_t}) \leftarrow (\alpha_{x_t} + r_t, \beta_{x_t} + 1 - r_t) \\
13: & \text{end for}
\end{align*}
```
Here α and β are the parameters of each arm's beta distribution. They track the success and failure counts, respectively, that is, the number of `conversions` and `non-conversions` (the `+ 1` in the code corresponds to a uniform Beta(1, 1) prior).

```
non-conversions (or beta)  = exposures - conversions
```
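The algorithm above can be sketched in a few lines of Python. This is a minimal, self-contained illustration rather than the notebook's `simulate_experiment` function: it uses the standard library's `random.betavariate` in place of `np.random.beta` so that it has no dependencies, and the function name `thompson_sampling`, its parameters, and the simulated conversion rates are all assumptions made for the example.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Bernoulli-bandit Thompson sampling with a Beta(1, 1) prior per arm.

    true_rates: hypothetical conversion rate of each arm, used only to
    simulate rewards.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    alpha = [1] * k  # 1 + number of conversions per arm
    beta = [1] * k   # 1 + number of non-conversions per arm
    pulls = [0] * k  # number of exposures per arm

    for _ in range(n_rounds):
        # Sample model: one draw of theta_k per arm.
        thetas = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]

        # Select and apply action: play the arm with the largest draw.
        arm = max(range(k), key=lambda i: thetas[i])
        reward = 1 if rng.random() < true_rates[arm] else 0

        # Update distribution: a success increments alpha, a failure beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return alpha, beta, pulls


alpha, beta, pulls = thompson_sampling([0.1, 0.9], n_rounds=500, seed=1)
print("exposures per arm:", pulls)
```

Note that for every arm `beta - 1` equals exposures minus conversions, which is the relationship the updated `np.random.beta` call relies on.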

Co-authored-by: James Jory <[email protected]>
MustaphaU and james-jory authored Oct 4, 2024
1 parent 21653f3 commit 01f9d3d
Showing 1 changed file with 1 addition and 1 deletion.
@@ -509,7 +509,7 @@
 " \n",
 " data.append(row)\n",
 " \n",
-" theta = np.random.beta(conversions + 1, exposures + 1)\n",
+" theta = np.random.beta(conversions + 1, exposures - conversions + 1)\n",
 " thetas[idx] = theta[variation]\n",
 " thetaregret[idx] = np.max(thetas) - theta[variation]\n",
 "\n",
