forked from harpak-lab/drosophila-longevity
-
Notifications
You must be signed in to change notification settings - Fork 0
/
week2_dros.Rmd
569 lines (485 loc) · 20.1 KB
/
week2_dros.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
---
title: "Week 2 Drosophila Work"
output:
pdf_document: default
html_notebook: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r libraries, include=FALSE}
set.seed(37)
library(dplyr)
library(data.table)
```
## Overview
The purpose of my work this week is to build simulated datasets similar to those used by Pallares et al. The purpose of these simulated datasets is to be able to test the power and Type I error of both the model prosposed by Pallares et al. as well as the interaction model proposed by Rebecca / Arbel. Ideally, we aim to show that the interaction model is able to effectively identify types of genetic effects (e.g. amplification) that the origin model is unable to.
## Dataset Simulation
As a reference for the dataset simulations, I used the simulation setup that Pallares et al. uses to compare the CMH and beta-binomial approaches for detecting genetic signal. I made some small modifications to this setup (e.g. allowing for different effects in the control and high sugar environments) that I felt would be useful in evaluating the interaction model. Specifically, we assume that $n$ individuals are sampled throughout the experiment ($\frac{1}{2}$ at $T_{0}$, $\frac{1}{4}$ in C at $T_{N}$, $\frac{1}{4}$ in HS at $T_{N}$). For each individual, $l$ loci are sampled. Let $\pi_{ij}$ be the genotype of individual $i$ at locus $j$ (1 for homozygous alternative, 0.5 for heterozygous, 0 otherwise). Let $y_{ij}$ be the number of alternative allele reads of individual $i$ at locus $j$, and let $r_{j}$ be the total number of allele reads at locus $j$ (assume this is the same for all individuals). Finally, let $\bar{\pi}_{\cdot j}^{(c)}$ be the average alternative allele frequency for all individuals at locus $j$ in condition $c$, where $c \in \{T_{0}, C, HS\}$.
\newline
\newline
Then, we use the following procedure to generate a simulated dataset:
1.
+ Generate $\bar{\pi}_{\cdot 1}^{(T_{0})}, ..., \bar{\pi}_{\cdot l}^{(T_{0})} \sim \hat{F}_{maf}$ independently, where $\hat{F}_{maf}$ is the estimated distribution of minor allele frequencies from the data.
+ Generate the change in minor allele frequencies for each site $j$ and each condition $c$, $\Delta_{j}^{(c)}$ according to a mixture distribution. For more details on the possible parameters of this mixtures please see the code below. But, in summary, this mixture distribution can put probability on 5 different cases. (1) Both conditions null, (2) HS alt, C null, (3) C alt, HS null, (4) HS and C alt, same magnitude, (5) HS and C alt, HS effect is amplified. Once these change values are calculated, use them to calculate the minor allele frequencies for each condition at each site at $T_{N}$.
2. For each locus $j$, generate genotypes for each individual, $\pi_{1j}, ..., \pi_{1n}$ according to the corresponding minor allele frequency for each site and condition.
3. For each locus $j$, generate the total number of reads at that locus, $r_{j}$ from a mixture of negative binomial distributions. This mixture is determined empirically by fitting a negative binomial to 5000 randomly selected alleles, and then putting a weight of $\frac{1}{5000}$ on each distribution. This is exactly how Pallares et al. conducts their power simulation.
4. Finally, generate the number of alternative reads at each locus for each individual as:
\begin{equation*}
y_{ij} \sim Bin(r_{j}, \pi_{ij})
\end{equation*}
Once these steps are completed, we have a dataset on which to compute analysis. Below is the code to generate datasets of this form.
```{r}
#' Generate n genotypes with a minor allele frequency
#'
#' @param maf Minor allele frequency. Must be in [0, 1]
#' @param n Total number of genotypes to generate
#'
#' @return Vector of genotypes, where 0 is homozygous major allele,
#' .5 is heterozygous, and 1 is homozygous minor allele
#' @export
#'
#' @examples
generate_genotypes <- function(maf, n) {
sample(
x=c(0, 0.5, 1),
size=n,
prob=c((1 - maf) ^ 2, 2 * maf * (1 - maf), maf ^ 2),
replace = TRUE
)
}
#' Generate Allele Frequency Changes for Simulated Dataset
#'
#' The function assumes that the true allele frequency changes
#' for each site follows one of a few cases:
#' (1) The minor allele frequency change is 0 in both cases.
#' (2) The minor allele frequency changes by \code{cx} percent in the HS case only.
#' (3) The minor allele frequency changes by \code{cx} percent in the C case only.
#' (4) The minor allele frequency changes by \code{cx} percent in both cases.
#' (5) The minor allele frequency changes by \code{cx} percent in C
#' and \code{cx} * \code{hs_amp_coef} in HS.
#' The user is able to input the magnitude of changes as well as
#' the probability of landing in each case.
#'
#' @param sites Number of SNPs for which to generate data
#' @param prob_both_null Probability of being in case 1 in the description
#' @param prob_hs_only Probability of being in case 2 in the description
#' @param prob_c_only Probability of being in case 3 in the description
#' @param prob_same Probability of being in case 4 in the description
#' @param prob_amp Probability of being in case 5 in the description
#' @param cx Percentage change of minor allele frequency at time TN
#' @param hs_amp_coef Amplification constant, see description for more details
#'
#' @return Dataframe with minor allele frequency changes for each site in the HS and C cases.
#' @export
#'
#' @examples
generate_tn_allele_cxs <- function(
sites,
prob_both_null,
prob_hs_only,
prob_c_only,
prob_same,
prob_amp,
cx,
hs_amp_coef = 1.0
) {
# Create matrices to select from so that only one call to sample is made
# c(both_null, hs_only, c_only, same, amp)
c_matrix_data <- c(rep(0, 2 * sites), rep(cx, 3 * sites))
c_matrix <- matrix(data = c_matrix_data, ncol = sites, byrow = TRUE)
hs_matrix_data <- c(
rep(0, sites),
rep(cx, sites),
rep(0, sites),
rep(cx, sites),
rep(cx * hs_amp_coef, sites)
)
hs_matrix <- matrix(data = hs_matrix_data, ncol = sites, byrow = TRUE)
# Select from 5 cases with specified probability
case_gen <- sample(
x = c(1:5),
size = sites,
replace = TRUE,
prob = c(prob_both_null, prob_hs_only, prob_c_only, prob_same, prob_amp)
)
str_possibilities <- c("both_null", "hs_only", "c_only", "same", "hs_amp")
str_selections <- str_possibilities[case_gen]
# Find proper indices in matrix
selected_indices <- c(0:(sites - 1)) * 5 + case_gen
selected_c_cxs <- c_matrix[selected_indices]
selected_hs_cxs <- hs_matrix[selected_indices]
maf_cx_tn_df <- data.frame(c_cx = selected_c_cxs, hs_cx = selected_hs_cxs)
return(list(cx_df = maf_cx_tn_df, true_signal = str_selections))
}
#' Generate Simulated Drosophila Dataset With Specified Signal Setup
#'
#' The function assumes that the true allele frequency changes
#' for each site follows one of a few cases:
#' (1) The minor allele frequency change is 0 in both cases.
#' (2) The minor allele frequency changes by \code{cx}
#' percent in the HS case only.
#' (3) The minor allele frequency changes by \code{cx}
#' percent in the C case only.
#' (4) The minor allele frequency changes by \code{cx} percent in both cases.
#' (5) The minor allele frequency changes by \code{cx}
#' percent in C and \code{cx} * \code{hs_amp_coef} in HS.
#' The user is able to input the magnitude of changes as well as the probability
#' of landing in each case.
#'
#' @param n Total number of flies. 1/2 are at time 0,
#' 1/4 are high sugar at time N, 1/4 are control at time N
#' @param sites Total number of SNP sites for each fly
#' @param prob_both_null Probability of being in case 1 in the description
#' @param prob_hs_only Probability of being in case 2 in the description
#' @param prob_c_only Probability of being in case 3 in the description
#' @param prob_same Probability of being in case 4 in the description
#' @param prob_amp Probability of being in case 5 in the description
#' @param cx Percentage change of minor allele frequency at time TN
#' @param hs_amp_coef Amplification constant, see description for more details
#' @param nb_params_dist Vector of lists of negative binomial parameters to be
#' sampled from to determine reads for each SNP.
#' For each SNP, one set of these parameters are sampled, and then a negative
#' binomial distribution with these parameters
#' is used to generate \code{n} read counts for each SNP. Each list should
#' contain an element t corresponding to
#' the \code{n} argument in \code{rnbinom} and \code{p} corresponding to the
#' \code{prob} argument in \code{rnbinom}.
#'
#' @return A list with (1) a dataframe containing information from
#' simulations and (2) a dataframe containing
#' the generated signal classicification for each site
#' @export
#'
#' @examples
generate_simulated_dataset <- function(
n,
sites,
maf_dist,
prob_both_null,
prob_hs_only,
prob_c_only,
prob_same,
prob_amp,
cx,
nb_params_dist,
hs_amp_coef = 1.0
) {
# Divide fly counts into groups
n_t0 <- floor(n / 2)
n_hs_tn <- floor((n - n_t0) / 2)
n_c_tn <- n_hs_tn
# Sample parameters before loop for speed
maf_t0_freqs <- sample(x = maf_dist, size = sites, replace = TRUE)
freq_change_list <- generate_tn_allele_cxs(
sites,
prob_both_null,
prob_hs_only,
prob_c_only,
prob_same,
prob_amp,
cx,
hs_amp_coef
)
freq_change_df <- freq_change_list$cx_df
true_signal_df <- data.frame(
site = c(1:sites), true_signal = freq_change_list$true_signal
)
maf_hs_tn_freqs <- maf_t0_freqs * (
1 + freq_change_df$hs_cx
)
maf_c_tn_freqs <- maf_t0_freqs * (
1 + freq_change_df$c_cx
)
nb_site_params <- sample(x = nb_params_dist, size = sites, replace = TRUE)
site_vec <- c()
total_reads_vec <- c()
alt_reads_vec <- c()
condition_vec <- c()
time_vec <- c()
for(i in c(1:sites)) {
# Generate genotypes for HS and Control at T0 and TN
site_t0_maf <- maf_t0_freqs[i]
genotypes_t0 <- generate_genotypes(site_t0_maf, n_t0)
site_tn_hs_maf <- maf_hs_tn_freqs[i]
genotypes_hs_tn <- generate_genotypes(site_tn_hs_maf, n_hs_tn)
site_tn_c_maf <- maf_c_tn_freqs[i]
genotypes_c_tn <- generate_genotypes(site_tn_c_maf, n_c_tn)
# Sample reads based on negative binomial distribution
# Make sure there are never 0 reads, this should happen very rarely
total_reads <- pmax(
1,
rnbinom(
n = n, size = nb_site_params[i][[1]]$t, prob = nb_site_params[i][[1]]$p
)
)
alt_reads <- rbinom(
n = n,
size = total_reads,
prob = c(genotypes_t0, genotypes_hs_tn, genotypes_c_tn)
)
site_vec <- c(site_vec, rep(i, n))
total_reads_vec <- c(total_reads_vec, total_reads)
alt_reads_vec <- c(alt_reads_vec, alt_reads)
condition_vec <- c(
condition_vec, rep("both", n_t0), rep("HS", n_hs_tn), rep("C", n_c_tn)
)
time_vec <- c(time_vec, rep("T0", n_t0), rep("TN", n_hs_tn + n_c_tn))
}
sim_df <- data.frame(
site = site_vec,
total_reads = total_reads_vec,
alt_reads = alt_reads_vec,
condition = condition_vec,
time = time_vec
)
return(list(sim_df = sim_df, true_signal_df = true_signal_df))
}
```
To ensure correctness of this simulation process, I built code to run some automated tests.
```{r}
test_generate_simulated_dataset <- function() {
sim_df <- generate_simulated_dataset(
n = 20,
sites = 10,
maf_dist = c(.25, .4),
prob_both_null = .5,
prob_hs_only = 0.1,
prob_c_only = 0.1,
prob_same = 0.1,
prob_amp = 0.2,
cx = -0.15,
hs_amp_coef = 1.5,
nb_params_dist = list(list(t = 25, p = .4), list(t = 10, p = .3))
)$sim_df
# Test basic expected features of simulated dataset
num_rows_t0 <- sim_df %>% filter(time == 'T0') %>% nrow()
stopifnot(num_rows_t0 == 100)
num_rows_tn <- sim_df %>% filter(time == 'TN') %>% nrow()
stopifnot(num_rows_tn == 100)
num_rows_both <- sim_df %>% filter(condition == 'both') %>% nrow()
stopifnot(num_rows_both == 100)
num_rows_c <- sim_df %>% filter(condition == 'C') %>% nrow()
stopifnot(num_rows_c == 50)
num_rows_hs <- sim_df %>% filter(condition == 'HS') %>% nrow()
stopifnot(num_rows_hs == 50)
sim_df2 <- generate_simulated_dataset(
n = 500,
sites = 250,
maf_dist = c(.25, .4),
prob_both_null = 1.0,
prob_hs_only = 0.0,
prob_c_only = 0.0,
prob_same = 0.0,
prob_amp = 0.0,
cx = -0.15,
nb_params_dist = list(list(t = 25, p = .4), list(t = 10, p = .3))
)$sim_df
# Make sure allele frequencies don't vary
# accross time and condition in the all null case
sim_df2 <- sim_df2 %>% mutate(allele_prop = alt_reads / total_reads)
sim_df2_summary <- sim_df2 %>%
group_by(time, condition) %>%
summarize(maf = mean(allele_prop))
max_diff <- max(sim_df2_summary$maf) - min(sim_df2_summary$maf)
stopifnot(max_diff < 0.01)
sim_df3 <- generate_simulated_dataset(
n = 500,
sites = 250,
maf_dist = c(.25, .4),
prob_both_null = 0.0,
prob_hs_only = 0.5,
prob_c_only = 0.5,
prob_same = 0.0,
prob_amp = 0.0,
cx = -0.15,
nb_params_dist = list(list(t = 25, p = .4), list(t = 10, p = .3))
)$sim_df
sim_df3 <- sim_df3 %>% mutate(allele_prop = alt_reads / total_reads)
sim_df3_summary <- sim_df3 %>%
group_by(condition) %>%
summarize(maf = mean(allele_prop)) %>%
filter(condition != 'both')
maf_tn_diff <- max(sim_df3_summary$maf) - min(sim_df3_summary$maf)
stopifnot(maf_tn_diff < 0.01)
sim_df4 <- generate_simulated_dataset(
n = 500,
sites = 250,
maf_dist = c(.25, .4),
prob_both_null = 0.0,
prob_hs_only = 0.0,
prob_c_only = 0.0,
prob_same = 1.0,
prob_amp = 0.0,
cx = -0.15,
nb_params_dist = list(list(t = 25, p = .4), list(t = 10, p = .3))
)$sim_df
sim_df4 <- sim_df4 %>% mutate(allele_prop = alt_reads / total_reads)
sim_df4_summary <- sim_df4 %>%
group_by(time) %>%
summarize(maf = mean(allele_prop))
maf_t0 <- (sim_df4_summary %>% filter(time == 'T0'))$maf[1]
maf_tn <- (sim_df4_summary %>% filter(time == 'TN'))$maf[1]
maf_cx <- (maf_tn / maf_t0) - 1
maf_diff_vs_expected <- maf_cx - .15
stopifnot(maf_diff_vs_expected < 0.01)
# Finally, I want to do a test of the amplification
# Again, check for correct MAFs in this case
sim_df5 <- generate_simulated_dataset(
n = 2000,
sites = 1000,
maf_dist = c(.25, .4),
prob_both_null = 0.5,
prob_hs_only = 0.0,
prob_c_only = 0.0,
prob_same = 0.0,
prob_amp = 0.5,
hs_amp_coef = 2.0,
cx = -0.1,
nb_params_dist = list(list(t = 25, p = .4), list(t = 10, p = .3))
)$sim_df
sim_df5 <- sim_df5 %>% mutate(allele_prop = alt_reads / total_reads)
sim_df5_summary <- sim_df5 %>%
group_by(time, condition) %>%
summarize(maf = mean(allele_prop))
t0_maf <- (sim_df5_summary %>% filter(condition == "both"))$maf[1]
stopifnot(abs(t0_maf - .325) < .01)
tn_maf_c <- (sim_df5_summary %>% filter(condition == "C"))$maf[1]
stopifnot(abs(tn_maf_c - .95 * .325) < .01)
tn_maf_hs <- (sim_df5_summary %>% filter(condition == "HS"))$maf[1]
stopifnot(abs(tn_maf_hs - .9 * .325) < .01)
}
```
I also built the code to run the model proposed by Pallares et al. on the simulated data.
```{r}
#' Calculate results for simulated data based on the methods of Pallares et al.
#'
#' The function fits a beta-binomial regression
#' of allele frequency on a 3-level treatment factor(T0, C TN, HS TN).
#' Let the regression coefficient for HS TN be Bh and the regression
#' coefficient for C TN be Bc. Let the p-value
#' for the HS TN regression coefficient be pBh and the q-value be qBh
#' (defined similarly for C TN). Then, for a pval thresh of .05 and a qval
#' thresh of .1, the function classifies each variant as either
#' (1) a shared signal (if one of pBh or pBc < .05 and
#' one of qBh or qBc < .1), (2) an HS only signal (if qBh < .1 and pBc >= .05),
#' (3) a c only signal (if qBc < .1 and pBch >= .05),
#' or (4) null (if none of the other conditions are met).
#'
#' @param sim_df Simulated dataframe in the formated produced
#' by \code{generate_simulated_data}
#' @param n Total number of individuals per site
#' @param sites Total number of sites
#' @param pval_thresh pvalue threshold for individual regression coefficients
#' to declare significance in cases above.
#' @param qval_thresh qvalue threshold for individual regression coefficients
#' to declare significan in cases above.
#'
#' @return Dataframe with site numbers and signal classifications
#' @export
#'
#' @examples
calc_pallares_results <- function(
sim_df, n, sites, pval_thresh = .05, qval_thresh = .1
) {
site_vec <- c()
test_result_vec <- c()
# Make sure that the sim_df is sorted
sim_df <- sim_df %>% arrange(site)
cond_c_pval_vec <- numeric(sites)
cond_hs_pval_vec <- numeric(sites)
for(i in c(1:sites)) {
# slice instead of filter for speed
site_df <- sim_df %>% slice(c((1 + (i - 1) * n):(i * n)))
bbn_model <- aod::betabin(
cbind(alt_reads, total_reads - alt_reads) ~ condition, ~1, data=site_df
)
mod_sum <- aod::summary(bbn_model)
cond_c_pval_vec[i] <- attributes(mod_sum)$Coef['conditionC', 4]
cond_hs_pval_vec[i] <- attributes(mod_sum)$Coef['conditionHS', 4]
}
hs_pval_test <- cond_hs_pval_vec < pval_thresh
c_pval_test <- cond_c_pval_vec < pval_thresh
hs_qval_test <- qvalue::qvalue(
p = cond_hs_pval_vec, fdr.level = qval_thresh
)$significant
c_qval_test <- qvalue::qvalue(
p = cond_c_pval_vec, fdr.level = qval_thresh
)$significant
# shared signal if the p-value < .05 in one environment and < 10% FDR in the other
shared_test <- (hs_pval_test & c_qval_test) | (c_pval_test & hs_qval_test)
# hs only if the fdr test passes for hs but the pval test fails for c
hs_only_test <- !c_pval_test & hs_qval_test
# c only if the fdr test passes for c but the pval test fails for hs
c_only_test <- !hs_pval_test & c_qval_test
results_df <- data.frame(
site = c(1:sites),
shared_test = shared_test,
hs_only_test = hs_only_test,
c_only_test = c_only_test
)
results_df <- results_df %>%
mutate(
classification = case_when(
shared_test ~ "shared",
hs_only_test ~ "hs_only",
c_only_test ~ "c_only",
TRUE ~ "null"
)
)
results_df <- results_df %>% select(c(site, classification))
return(results_df)
}
```
Below is a simple example of how to use this code:
```{r, eval=FALSE}
sim_df3 <- generate_simulated_dataset(
n = 1000,
sites = 50,
maf_dist = c(.25, .4),
prob_both_null = 0.5,
prob_hs_only = 0.5,
prob_c_only = 0.0,
prob_same = 0.0,
prob_amp = 0.0,
cx = -0.25,
nb_params_dist = list(list(t = 25, p = .4), list(t = 10, p = .3))
)$sim_df
results <- calc_pallares_results(sim_df3, 1000, 50)
```
Below is a sample of the code that could be used in order to generate the distribution of minor allele frequencies.
```{r, eval=FALSE}
# sample 5K of the sites. This number could be increased or decreased
# note that the distinct call is necessary because there are a very small number
# of duplicated entries
alt <- fread('data/29Jun20_merged_alt_counts_allCHR.txt') %>%
distinct(site, .keep_all = TRUE) %>%
sample_n(5000)
both <- fread('data/29Jun20_merged_both_counts_allCHR.txt') %>%
filter(site %in% alt$site) %>%
distinct(site, .keep_all = TRUE)
alt_counts <- alt %>% select(-site) %>% rowSums()
both_counts <- both %>% select(-site) %>% rowSums()
alt_count_df <- data.frame(site=alt$site, alt_count = alt_counts)
both_count_df <- data.frame(site=both$site, total_count = both_counts)
count_df <- alt_count_df %>% inner_join(both_count_df, by = c("site"))
count_df <- count_df %>% mutate(alt_freq = alt_count / total_count)
# now, count_df$alt_freq contains a distribution over alt allele frequencies
```
Below is a sample of the code that could be used to generate the negative binomial parameters required for determining the number of reads.
```{r, eval=FALSE}
nb_params_list <- list()
variant_ind <- 1
for(j in c(1:nrow(both))) {
variant_reads <- as.vector(t(both[j,-1]))
variant_fit <- MASS::fitdistr(
x = variant_reads, densfun = "negative binomial", lower = c(1, 0)
)
variant_params_list <- list(
t = variant_fit$estimate["size"], p = variant_fit$estimate["mu"]
)
nb_params_list[[variant_ind]] <- variant_params_list
variant_ind <- variant_ind + 1
}
# now, nb_params_list has a set of parameters to be sampled from
```