---
title: "Mixture Models"
author: "Tyler Dunlap, PharmD"
execute:
  eval: false
format:
  html:
    title-block-banner: true
    self-contained: true
    toc: true
    toc-location: left
    css: styles.css
    theme: journal
    page-layout: article
editor: visual
bibliography: references.bib
---
## Mixture Models
```{r}
#| warning: false
#| error: false
#| message: false
library(tidyverse)
library(mrgsolve)
```
## Overview
Mixture models are a class of statistical models for data that can be thought of as a combination of multiple underlying subpopulations or distributions. In other words, the observed data are assumed to come from a mixture of different sources or components, each with its own probability distribution.

**Components or Clusters:** These are the underlying distributions that make up the mixture. Each component represents a distinct subpopulation or group within the data. For example, in a Gaussian mixture model, each component is a Gaussian (normal) distribution.

**Component Weights:** These are the probabilities associated with each component. They represent the proportion of the data that comes from each component. The weights must sum to 1.
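
The two ingredients above can be written compactly. The notation here is an assumption introduced for illustration and does not appear elsewhere in this document: $K$ components, weights $\pi_k$, and component densities $f_k$ with parameters $\theta_k$.

$$
p(y) = \sum_{k=1}^{K} \pi_k \, f_k(y \mid \theta_k), \qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
$$

In a Gaussian mixture, each $f_k$ is a normal density with its own mean $\mu_k$ and variance $\sigma_k^2$.
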
### Types
| Mixture | Description |
|-------------------|-----------------------------------------------------|
| **Gaussian Mixture Model** | Each component is a Gaussian distribution (e.g., normally distributed random effects, or ETAs). Widely used for clustering and density estimation with continuous data. |
| **Multinomial Mixture Model** | Used for discrete data, where each component follows a multinomial distribution. |

Here is a pharmacokinetic example of a mixture model. Suppose a drug's clearance is strongly associated with a mutation affecting the drug-metabolizing enzyme CYP3A4. Without any pharmacogenomic testing, we may still be able to infer the probability that a patient carries the mutation from their observed drug clearance. The model below simulates this scenario: each individual is assigned to the mutation subpopulation with probability 0.2, and carriers have their typical clearance scaled by `exp(MUTEFF)`.

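```{r}
#| warning: false
#| error: false
#| message: false
pkmix <- '
$PROB PK mixture model

$PLUGIN Rcpp

$PARAM @annotated
TVCL   : 10   : typical clearance
TVV1   : 42   : typical central volume
MUTEFF : -1.5 : effect of mutation

$PARAM @annotated @covariate
MUT : 0 : mutation

$CMT CENT

$OMEGA @annotated
eCL : 0.09 : eta clearance

$MAIN
// Draw subpopulation membership once per individual:
// POP = 1 (mutation) with probability 0.2, otherwise POP = 0
if(NEWIND <= 1) {
  int POP = R::rbinom(1, 0.2);
}
double CL = TVCL*exp(eCL);
double V1 = TVV1;
// Mutation carriers: typical clearance scaled by exp(MUTEFF)
if(POP == 1) CL = TVCL*exp(MUTEFF*POP)*exp(eCL);

$ODE
dxdt_CENT = -(CL/V1)*CENT;

$TABLE
capture IPRED = CENT/V1;

$CAPTURE POP
'
# Compile the model; this overwrites the code string with the model object
pkmix <- mcode('pkmix', pkmix)
```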
```{r}
#| warning: false
#| error: false
#| message: false
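# One 250-unit dose into CENT at time 0 for each of 30 individuals,
# plotted separately for each simulated subpopulation
pkmix %>%
  ev(ID = 1:30, amt = 250, cmt = 1, time = 0) %>%
  mrgsim(end = 24, delta = 1) %>%
  plot(IPRED ~ time | factor(POP))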
```
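
The simulation assigns subpopulations at random. Going the other direction, from an observed clearance to a probability of carrying the mutation, is an application of Bayes' rule. The following is a minimal sketch, not part of the original model code: `post_prob_mut` is a hypothetical helper, its defaults simply mirror the simulation parameters (20% prevalence, `TVCL = 10`, `MUTEFF = -1.5`, variance 0.09 on log clearance), and it assumes the individual clearance is known without error, which in practice it would not be.

```{r}
# Posterior probability of the mutation subpopulation given an observed
# clearance, assuming log(CL) is normal within each subpopulation.
# All default values are assumptions copied from the simulation model.
post_prob_mut <- function(cl_obs, p_mut = 0.2, tvcl = 10,
                          muteff = -1.5, omega2 = 0.09) {
  sd_eta <- sqrt(omega2)
  # Likelihood of log(CL) under each subpopulation
  lik_wt  <- dnorm(log(cl_obs), mean = log(tvcl), sd = sd_eta)
  lik_mut <- dnorm(log(cl_obs), mean = log(tvcl) + muteff, sd = sd_eta)
  # Bayes' rule: prior weight times likelihood, normalized over both groups
  p_mut * lik_mut / (p_mut * lik_mut + (1 - p_mut) * lik_wt)
}

# Low clearances point toward the mutation, high clearances away from it
post_prob_mut(c(2, 5, 10))
```

The prior weight `p_mut` matters most for clearances that fall between the two subpopulation centers, where neither likelihood dominates.
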
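
A mixture may also comprise more than two subpopulations. The three-population model below is an unfinished draft: most parameter values are still unspecified, so it is not yet compiled or simulated.
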
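```{r}
#| warning: false
#| error: false
#| message: false
# Draft model specification (unfinished): the blank parameter values must be
# filled in before this string can be compiled with mcode()
code <- '
$PROB Multiple populations mixture model

$PLUGIN Rcpp

$PARAM @annotated
TVBS :      : baseline size
TVKG :      : growth rate
TVKD :      : death rate (drug effect)
TVF  :      :
P1   : 0.33 :
P2   :      :
P3   :      :

$MAIN
// Assign each individual to one of three subpopulations by comparing
// a uniform draw against the cumulative mixing probabilities
if(NEWIND <= 1) {
  double mixv = R::runif(0, 1);
  int POP = 1;
  if(mixv > P1) POP = 2;
  if(mixv > (P1 + P2)) POP = 3;
}

$CAPTURE POP mixv
'
```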
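
The assignment logic in the draft model compares a single uniform draw against cumulative mixing probabilities. The same idea can be checked in plain R before it goes into a model. This is only a sketch: `assign_pop` is a hypothetical helper, and the second and third probabilities are invented for illustration because the draft only specifies `P1 = 0.33`.

```{r}
# Map uniform draws to subpopulation labels 1..K using cumulative weights
assign_pop <- function(u, p) {
  stopifnot(abs(sum(p) - 1) < 1e-8)
  # Thresholds are the cumulative probabilities, excluding the final 1
  cuts <- cumsum(p)[-length(p)]
  findInterval(u, cuts) + 1L
}

set.seed(123)
p <- c(0.33, 0.33, 0.34)  # assumed weights; only the first comes from the draft
pop <- assign_pop(runif(10000), p)

# Observed proportions should be close to the weights in p
prop.table(table(pop))
```

Because the last subpopulation absorbs whatever probability remains, only the first two weights act as thresholds, which is why `P3` never appears in the comparison.
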
::: callout-note
Mixture models form the basis for some more complicated models, such as "Cure" and "Hidden Markov Models", which are discussed in the "Frailty & Cure Models" and "Multistate Models" sections, respectively.
:::
### Summary
Mixture models are used for clustering, density estimation, anomaly detection, and related tasks. They are especially useful when the data arise from a blend of multiple underlying processes that a single distribution, such as one Gaussian, cannot capture adequately. The parameters of a mixture model are typically estimated with techniques such as the Expectation-Maximization (EM) algorithm or variational inference.
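
To make the EM idea concrete, here is a minimal base R sketch for a two-component Gaussian mixture. It is illustrative only: `em_gmm2` is a hypothetical function, the starting values are an arbitrary choice, and the simulated data (log clearances with an 80/20 split) are assumptions loosely modeled on the pharmacokinetic example. Production work would normally rely on an established estimation tool.

```{r}
# EM for a two-component univariate Gaussian mixture (illustrative sketch)
em_gmm2 <- function(y, n_iter = 200, tol = 1e-8) {
  # Crude starting values (an arbitrary choice)
  w  <- 0.5                                   # weight of component 2
  mu <- as.numeric(quantile(y, c(0.25, 0.75)))
  s  <- rep(sd(y), 2)
  ll_old <- -Inf
  for (i in seq_len(n_iter)) {
    # E-step: responsibility of component 2 for each observation
    d1 <- (1 - w) * dnorm(y, mu[1], s[1])
    d2 <- w * dnorm(y, mu[2], s[2])
    r  <- d2 / (d1 + d2)
    # Log-likelihood at the current parameters, used as stopping rule
    ll <- sum(log(d1 + d2))
    # M-step: responsibility-weighted weight, means, and standard deviations
    w  <- mean(r)
    mu <- c(sum((1 - r) * y) / sum(1 - r), sum(r * y) / sum(r))
    s  <- c(sqrt(sum((1 - r) * (y - mu[1])^2) / sum(1 - r)),
            sqrt(sum(r * (y - mu[2])^2) / sum(r)))
    if (abs(ll - ll_old) < tol) break
    ll_old <- ll
  }
  list(weight = c(1 - w, w), mean = mu, sd = s, loglik = ll)
}

# Simulated log clearances: 80% centered at log(10), 20% shifted by -1.5
set.seed(42)
y <- c(rnorm(800, log(10), 0.3), rnorm(200, log(10) - 1.5, 0.3))
fit <- em_gmm2(y)
fit$weight
fit$mean
```

EM only finds a local optimum, so results can depend on the starting values; trying several starts is a common precaution.
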
### References