Update compare_and_contrast.md: fix wording and add a detailed explanation of the RCBD method

JamboChen · Nov 17, 2024 · d2bcd44 · d2bcd44
1 parent 1686f27
commit d2bcd44

Showing 2 changed files with 230 additions and 3 deletions.
diff --git a/math/experimental_designs/compare_and_contrast.md b/math/experimental_designs/compare_and_contrast.md
@@ -1,6 +1,6 @@
 # Compare and Contrast
 
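-In an experiment, we use the AONVA table to test whether a factor affects the response; this is Compare. Contrast goes one step further and analyzes where the main differences between the factors lie.
+In an experiment we use the ANOVA table to test whether a factor affects the response; this is Compare. Contrast goes one step further and analyzes where the main differences between the factor levels lie.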
 
 Suppose we have 4 factor levels (treatments) $A,B,C,D$. Before collecting data, we might ask questions such as:
 1. Is $A$ different from $C$ ? $\implies H_0:\mu_A=\mu_C$ v.s. $H_1:\mu_A\neq\mu_C$
@@ -147,7 +147,7 @@ Compare all contrasts with overall probability of type I error $\le \alpha$
 
 :::info[Definition]
 $$
-S_{\alpha;cm}=s_{cm}\sqrt{(k-1)F_{k-1,N-k,\alpha}}\quad\text{with }s_{cm}=\sqrt{\sum c_{im}^2n_iMS_E}
+S_{\alpha;cm}=s_{cm}\sqrt{(k-1)F_{k-1,N-k,\alpha}}\quad\text{with }s_{cm}=\sqrt{\sum \frac{c_{im}^2}{n_i}MS_E}
 $$
 :::
 
@@ -162,4 +162,137 @@ $$
 $$
 :::
 
-$\implies H_0: \Gamma_m=0 vs H_1:\Gamma_m\neq 0$, reject $H_0$ at level $\alpha\iff |C_m|>S_{\alpha,m}$
+$\implies H_0: \Gamma_m=0$ vs. $H_1:\Gamma_m\neq 0$; reject $H_0$ at level $\alpha\iff |C_m|>S_{\alpha;cm}$
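
As a rough illustration of the Scheffé rule above, the sketch below evaluates a single contrast. It assumes $C_m=\sum_i c_{im}\bar{Y}_{i\cdot}$ (that definition sits outside this hunk), borrows the fabric means, $MS_E=0.0203$ and $n_i=4$ from the SNK example further down in this file, and hard-codes $F_{3,12,0.05}=3.49$ as a table value. The function name `scheffe_test` is hypothetical.

```python
from math import sqrt

def scheffe_test(coefs, means, ns, mse, f_crit):
    """Return (C_m, S_{alpha;cm}, reject) for one contrast (illustrative sketch)."""
    k = len(coefs)
    # A contrast must have coefficients summing to zero.
    assert abs(sum(coefs)) < 1e-9, "contrast coefficients must sum to 0"
    # Assumed definition: C_m = sum_i c_im * Ybar_i
    c_m = sum(c * m for c, m in zip(coefs, means))
    # s_cm = sqrt(MS_E * sum_i c_im^2 / n_i)
    s_cm = sqrt(mse * sum(c * c / n for c, n in zip(coefs, ns)))
    # S_{alpha;cm} = s_cm * sqrt((k-1) * F_{k-1,N-k,alpha})
    s_alpha = s_cm * sqrt((k - 1) * f_crit)
    return c_m, s_alpha, abs(c_m) > s_alpha

# Fabric means in the order A, B, C, D; contrast "B versus the average of A, C, D".
means = [2.19, 2.68, 2.42, 2.32]
coefs = [-1 / 3, 1.0, -1 / 3, -1 / 3]
c_m, s_alpha, reject = scheffe_test(coefs, means, ns=[4, 4, 4, 4], mse=0.0203, f_crit=3.49)
print(f"C_m = {c_m:.3f}, S = {s_alpha:.3f}, reject H0: {reject}")
```

Because the Scheffé bound holds for every contrast simultaneously, the same function can be reused for any other coefficient vector without adjusting $\alpha$.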
+
+## Comparing Pairs of Treatment Means
+
+### Tukey's Method
+
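+Tukey's method tests whether the means of two treatments (trt) differ significantly while guaranteeing that the overall type I error rate across all pairwise comparisons does not exceed $\alpha$.
+
+Suppose there are $k$ trt with means $\mu_1,\cdots,\mu_k$.
+
+$\forall i\neq i'$, estimate $\mu_i-\mu_{i'}$ by $\bar{Y}_{i\cdot}-\bar{Y}_{i'\cdot}$.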
+
+$$
+\begin{gather*}
+\implies \bar{Y}_{i\cdot}-\bar{Y}_{i'\cdot}\sim N\left(\mu_i-\mu_{i'},\frac{\sigma_\varepsilon^2}{n_i}+\frac{\sigma_\varepsilon^2}{n_{i'}}\right)\\
+\implies \frac{(\bar{Y}_{i\cdot}-\bar{Y}_{i'\cdot})-(\mu_i-\mu_{i'})}{\sqrt{MS_E\left(\frac{1}{n_i}+\frac{1}{n_{i'}}\right)}}\sim t_{N-k}
+\end{gather*}
+$$
+
+$$
+\implies 1-\alpha=P\left(\mu_i-\mu_{i'}\in\underbrace{\left[\bar{Y}_{i\cdot}-\bar{Y}_{i'\cdot}\pm t_{N-k,\alpha/2}\sqrt{MS_E\left(\frac{1}{n_i}+\frac{1}{n_{i'}}\right)}\right]}_{CI(\mu_i-\mu_{i'};\alpha)}\right)
+$$
+
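+However, the probability that all of these confidence intervals cover their targets simultaneously is less than $1-\alpha$, so we look for intervals $CI^*$ s.t. $P(\mu_i-\mu_{i'}\in CI^*,\forall i\neq i')\ge 1-\alpha$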
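
To get a feel for how quickly simultaneous coverage degrades, the sketch below counts the pairwise intervals for $k$ trt and reports two reference values: the Bonferroni lower bound $1-m\alpha$, which needs no independence assumption, and $(1-\alpha)^m$, which would hold only if the $m$ intervals were independent. That independence is an illustrative assumption (the intervals share $MS_E$ and sample means), and the function name is hypothetical.

```python
from math import comb

def familywise_coverage_bounds(k: int, alpha: float):
    """Return (m, Bonferroni lower bound, coverage under assumed independence)."""
    m = comb(k, 2)                       # number of pairwise comparisons
    bonferroni = max(0.0, 1 - m * alpha)  # valid without independence
    independent = (1 - alpha) ** m        # only under the (unrealistic) independence assumption
    return m, bonferroni, independent

m, lower, indep = familywise_coverage_bounds(k=4, alpha=0.05)
print(f"{m} intervals: coverage >= {lower:.3f}; about {indep:.3f} if independent")
```

Either way, with four trt the joint coverage of six unadjusted 95% intervals can fall well below 95%, which is what motivates a simultaneous procedure.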
+
+:::info[Definition]
+$$
+T_\alpha=\frac{q_\alpha(k,f)}{\sqrt{2}}\sqrt{\left(\frac{1}{n_i}+\frac{1}{n_{i'}}\right)MS_E}\xlongequal{\text{bal}}q_\alpha(k,f)\sqrt{\frac{MS_E}{n}}
+$$
+
+- $k=$ number of trt
+- $f=$ df of error
+:::
+
+:::tip[Theorem]
+Tukey's Result:
+
+$$
+P\left(\mu_i-\mu_{i'}\in\left[\bar{Y}_{i\cdot}-\bar{Y}_{i'\cdot}\pm T_\alpha\right],\forall i\neq i'\right)\ge 1-\alpha
+$$
+:::
+
+i.e. $\forall i\neq i'$ with $H_0:\mu_i=\mu_{i'}$ vs $H_1:\mu_i\neq\mu_{i'}$, reject $H_0$ $\iff |\bar{Y}_{i\cdot}-\bar{Y}_{i'\cdot}|>T_\alpha$ with overall sig. level $\le \alpha$
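
A minimal sketch of the $T_\alpha$ formula, assuming the studentized range quantile $q_\alpha(k,f)$ is read from a table and passed in by hand. The numbers ($q_{0.05}(4,12)=4.20$, $MS_E=0.0203$, $n=4$) are the fabric values quoted later in this file; `tukey_halfwidth` is a hypothetical helper name.

```python
from math import sqrt, isclose

def tukey_halfwidth(q: float, mse: float, n_i: int, n_j: int) -> float:
    """T_alpha = q / sqrt(2) * sqrt(MS_E * (1/n_i + 1/n_j)); q comes from a table."""
    return q / sqrt(2) * sqrt(mse * (1 / n_i + 1 / n_j))

q, mse, n = 4.20, 0.0203, 4
t_alpha = tukey_halfwidth(q, mse, n, n)

# In the balanced case the general formula collapses to q * sqrt(MS_E / n).
assert isclose(t_alpha, q * sqrt(mse / n))
print(f"T_alpha = {t_alpha:.3f}")
```

Keeping `n_i` and `n_j` as separate arguments lets the same helper cover the unbalanced form of the definition.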
+
+### Fisher Least Significant Difference (LSD) Method
+
+The Fisher Least Significant Difference (LSD) Method. P99-101
+
+### Student-Newman-Keuls (SNK) Method
+
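+For a pair of trt means, this method tests whether the trt with the larger sample mean is significantly greater than the trt with the smaller one.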
+
+$$
+H_0:\mu_i=\mu_j\quad\text{v.s.}\quad H_1:\mu_i>\mu_j\quad\text{with }\bar{Y}_{i\cdot}>\bar{Y}_{j\cdot}
+$$
+
+Apply the SNK test at $\alpha=0.05$ to the fabric data:
+
+1. Sort all trt means from smallest to largest
+   | fabric      | A    | D    | C    | B    |
+   | ----------- | ---- | ---- | ---- | ---- |
+   | sample mean | 2.19 | 2.32 | 2.42 | 2.68 |
+
+   Then take every pair of trt and compute the difference between their means (larger minus smaller)
+   |     | A    | D    | C    | B   |
+   | --- | ---- | ---- | ---- | --- |
+   | A   |      |      |      |     |
+   | D   | 0.13 |      |      |     |
+   | C   | 0.23 | 0.10 |      |     |
+   | B   | 0.49 | 0.36 | 0.26 |     |
+
+
+2. From the ANOVA table, take $MS_E=0.0203$ and $df=12$, and compute the standard error for the two trt being compared:
+
+   $$
+   S_{AB}=\sqrt{\frac{MS_E}{2}(\frac{1}{n_i}+\frac{1}{n_j})}\xlongequal{\text{bal}}\sqrt{\frac{MS_E}{n}}
+   $$
+
+   For this data set, $S_{AB}=\sqrt{\frac{0.0203}{4}}=0.0712$
+
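+3. Look up $q_\alpha(p, df)$ in the studentized range table, where $p=2,\cdots,k$ is the number of ordered means spanned by the two trt being compared (adjacent means have $p=2$).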
+
+   | $q_{0.05}(2,12)$ | $q_{0.05}(3,12)$ | $q_{0.05}(4,12)$ |
+   | ---------------- | ---------------- | ---------------- |
+   | 3.08             | 3.77             | 4.20             |
+
+   Multiply $q_\alpha(p, df)$ by $S_{AB}$ to obtain $SNK(p,0.05)$
+   |     | A    | D    | C    | B   |
+   | --- | ---- | ---- | ---- | --- |
+   | A   |      |      |      |     |
+   | D   | 0.22 |      |      |     |
+   | C   | 0.27 | 0.22 |      |     |
+   | B   | 0.30 | 0.27 | 0.22 |     |
+4. Compare each difference with the corresponding $SNK(p,0.05)$; if the difference exceeds $SNK(p,0.05)$, reject $H_0:\mu_i=\mu_j$.
+
+   |     | A               | D               | C           | B   |
+   | --- | --------------- | --------------- | ----------- | --- |
+   | A   |                 |                 |             |     |
+   | D   | $0.13\not>0.22$ |                 |             |     |
+   | C   | $0.23\not>0.27$ | $0.10\not>0.22$ |             |     |
+   | B   | $0.49>0.30$     | $0.36>0.27$     | $0.26>0.22$ |     |
+
+Conclusion: the means of $A,D,C$ do not differ significantly, but the mean of $B$ is significantly larger than the other three.
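
The four steps above can be sketched as follows. This mirrors the table-based comparison in this section rather than a full step-down SNK (which would also skip sub-ranges lying inside a non-significant range; that does not change the outcome for these data). The $q_{0.05}(p,12)$ values are entered by hand as table lookups (taken here as 3.08, 3.77, 4.20), and `snk_pairs` is a hypothetical name.

```python
from math import sqrt

means = {"A": 2.19, "D": 2.32, "C": 2.42, "B": 2.68}  # fabric sample means
mse, n = 0.0203, 4                                    # MS_E and replicates per trt
q_crit = {2: 3.08, 3: 3.77, 4: 4.20}                  # q_{0.05}(p, 12), table values

def snk_pairs(means, mse, n, q_crit):
    """Compare every pair of ordered means against q(p, df) * sqrt(MS_E / n)."""
    ordered = sorted(means, key=means.get)  # step 1: ascending order
    se = sqrt(mse / n)                      # step 2: balanced standard error
    decisions = {}
    for hi in range(1, len(ordered)):
        for lo in range(hi):
            p = hi - lo + 1                 # number of ordered means spanned
            diff = means[ordered[hi]] - means[ordered[lo]]
            # steps 3-4: reject H0 when the difference exceeds SNK(p, alpha)
            decisions[(ordered[lo], ordered[hi])] = diff > q_crit[p] * se
    return decisions

decisions = snk_pairs(means, mse, n, q_crit)
for (small, large), reject in decisions.items():
    verdict = "reject H0" if reject else "do not reject H0"
    print(f"{large} vs {small}: {verdict}")
```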
+
+Tukey's Method and Scheffe's Method lead to a different conclusion: $A,D,C$ do not differ significantly and neither do $C,B$, but $B$ is significantly larger than $A$ and $D$.
+
+Here $T_\alpha=0.30$ and $S_{\alpha;cm}=\sqrt{\frac{2MS_E}{4}}\sqrt{3\cdot F_{3,12,0.05}}=0.326$; both are more conservative tests than SNK.
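
The two thresholds can be recomputed and applied to the pairwise differences from the SNK table. The sketch below assumes the table values $q_{0.05}(4,12)=4.20$ and $F_{3,12,0.05}=3.49$, entered by hand; the variable names are illustrative only.

```python
from math import sqrt

mse, n, k = 0.0203, 4, 4
q_4_12 = 4.20   # q_{0.05}(4, 12), table value
f_3_12 = 3.49   # F_{3,12,0.05}, table value

tukey = q_4_12 * sqrt(mse / n)                          # T_alpha, balanced case
scheffe = sqrt(2 * mse / n) * sqrt((k - 1) * f_3_12)    # S_{alpha;cm} for a pairwise contrast

# Pairwise differences of the fabric means (larger minus smaller).
diffs = {("B", "A"): 0.49, ("B", "D"): 0.36, ("B", "C"): 0.26,
         ("C", "A"): 0.23, ("C", "D"): 0.10, ("D", "A"): 0.13}

tukey_sig = {pair for pair, d in diffs.items() if d > tukey}
scheffe_sig = {pair for pair, d in diffs.items() if d > scheffe}
print(f"T_alpha = {tukey:.3f}, significant pairs: {sorted(tukey_sig)}")
print(f"S       = {scheffe:.3f}, significant pairs: {sorted(scheffe_sig)}")
```

With these numbers the $B-C$ difference of 0.26 falls below both thresholds, whereas it exceeded the SNK threshold of 0.22 for adjacent means.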
+
+---
+
+$$
+Y_{ij}=\mu+\tau_i+\varepsilon_{ij}\implies \tau_i\begin{cases}
+    \text{fixed}\\
+    \text{random}
+\end{cases}
+$$
+
+$\implies$ ANOVA for testing $H_0:$ No trt effect v.s. $H_1:$ At least one trt effect $\to H_0$ usually rejected.
+
+- $\tau_i$: fixed $\to$ contrast for detailed analysis
+- $\tau_i$: random $\to$ Variance components estimation problem. Basic way to do this is by ANOVA method.
+
+  $\implies$ solve for each variance component and the solution is an est for that variance component.
+
+  e.g. One-factor CRD (random model)
+
+  $$
+  \begin{gather*}
+    E(MS_E)=\sigma_\varepsilon^2\xlongequal{\text{set}}MS_E\quad E(MS_{trt})=\sigma_\varepsilon^2+n\sigma_\tau^2\xlongequal{\text{set}}MS_{trt}\\
+    \implies \hat{\sigma}_\varepsilon^2=MS_E\quad\hat{\sigma}_\tau^2=\frac{MS_{trt}-MS_E}{n}
+  \end{gather*}
+  $$
+
+  This is similar to the method of moments (MOME). The estimate can be negative; in that case we set it to 0 (P510-511).
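
A minimal sketch of the ANOVA-method estimators for the balanced one-factor random model, including the truncation at 0. The mean squares passed in below are made-up numbers chosen only to exercise both branches, and `variance_components` is a hypothetical name.

```python
def variance_components(ms_trt: float, ms_e: float, n: int):
    """ANOVA-method estimates for Y_ij = mu + tau_i + eps_ij with random tau_i.

    Solves E(MS_E) = sigma_eps^2 and E(MS_trt) = sigma_eps^2 + n * sigma_tau^2
    after replacing the expectations by the observed mean squares.
    """
    sigma2_eps = ms_e
    sigma2_tau = (ms_trt - ms_e) / n
    # A negative solution is not a valid variance, so it is set to 0.
    return sigma2_eps, max(0.0, sigma2_tau)

# Hypothetical mean squares, not taken from any data set in this file.
print(variance_components(ms_trt=10.0, ms_e=4.0, n=4))  # positive estimate
print(variance_components(ms_trt=3.0, ms_e=4.0, n=4))   # truncated case
```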
+
diff --git a/math/experimental_designs/rcbd.md b/math/experimental_designs/rcbd.md
@@ -0,0 +1,94 @@
+# Randomized Complete Block Design (RCBD)
+
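+The strategy of this design is to group the experimental units into blocks and remove the variability that may arise between blocks, which improves the precision of the experiment.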
+
+## One factor
+
+**EX**: There are 4 brands of tires: A, B, C, D. Let $Y=$ the amount of wear after 20000 km; we want to know which brand is best.
+
+$\implies$ factor: 4 levels and is fixed
+
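+- Design 1: 4 cars, each car fitted with a single brand of tire.
+
+  This is a poor design because the brand effect is mixed up (confounded) with the car effect; the two are strongly correlated and cannot be separated.
+
+- Design 2 (CRD): 16 tires are assigned completely at random to the 4 positions on each of 4 cars.
+
+  $Y_{ij}=\mu+\tau_i+\varepsilon_{ij}$, where $\tau_i$ is the tire (brand) effect. The following data were collected: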
+
+  | Car1       | Car2       | Car3       | Car4       |
+  | ---------- | ---------- | ---------- | ---------- |
+  | $C\mid 12$ | $A\mid 14$ | $C\mid 10$ | $A\mid 13$ |
+  | $A\mid 17$ | $A\mid 13$ | $D\mid 11$ | $D\mid 9$  |
+  | $D\mid 11$ | $B\mid 14$ | $B\mid 14$ | $B\mid 8$  |
+  | $D\mid 14$ | $C\mid 12$ | $B\mid 13$ | $C\mid 9$  |
+
+  ANOVA:
+
+  | Source | SS    | df  | MS    | F    | p-value |
+  | ------ | ----- | --- | ----- | ---- | ------- |
+  | Brand  | 30.69 | 3   | 10.23 | 2.44 | 0.115   |
+  | Error  | 50.25 | 12  | 4.19  |      |         |
+  | Total  | 80.94 | 15  |       |      |         |
+
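+  i.e. the mean wear of the four tire brands does not differ significantly.
+
+  This design likewise does not control for the car effect.
+
+- Design 3 (RCBD): to remove the potential variability due to the cars, each car is treated as a block.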
+
+  | Car1       | Car2       | Car3       | Car4       |
+  | ---------- | ---------- | ---------- | ---------- |
+  | $C\mid 12$ | $A\mid 14$ | $C\mid 10$ | $A\mid 13$ |
+  | $A\mid 17$ | $A\mid 13$ | $D\mid 11$ | $D\mid 9$  |
+  | $D\mid 11$ | $B\mid 14$ | $B\mid 14$ | $B\mid 8$  |
+  | $D\mid 14$ | $C\mid 12$ | $B\mid 13$ | $C\mid 9$  |
+
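+An RCBD has the following properties:
+1. Every block contains all trt.
+2. Within each block, the trt are randomly assigned to the experimental units.
+
+Let the number of trt be $a$ and the number of blocks be $b$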
+
+| Block 1  | Block 2  | $\cdots$ | Block b  |
+| -------- | -------- | -------- | -------- |
+| $\pi_1$  | $\pi_1$  | $\cdots$ | $\pi_1$  |
+| $\pi_2$  | $\pi_2$  | $\cdots$ | $\pi_2$  |
+| $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ |
+| $\pi_a$  | $\pi_a$  | $\cdots$ | $\pi_a$  |
+
+where $(\pi_1,\cdots,\pi_a)$ is a random permutation of $(1,\cdots,a)$, drawn separately for each block.
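
The within-block randomization can be sketched with the standard library. The function name `rcbd_layout` and the seed are illustrative assumptions; the brand labels reuse the tire example.

```python
import random

def rcbd_layout(treatments, n_blocks, seed=None):
    """Return one independent random permutation of the treatments per block."""
    rng = random.Random(seed)
    # sample(..., k=len) without replacement yields a uniformly random permutation
    return [rng.sample(treatments, k=len(treatments)) for _ in range(n_blocks)]

brands = ["A", "B", "C", "D"]
layout = rcbd_layout(brands, n_blocks=4, seed=0)
for j, block in enumerate(layout, start=1):
    print(f"Block {j}: {block}")
```

Each inner list gives the order in which the trt occupy the units of that block, so every block contains every trt exactly once.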
+
+Once the data have been collected:
+
+| Block 1   | Block 2   | $\cdots$ | Block b   |
+| --------- | --------- | -------- | --------- |
+| $Y_{11k}$ | $Y_{12k}$ | $\cdots$ | $Y_{1bk}$ |
+| $Y_{21k}$ | $Y_{22k}$ | $\cdots$ | $Y_{2bk}$ |
+| $\vdots$  | $\vdots$  | $\vdots$ | $\vdots$  |
+| $Y_{a1k}$ | $Y_{a2k}$ | $\cdots$ | $Y_{abk}$ |
+
+and we model them as:
+
+$$
+\begin{gather*}
+   Y_{ijk}=\mu+\tau_i+\beta_j+\varepsilon_{(ij)k} \\
+   i=1,\cdots,a,\quad j=1,\cdots,b,\quad k=1,\cdots,n\xlongequal{\text{usually}}1
+\end{gather*}
+$$
+
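+- $\tau_i$: trt effect
+- $\beta_j$: block effect
+
+We usually assume that there is no interaction between trt and block. The block effect is usually taken to be a random effect; in this example, that assumption lets the analysis also account for the effects of cars that were not part of the experiment.
+
+$\implies$ ANOVA (RCBD) for the data above: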
+
+| Source | df  | SS    | MS   | F   | p-value               |
+| ------ | --- | ----- | ---- | --- | --------------------- |
+| Brand  | 3   | 30.69 | 10.2 | 7.8 | $P(F_{3,9}>7.8)<0.01$ |
+| Block  | 3   | 38.69 | 12.9 |     |                       |
+| Error  | 9   | 11.56 | 1.3  |     |                       |
+| Total  | 15  | 80.94 |      |     |                       |
+
+$\implies H_0:$ no brand effect can be rejected at the 5% significance level.
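
To see where the gain from blocking comes from, the sketch below rebuilds both F statistics from the sums of squares in the two ANOVA tables: the RCBD moves the block sum of squares (38.69 on 3 df) out of the CRD error term. The 5% critical values $F_{3,12,0.05}=3.49$ and $F_{3,9,0.05}=3.86$ are entered by hand as table lookups, and the helper name `anova_f` is hypothetical.

```python
def anova_f(ss_trt, df_trt, ss_err, df_err):
    """Return (MS_trt, MS_E, F) from sums of squares and degrees of freedom."""
    ms_trt = ss_trt / df_trt
    ms_err = ss_err / df_err
    return ms_trt, ms_err, ms_trt / ms_err

ss_total, ss_brand, ss_block = 80.94, 30.69, 38.69   # from the ANOVA tables
f_crit = {(3, 12): 3.49, (3, 9): 3.86}               # 5% critical values, table lookups

# CRD: everything that is not brand goes into the error term (12 df).
crd = anova_f(ss_brand, 3, ss_total - ss_brand, 12)
# RCBD: the block SS is removed from the error term, leaving 9 df.
rcbd = anova_f(ss_brand, 3, ss_total - ss_brand - ss_block, 9)

for name, (ms_trt, ms_err, f), dfs in [("CRD", crd, (3, 12)), ("RCBD", rcbd, (3, 9))]:
    print(f"{name}: MS_E = {ms_err:.2f}, F = {f:.2f}, reject at 5%: {f > f_crit[dfs]}")
```

The unrounded RCBD ratio comes out near 7.96; the 7.8 in the table is consistent with dividing the rounded mean squares 10.2 and 1.3.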