Correct weights vignette building by loading images
EeethB committed Sep 11, 2023
1 parent c89b37e commit 656c9a1
Showing 8 changed files with 83 additions and 52 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -23,7 +23,8 @@ BugReports: https://github.com/Gilead-BioStats/graphicalMCP/issues
Imports:
igraph,
matrixStats,
mvtnorm
mvtnorm,
stats
Suggests:
bench,
dplyr,
2 changes: 1 addition & 1 deletion R/graph_rejection_orderings.R
@@ -44,7 +44,7 @@ graph_rejection_orderings <- function(shortcut_test_result) {
1,
function(row) {
if (length(unique(row)) == length(row)){
setNames(row, hyp_names[row])
stats::setNames(row, hyp_names[row])
} else {
NULL
}
Binary file removed vignettes/cache/gw_benchmark_list.rds
Binary file not shown.
Binary file removed vignettes/cache/power_benchmark_list.rds
Binary file not shown.
126 changes: 76 additions & 50 deletions vignettes/generate-closure.Rmd
@@ -29,7 +29,7 @@ library(gt)

## What is the closure?

The closure of a graph is the set of all sub-graphs, along with their weights calculated according to algorithm 1 of Bretz et al (2011). It is primarily used for [closed testing](link to closed test vignette), where all sub-graphs are tested for significance, and results are aggregated to determine which null hypotheses are significant globally.
The closure of a graph is the set of all sub-graphs, along with their weights calculated according to algorithm 1 of Bretz et al (2011). It is primarily used for [closed testing](link%20to%20closed%20test%20vignette), where all sub-graphs are tested for significance, and results are aggregated to determine which null hypotheses are significant globally.

Throughout this article, a common example will be used for demonstration: the simple successive graph. It has two primary hypotheses, $H_1$ and $H_2$, which split the initial weight evenly between them. The secondary hypotheses, $H_3$ and $H_4$, only receive weight if $H_1$ or $H_2$ is deleted, respectively.
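
This graph can be built with `graph_create()`. The transition matrix below is one common parameterization of the simple successive design, in which each secondary hypothesis passes its weight back to the other primary; treat it as a sketch of the example rather than the only valid choice.

```r
library(graphicalMCP)

# Initial weights: all weight split between the two primary hypotheses
hypotheses <- c(0.5, 0.5, 0, 0)

# Rows are source hypotheses, columns are targets:
# H1 -> H3 and H2 -> H4; secondaries pass weight back to the other primary
transitions <- rbind(
  c(0, 0, 1, 0),
  c(0, 0, 0, 1),
  c(0, 1, 0, 0),
  c(1, 0, 0, 0)
)

simple_successive <- graph_create(hypotheses, transitions)
```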

@@ -149,7 +149,7 @@ data.frame(Intersection = seq_len(nrow(weighting_strategy))) |>
)
```

#### Binary counting
#### Binary counting {#binary-counting}

The second useful property re-frames the first, and perhaps explains why the first holds. Starting with the bottom row, the powerset in this particular order counts up from 1 in binary, incrementing by 1 per row. This means a row number can be calculated directly from a vector showing which hypotheses are currently deleted from the graph: `row_number == number_of_rows - incl_excl_vec_converted_to_base_10 + 1`. For example, intersection number 6 has hypothesis vector `1010`. Interpreted as binary, this is `1 * 8 + 0 * 4 + 1 * 2 + 0 * 1 = 10` in base 10, and `6 == 15 - 10 + 1`.
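
The formula can be written as a small helper; `intersection_row()` is a hypothetical name used only for illustration:

```r
# Map an inclusion/exclusion vector (1 = hypothesis still in the graph)
# to its row number in the binary-counting ordering of the closure
intersection_row <- function(incl_excl) {
  num_rows <- 2^length(incl_excl) - 1
  # Interpret the vector as a binary number, most significant bit first
  base10 <- sum(incl_excl * 2^(rev(seq_along(incl_excl)) - 1))
  num_rows - base10 + 1
}

intersection_row(c(1, 0, 1, 0)) # 15 - 10 + 1 = 6
```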

@@ -174,9 +174,9 @@ data.frame(Intersection = seq_len(nrow(weighting_strategy))) |>

Because the size of the closure grows quickly as graph size increases (an n-graph has `2^n - 1` sub-graphs), calculating the full closure for large graphs can be computationally intensive. Optimizing this process led to three main strategies:

- The simplest approach, which uses the full graph as the starting point for every sub-graph, then deletes the appropriate hypotheses
- A recursive method, which traverses the closure tree, deleting one hypothesis each time to step between graphs
- A formulaic shortcut using the order of graphs generated with the recursive method
- The simplest approach, which uses the full graph as the starting point for every sub-graph, then deletes the appropriate hypotheses
- A recursive method, which traverses the closure tree, deleting one hypothesis each time to step between graphs
- A formulaic shortcut using the order of graphs generated with the recursive method

Note that the discussion of these methods focuses primarily on the weights side rather than the powerset side of the closure. This is because the fastest methods discovered generate the powerset implicitly from missing values in the weights side. For methods such as the simple approach, which rely on having the powerset in order to generate weights, the powerset can be created efficiently. Here `num_hyps` refers to the number of hypotheses in the initial graph.
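
One way to create the powerset efficiently in the binary-counting order is with `expand.grid()`; this is a sketch, not necessarily the package's internal approach:

```r
num_hyps <- 4

# expand.grid() varies its first factor fastest, so reversing the column
# order yields rows that, read as binary numbers, count down from
# 2^num_hyps - 1 (the full graph) to 0
powerset <- as.matrix(expand.grid(rep(list(1:0), num_hyps)))[, num_hyps:1]

# Drop the final all-zeros row: the empty intersection is not part of
# the closure
powerset <- powerset[-nrow(powerset), , drop = FALSE]

powerset[6, ] # 1 0 1 0, matching intersection number 6
```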

@@ -351,11 +351,11 @@ The formula shortcut also results in simpler code than the recursive solution.

The formula method reduces each step of generating the closure to a single deletion from a prior graph, with almost no additional overhead. Here's how the different methods fare, including the version from the gMCP package for reference. Also worthy of note is the lrstat package, which contains a few MCP-related functions, including generating weights of the closure. It uses excellent C++ code to run even faster, but since the speed of the current formula-based method is acceptable, adding an Rcpp dependency was not considered worth the additional time savings.

```{r gw-benchmarks, echo=TRUE}
```{r gw-benchmarks, eval=FALSE}
vec_num_hyps <- 2:8 * 2
if (file.exists(here::here("vignettes/cache/gw_benchmark_list.rds"))) {
benchmark_list <- readRDS(here::here("vignettes/cache/gw_benchmark_list.rds"))
benchmarks <- read.csv(here::here("vignettes/cache/gw_benchmarks.csv"))
} else {
benchmark_list <- lapply(
vec_num_hyps,
@@ -387,34 +387,41 @@ if (file.exists(here::here("vignettes/cache/gw_benchmark_list.rds"))) {
memory = FALSE,
time_unit = "ms",
min_iterations = 5
)[, c(-6, -10:-13)]
)[, 1:5]
# Remove rows for lrstat that weren't actually run
benchmark <- benchmark[benchmark$median > .002, ]
benchmark$char_expression <- as.character(benchmark$expression)
benchmark <- benchmark[, c("char_expression", "median")]
cbind(data.frame(num_hyps = num_hyps), benchmark)
}
)
saveRDS(benchmark_list, here::here("vignettes/cache/gw_benchmark_list.rds"))
}
benchmarks <- do.call(rbind, benchmark_list)
benchmarks$char_expression <- ordered(
benchmarks$char_expression,
c(
"gMCP",
"graphicalMCP simple",
"graphicalMCP recursive",
"graphicalMCP shortcut",
"lrstat"
benchmarks <- do.call(rbind, benchmark_list)
benchmarks$char_expression <- ordered(
benchmarks$char_expression,
c(
"gMCP",
"graphicalMCP simple",
"graphicalMCP recursive",
"graphicalMCP shortcut",
"lrstat"
)
)
)
write.csv(
benchmarks,
here::here("vignettes/cache/gw_benchmarks.csv"),
row.names = FALSE
)
# saveRDS(benchmark_list, here::here("vignettes/cache/gw_benchmark_list.rds"))
}
```

```{r plot-gw-benchmarks, echo=TRUE, fig.dim=c(8, 5)}
```{r plot-gw-benchmarks, eval=FALSE, fig.dim=c(8, 5)}
benchmarks_plot_standard <-
ggplot(benchmarks, aes(num_hyps, median, color = char_expression)) +
geom_point() +
@@ -435,6 +442,8 @@ benchmarks_plot_standard +
labs(subtitle = "Log-10 median runtime")
```

![](img/gw-benchmarks-plot.png)

### Power simulations

While the time savings on the closure are nice, for most graphs they will not make a big difference compared to longer-running tasks such as power simulations. However, paying attention to the closure is important for other reasons too, such as improving the power algorithm.
@@ -443,30 +452,35 @@ While the time savings on the closure are nice, on most graphs the savings will

The typical method for running a Bonferroni shortcut procedure on a graph is to:

1. Search a graph for a single hypothesis which can be rejected
1. Delete the rejected hypothesis and update weights
1. Repeat until there are no more significant hypotheses
1. Search a graph for a single hypothesis which can be rejected
2. Delete the rejected hypothesis and update weights
3. Repeat until there are no more significant hypotheses
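
The steps above can be sketched in plain R. This is a minimal illustration of the sequential rejection loop, not the package's implementation; `bonferroni_shortcut` is a hypothetical name, and the updates follow Algorithm 1 of Bretz et al. (2011):

```r
# Sketch of a Bonferroni shortcut procedure: reject one hypothesis at a
# time, deleting it and updating weights/transitions until nothing more
# can be rejected
bonferroni_shortcut <- function(weights, transitions, p, alpha = 0.025) {
  rejected <- rep(FALSE, length(p))
  repeat {
    testable <- !rejected & p <= alpha * weights
    if (!any(testable)) break
    i <- which(testable)[1] # any rejectable hypothesis may be deleted first

    # Pass the deleted hypothesis's weight along its outgoing edges
    for (j in seq_along(weights)[-i]) {
      weights[j] <- weights[j] + weights[i] * transitions[i, j]
    }

    # Update transitions per Algorithm 1: reroute edges through i
    new_trans <- transitions
    for (j in seq_along(weights)[-i]) {
      for (k in seq_along(weights)[-i]) {
        denom <- 1 - transitions[j, i] * transitions[i, j]
        new_trans[j, k] <- if (j != k && denom > 0) {
          (transitions[j, k] + transitions[j, i] * transitions[i, k]) / denom
        } else {
          0
        }
      }
    }
    transitions <- new_trans

    weights[i] <- 0
    transitions[i, ] <- 0
    transitions[, i] <- 0
    rejected[i] <- TRUE
  }
  rejected
}
```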

Running this process in either R or a low-level language like C is fast for a single procedure, but when it's run 100,000 times for a power simulation, the R version becomes onerous. However, there's a lot of duplication in a power simulation using this method. In many of the simulations, the same steps will be taken, which means re-calculating the same set of weights many times.

#### Closure shortcut

The [binary counting](#binary-counting) property of the closure admits a shortcut which can be implemented in R to get a scalable competitor to the typical algorithm:

1. Generate the closure a single time to get all sub-graph weights efficiently
1. For each simulation:
1. Search a graph for all hypotheses which can be rejected - This is fast with vectorization
1. Using the binary counting property, index into the row of the closure corresponding to all hypotheses rejected so far to get updated weights - This is substantially faster than updating a graph and re-calculating weights, especially for larger graphs
1. Repeat using the updated weights

1. Generate the closure a single time to get all sub-graph weights efficiently
2. For each simulation:
   1. Search a graph for all hypotheses which can be rejected - this is fast with vectorization
   2. Using the binary counting property, index into the row of the closure corresponding to all hypotheses rejected so far to get updated weights - this is substantially faster than updating a graph and re-calculating weights, especially for larger graphs
   3. Repeat using the updated weights
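
A sketch of one simulation's loop, assuming a hypothetical `closure_weights` matrix of `2^n - 1` rows of sub-graph weights in binary-counting order, with deleted hypotheses carrying zero weight:

```r
# Reject hypotheses by indexing into precomputed closure weights rather
# than re-running the weight-update algorithm after each deletion
simulate_rejections <- function(p, closure_weights, alpha = 0.025) {
  n <- length(p)
  rejected <- rep(FALSE, n)
  repeat {
    if (all(rejected)) break
    # Binary counting: the surviving hypotheses identify the closure row
    row <- (2^n - 1) - sum((!rejected) * 2^((n:1) - 1)) + 1
    weights <- closure_weights[row, ]
    new_rejections <- !rejected & p <= alpha * weights # vectorized test
    if (!any(new_rejections)) break
    rejected <- rejected | new_rejections
  }
  rejected
}
```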

While this method is not as fast as gMCP's C implementation, it still runs substantial power simulations in a matter of seconds. The biggest drawback is that it scales differently for different graph structures. For instance, a fixed sequence procedure can take longer than a more balanced graph, like a Holm procedure, because it takes more steps to reject all possible hypotheses.

```{r power-benchmarks, echo=TRUE}
```{r power-benchmarks, eval=FALSE}
vec_num_hyps <- 2:8 * 2
if (file.exists(here::here("vignettes/cache/power_benchmark_list.rds"))) {
benchmark_list <-
readRDS(here::here("vignettes/cache/power_benchmark_list.rds"))
benchmarks <- read.csv(here::here("vignettes/cache/power_benchmarks.csv"))
benchmarks$char_expression <- ordered(
benchmarks$char_expression,
c(
"graphicalMCP fixed sequence (R)",
"graphicalMCP Holm (R)",
"gMCP (C)"
)
)
} else {
benchmark_list <- lapply(
vec_num_hyps,
@@ -497,32 +511,42 @@ if (file.exists(here::here("vignettes/cache/power_benchmark_list.rds"))) {
memory = FALSE,
time_unit = "s",
min_iterations = 5
)[, c(-6, -10:-13)]
)[, 1:5]
# Remove rows for gMCP closure that weren't actually run
benchmark <- benchmark[benchmark$median > .00001, ]
benchmark$char_expression <- as.character(benchmark$expression)
benchmark <- benchmark[, c("char_expression", "median")]
cbind(data.frame(num_hyps = num_hyps), benchmark)
}
)
}
saveRDS(benchmark_list, here::here("vignettes/cache/power_benchmark_list.rds"))
benchmarks <- do.call(rbind, benchmark_list)
benchmarks$char_expression <- ordered(
benchmarks$char_expression,
c(
"graphicalMCP fixed sequence (R)",
"graphicalMCP Holm (R)",
"gMCP (C)"
benchmarks <- do.call(rbind, benchmark_list)
benchmarks$char_expression <- ordered(
benchmarks$char_expression,
c(
"graphicalMCP fixed sequence (R)",
"graphicalMCP Holm (R)",
"gMCP (C)"
)
)
)
write.csv(
benchmarks,
here::here("vignettes/cache/power_benchmarks.csv"),
row.names = FALSE
)
# saveRDS(
# benchmark_list,
# here::here("vignettes/cache/power_benchmark_list.rds")
# )
}
```

```{r plot-power-benchmarks, echo=TRUE, fig.dim=c(8, 5)}
```{r plot-power-benchmarks, eval=FALSE, fig.dim=c(8, 5)}
benchmarks_plot_standard <-
ggplot(benchmarks, aes(num_hyps, median, color = char_expression)) +
geom_point() +
@@ -542,3 +566,5 @@ benchmarks_plot_standard +
scale_y_log10(labels = scales::label_comma(suffix = "s")) +
labs(subtitle = "Log-10 median runtime")
```

![](img/power-benchmarks-plot.png)
4 changes: 4 additions & 0 deletions vignettes/graph-examples.Rmd
@@ -21,6 +21,10 @@ library(graphicalMCP)

# Introduction

## Multiple comparison procedures

In confirmatory clinical trials with multiple endpoints,

Any valid graph can be made with `graph_create()`, but some patterns have been established over time. We demonstrate how to run some of these patterns with graphicalMCP.

# Bonferroni-Holm
Binary file added vignettes/img/gw-benchmarks-plot.png
Binary file added vignettes/img/power-benchmarks-plot.png
