diff --git a/DESCRIPTION b/DESCRIPTION index 572c9ee..6798141 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,5 +1,5 @@ Package: unbiased -Title: Diverse Randomization Algorithms for Clinical Trials +Title: Unbiased: Production-Grade Randomization API Version: 1.0.0 Authors@R: c( person("Kamil", "Sijko", , "kamil.sijko@ttsi.com.pl", @@ -10,20 +10,14 @@ Authors@R: c( role = c("aut")), person("Łukasz", "Wałejko", , "lukasz.walejko@ttsi.com.pl", role = c("aut")), - person("Jagoda", "Głowacka-Walas", "jagoda.glowacka-walas@ttsi.com.pl", + person("Jagoda", "Głowacka-Walas", ,"jagoda.glowacka-walas@ttsi.com.pl", role = c("aut"), comment = c(ORCID = "0000-0002-7628-8691")), + person("Laura", "Bąkała", , role = c("aut")), person("Michał", "Seweryn", , "michal.seweryn@biol.uni.lodz.pl", - role = c("ctr"), comment = c(ORCID = "0000-0002-9090-3435")), + role = c("ctb"), comment = c(ORCID = "0000-0002-9090-3435")), person("Transition Technologies Science Sp. z o.o.", role = c("fnd", "cph")) ) -Description: The Unbiased package offers a comprehensive suite of randomization - algorithms for clinical trials, encompassing dynamic strategies like the - minimization method, simple randomization approaches, and block randomization - techniques. Its primary purpose is to provide a harmonized set of functions that - will seamlessly integrate with a production-ready plumber API, also contained - within the package. This integration is designed to facilitate a smooth and - efficient interface with electronic Case Report Form (eCRF) systems, enhancing - the capability of clinical trials to manage patient allocation. +Description: The Unbiased package delivers a minimization-based randomization algorithm for patient allocation in clinical trials, fully integrated with a production-ready API. It's designed to work seamlessly with a persistent PostgreSQL database, ensuring reliable data management and integrity. 
Packaged into precompiled Docker images, Unbiased simplifies deployment to just running docker-compose up, making it exceptionally straightforward to incorporate into your environment. License: MIT + file LICENSE Imports: checkmate, diff --git a/NAMESPACE b/NAMESPACE index 18be837..3b254dd 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,6 +1,5 @@ # Generated by roxygen2: do not edit by hand -export(create_db_connection_pool) export(randomize_minimisation_pocock) export(randomize_simple) export(run_unbiased) diff --git a/R/audit-trail.R b/R/audit-trail.R index e6932a4..658c62d 100644 --- a/R/audit-trail.R +++ b/R/audit-trail.R @@ -124,6 +124,7 @@ AuditLog <- R6::R6Class( # nolint: object_name_linter. #' #' @param pr A plumber router for which the audit trail is to be set up. #' @param endpoints A list of regex patterns for which the audit trail should be enabled. +#' @noRd #' @return Returns the updated plumber router with the audit trail hooks. #' @examples #' pr <- plumber::plumb("your-api-definition.R") |> diff --git a/R/db.R b/R/db.R index 7f53acb..4d76f43 100644 --- a/R/db.R +++ b/R/db.R @@ -8,8 +8,7 @@ #' between each attempt. #' #' @return A pool object representing the connection pool to the database. -#' @export -#' +#' @noRd #' @examples #' \dontrun{ #' pool <- create_db_connection_pool() diff --git a/R/error-handling.R b/R/error-handling.R index 3b8cea6..29ba8c6 100644 --- a/R/error-handling.R +++ b/R/error-handling.R @@ -9,6 +9,7 @@ globalCallingHandlers <- NULL # nolint #' It uses the sentryR package to set up Sentry based on environment variables. #' #' @param None +#' @noRd #' #' @return None. If the SENTRY_DSN environment variable is not set, the function will #' return a message and stop execution. 
diff --git a/R/randomize-minimisation-pocock.R b/R/randomize-minimisation-pocock.R index 1f01816..3406c41 100644 --- a/R/randomize-minimisation-pocock.R +++ b/R/randomize-minimisation-pocock.R @@ -6,6 +6,7 @@ #' #' @param all_patients data.frame with all patients #' @param new_patients data.frame with new patient +#' @noRd #' #' @return data.frame with columns as in all_patients and new_patients, #' filled with TRUE if there is match in covariate and FALSE if not diff --git a/README.md b/README.md index 0f4c5a7..072bd54 100644 --- a/README.md +++ b/README.md @@ -27,6 +27,10 @@ By choosing **unbiased**, you're adopting a sophisticated approach to trial rand - [API Endpoints](#api-endpoints) - [Study Creation](#study-creation) - [Patient Randomization](#patient-randomization) + - [Study List](#study-list) + - [Study Details](#study-details) + - [Randomization List](#randomization-list) + - [Audit Log](#audit-log) 4. [Technical Implementation](#technical-implementation) - [Quality Assurance Measures](#quality-assurance-measures) - [Running Tests](#running-tests) @@ -104,7 +108,6 @@ unbiased::run_unbiased() This initiates the API server, by default, on your local machine (http://localhost:3838), making it accessible for interaction through various HTTP clients, including curl, Postman, or R's `httr` package. - # Getting started with **unbiased** The **unbiased** package offers functions for randomizing participants in clinical trials, ensuring a fair and transparent process. @@ -117,65 +120,64 @@ The **unbiased** API is designed to facilitate clinical trial management through - **Study Management**: Create and configure new studies, including specifying randomization parameters and treatment arms. - **Participant Randomization**: Dynamically randomize participants to treatment groups based on the study's configuration and existing participant data. +- **Study List**: List all previously defined studies. +- **Study Details**: Show details about the selected study. 
+- **Randomization List**: Generate a list of randomized patients for the selected study. +- **Audit Log**: Show an audit log for the selected study. ### Study Creation To initialize a study using Pocock's minimization method, use the POST /minimisation_pocock endpoint. The required JSON payload should detail the study, including treatment groups, allocation ratios, and covariates. -```R -# Initialize a study with Pocock's minimisation method -response <- request(api_url) |> - req_url_path("study", "minimisation_pocock") |> - req_method("POST") |> - req_body_json( - data = list( - identifier = "My_study_1", - name = "Study 1", - method = "var", - p = 0.85, - arms = list( - "placebo" = 1, - "treatment" = 1 - ), - covariates = list( - sex = list( - weight = 1, - levels = c("female", "male") - ), - age = list( - weight = 1, - levels = c("up to 50", "51 or more") - ) - ) - ) - ) -``` - -This call sets up the study and returns an ID for accessing further study-related endpoints. +This endpoint sets up the study and returns an ID for accessing further study-related endpoints. ### Patient Randomization The POST /{study_id}/patient endpoint assigns a new patient to a treatment group, requiring patient details and covariate information in the JSON payload. -```R -# Randomize a new patient -req_url_path("study", my_study_id, "patient") |> - req_method("POST") |> - req_body_json( - data = list( - current_state = - tibble::tibble( - "sex" = c("female"), - "age" = c("up to 50"), - "arm" = c("") - ) - ) - ) -``` - This endpoint determines the patient's treatment group. +### Study List + +The GET /study/ endpoint lists all previously defined studies. It returns information such as: + +- Study ID +- Identifier +- Name of study +- Randomization method +- Last edit date + +### Study Details +The GET /study/{study_id} endpoint retrieves details about a selected study. 
The response body returns: + +- Name of study +- Randomization method +- Last edit date +- Input parameters +- Strata + +### Randomization List +The GET /study/{study_id}/randomization_list endpoint generates a list of randomized patients along with their assigned study arms. + +### Audit Log + +The GET /study/{study_id}/audit endpoint returns all records in the audit log for a selected study. +The response body includes the following information: + +- Log ID +- Creation date +- Type of event +- Request ID +- Study ID +- Endpoint URL +- Request method +- Request body with study definition +- Response code +- Response body with study details + +The endpoint facilitates tracking the history of requests sent to the database, along with their corresponding responses. This enables us to trace all actions involving the API. + # Technical details ## Running Tests diff --git a/man/compare_rows.Rd b/man/compare_rows.Rd deleted file mode 100644 index da314a0..0000000 --- a/man/compare_rows.Rd +++ /dev/null @@ -1,22 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/randomize-minimisation-pocock.R -\name{compare_rows} -\alias{compare_rows} -\title{Compare rows of two dataframes} -\usage{ -compare_rows(all_patients, new_patients) -} -\arguments{ -\item{all_patients}{data.frame with all patients} - -\item{new_patients}{data.frame with new patient} -} -\value{ -data.frame with columns as in all_patients and new_patients, -filled with TRUE if there is match in covariate and FALSE if not -} -\description{ -Takes dataframe all_patients (presumably with one row / patient) and -compares it to all rows of new_patients (presumably already randomized -patients) -} diff --git a/man/create_db_connection_pool.Rd b/man/create_db_connection_pool.Rd deleted file mode 100644 index 9a76532..0000000 --- a/man/create_db_connection_pool.Rd +++ /dev/null @@ -1,23 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation 
in R/db.R -\name{create_db_connection_pool} -\alias{create_db_connection_pool} -\title{Defines methods for interacting with the study in the database -Create a database connection pool} -\usage{ -create_db_connection_pool(...) -} -\value{ -A pool object representing the connection pool to the database. -} -\description{ -This function creates a connection pool to a PostgreSQL database. It uses -environment variables to get the necessary connection parameters. If the -connection fails, it will retry up to 5 times with a delay of 2 seconds -between each attempt. -} -\examples{ -\dontrun{ -pool <- create_db_connection_pool() -} -} diff --git a/man/setup_audit_trail.Rd b/man/setup_audit_trail.Rd deleted file mode 100644 index 129039f..0000000 --- a/man/setup_audit_trail.Rd +++ /dev/null @@ -1,27 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/audit-trail.R -\name{setup_audit_trail} -\alias{setup_audit_trail} -\title{Set up audit trail} -\usage{ -setup_audit_trail(pr, endpoints = list()) -} -\arguments{ -\item{pr}{A plumber router for which the audit trail is to be set up.} - -\item{endpoints}{A list of regex patterns for which the audit trail should be enabled.} -} -\value{ -Returns the updated plumber router with the audit trail hooks. -} -\description{ -This function sets up an audit trail for a given process. It uses plumber's hooks to log -information before routing (preroute) and after serializing the response (postserialize). -} -\details{ -This function modifies the plumber router in place and returns the updated router. 
-} -\examples{ -pr <- plumber::plumb("your-api-definition.R") |> - setup_audit_trail() -} diff --git a/man/setup_sentry.Rd b/man/setup_sentry.Rd deleted file mode 100644 index 911f563..0000000 --- a/man/setup_sentry.Rd +++ /dev/null @@ -1,42 +0,0 @@ -% Generated by roxygen2: do not edit by hand -% Please edit documentation in R/error-handling.R -\name{setup_sentry} -\alias{setup_sentry} -\title{setup_sentry function} -\usage{ -setup_sentry() -} -\arguments{ -\item{None}{} -} -\value{ -None. If the SENTRY_DSN environment variable is not set, the function will -return a message and stop execution. -} -\description{ -This function is used to configure Sentry, a service for real-time error tracking. -It uses the sentryR package to set up Sentry based on environment variables. -} -\details{ -The function first checks if the SENTRY_DSN environment variable is set. If not, it -returns a message and stops execution. -If SENTRY_DSN is set, it uses the sentryR::configure_sentry function to set up Sentry with -the following parameters: -\itemize{ -\item dsn: The Data Source Name (DSN) is retrieved from the SENTRY_DSN environment variable. -\item app_name: The application name is set to "unbiased". -\item app_version: The application version is retrieved from the GITHUB_SHA environment variable. -If not set, it defaults to "unspecified". -\item environment: The environment is retrieved from the SENTRY_ENVIRONMENT environment variable. -If not set, it defaults to "development". -\item release: The release is retrieved from the SENTRY_RELEASE environment variable. -If not set, it defaults to "unspecified". 
-} -} -\examples{ -setup_sentry() - -} -\seealso{ -\url{https://docs.sentry.io/} -} diff --git a/man/unbiased-package.Rd b/man/unbiased-package.Rd index 3b046fa..446de3c 100644 --- a/man/unbiased-package.Rd +++ b/man/unbiased-package.Rd @@ -4,9 +4,9 @@ \name{unbiased-package} \alias{unbiased} \alias{unbiased-package} -\title{unbiased: Diverse Randomization Algorithms for Clinical Trials} +\title{unbiased: Unbiased: Production-Grade Randomization API} \description{ -The Unbiased package offers a comprehensive suite of randomization algorithms for clinical trials, encompassing dynamic strategies like the minimization method, simple randomization approaches, and block randomization techniques. Its primary purpose is to provide a harmonized set of functions that will seamlessly integrate with a production-ready plumber API, also contained within the package. This integration is designed to facilitate a smooth and efficient interface with electronic Case Report Form (eCRF) systems, enhancing the capability of clinical trials to manage patient allocation. +The Unbiased package delivers a minimization-based randomization algorithm for patient allocation in clinical trials, fully integrated with a production-ready API. It's designed to work seamlessly with a persistent PostgreSQL database, ensuring reliable data management and integrity. Packaged into precompiled Docker images, Unbiased simplifies deployment to just running docker-compose up, making it exceptionally straightforward to incorporate into your environment. 
} \seealso{ Useful links: @@ -23,12 +23,13 @@ Authors: \item Kinga Sałata \email{kinga.salata@ttsi.com.pl} \item Aleksandra Duda \email{aleksandra.duda@ttsi.com.pl} \item Łukasz Wałejko \email{lukasz.walejko@ttsi.com.pl} - \item Jagoda jagoda.glowacka-walas@ttsi.com.pl Głowacka-Walas (\href{https://orcid.org/0000-0002-7628-8691}{ORCID}) + \item Jagoda Głowacka-Walas \email{jagoda.glowacka-walas@ttsi.com.pl} (\href{https://orcid.org/0000-0002-7628-8691}{ORCID}) + \item Laura Bąkała } Other contributors: \itemize{ - \item Michał Seweryn \email{michal.seweryn@biol.uni.lodz.pl} (\href{https://orcid.org/0000-0002-9090-3435}{ORCID}) [contractor] + \item Michał Seweryn \email{michal.seweryn@biol.uni.lodz.pl} (\href{https://orcid.org/0000-0002-9090-3435}{ORCID}) [contributor] \item Transition Technologies Science Sp. z o.o. [funder, copyright holder] } diff --git a/vignettes/articles/minimization_randomization_comparison.Rmd b/vignettes/articles/minimization_randomization_comparison.Rmd index 9f09e44..758cbc4 100644 --- a/vignettes/articles/minimization_randomization_comparison.Rmd +++ b/vignettes/articles/minimization_randomization_comparison.Rmd @@ -1,28 +1,34 @@ --- -title: "Comparison of Minimization Randomization with Other Randomization Methods. Assessing the balance of covariates." 
-author: - - Aleksandra Duda, Jagoda Głowacka-Walas^[Tranistion Technologies Science] +title: "Benchmarking randomization methods" +author: "Aleksandra Duda^[Transition Technologies Science], Jagoda Głowacka-Walas^[Transition Technologies Science], Michał Seweryn^[Uniwersytet Łódzki]" date: "`r Sys.Date()`" output: html_document: toc: yes + toc_float: yes bibliography: references.bib link-citations: true --- + + ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE ) ``` -## Introduction +# Introduction Randomization in clinical trials is the gold standard and is widely considered the best design for evaluating the effectiveness of new treatments compared to alternative treatments (standard of care) or placebo. Indeed, the selection of an appropriate randomisation is as important as the selection of an appropriate statistical analysis for the study and the analysis strategy, whether based on randomisation or on a population model (@berger2021roadmap). -One of the primary advantages of randomization, particularly simple randomization (usually using flipping a coin method), is its ability to balance confounding variables across treatment groups. This is especially effective in large sample sizes (n > 200), where the random allocation of participants helps to ensure that both known and unknown confounders are evenly distributed between the study arms. This balanced distribution contributes significantly to the internal validity of the study, as it minimizes the risk of selection bias and confounding influencing the results (@lim2019randomization). +One of the primary advantages of randomization, particularly simple randomization (usually using flipping a coin method), is its ability to balance confounding variables across treatment groups. This is especially effective in large sample sizes (n \> 200), where the random allocation of participants helps to ensure that both known and unknown confounders are evenly distributed between the study arms. 
This balanced distribution contributes significantly to the internal validity of the study, as it minimizes the risk of selection bias and confounding influencing the results (@lim2019randomization). -It's important to note, however, that while simple randomization is powerful in large trials, it may not always guarantee an even distribution of confounding factors in trials with smaller sample sizes (n < 100). In such cases, the random allocation might result in imbalances in baseline characteristics between groups, which can affect the interpretation of the treatment's effectiveness. This potential limitation sets the stage for considering additional methods, such as stratified randomization, or dynamic minimization algorithms to address these challenges in smaller trials (@kang2008issues). +It's important to note, however, that while simple randomization is powerful in large trials, it may not always guarantee an even distribution of confounding factors in trials with smaller sample sizes (n \< 100). In such cases, the random allocation might result in imbalances in baseline characteristics between groups, which can affect the interpretation of the treatment's effectiveness. This potential limitation sets the stage for considering additional methods, such as stratified randomization, or dynamic minimization algorithms to address these challenges in smaller trials (@kang2008issues). This document provides a summary of the comparison of three randomization methods: simple randomization, block randomization, and adaptive randomization. Simple randomization and adaptive randomization (minimization method) are tools available in the `unbiased` package as `randomize_simple` and `randomize_minimisation_pocock` functions (@unbiased). The comparison aims to demonstrate the superiority of adaptive randomization (minimization method) over other methods in assessing the least imbalance of accompanying variables between therapeutic groups. 
Monte Carlo simulations were used to generate data, utilizing the `simstudy` package (@goldfeld2020simstudy). Parameters for the binary distribution of variables were based on data from the publication by @mrozikiewicz2023allogenic and information from researchers. @@ -42,31 +48,31 @@ library(tidyr) library(randomizeR) ``` -## The randomization methods considered for comparison +# The randomization methods considered for comparison In the process of comparing the balance of covariates among randomization methods, three randomization methods have been selected for evaluation: -- **simple randomization** - simple coin toss, algorithm that gives participants equal chances of being assigned to a particular arm. The method's advantage lies in its simplicity and the elimination of predictability. However, due to its complete randomness, it may lead to imbalance in sample sizes between arms and imbalances between prognostic factors. For a large sample size (n > 200), simple randomisation gives a similar number of generated participants in each group. For a small sample size (n < 100), it results in an imbalance (@kang2008issues). +- **simple randomization** - simple coin toss, algorithm that gives participants equal chances of being assigned to a particular arm. The method's advantage lies in its simplicity and the elimination of predictability. However, due to its complete randomness, it may lead to imbalance in sample sizes between arms and imbalances between prognostic factors. For a large sample size (n \> 200), simple randomisation gives a similar number of generated participants in each group. For a small sample size (n \< 100), it results in an imbalance (@kang2008issues). -- **block randomization** - a randomization method that takes into account defined covariates for patients. The method involves assigning patients to therapeutic arms in blocks of a fixed size, with the recommendation that the blocks have different sizes. 
This, to some extent, reduces the risk of researchers predicting future arm assignments. In contrast to simple randomization, the block method aims to balance the number of patients within the block, hence reducing the overall imbalance between arms (@rosenberger2015randomization). +- **block randomization** - a randomization method that takes into account defined covariates for patients. The method involves assigning patients to therapeutic arms in blocks of a fixed size, with the recommendation that the blocks have different sizes. This, to some extent, reduces the risk of researchers predicting future arm assignments. In contrast to simple randomization, the block method aims to balance the number of patients within the block, hence reducing the overall imbalance between arms (@rosenberger2015randomization). -- **adaptive randomization using minimization method** based on @pocock1975sequential algorithm - - this randomization approach aims to balance prognostic factors across treatment arms within a clinical study. It functions by evaluating the total imbalance of these factors each time a new patient is considered for the study. The minimization method computes the overall imbalance for each potential arm assignment of the new patient, considering factors like variance or other specified criteria. The patient is then assigned to the arm where their addition results in the smallest total imbalance. This assignment is not deterministic but is made with a predetermined probability, ensuring some level of randomness in arm allocation. This method is particularly useful in trials with multiple prognostic factors or in smaller studies where traditional randomization might fail to achieve balance. +- **adaptive randomization using minimization method** based on @pocock1975sequential algorithm - - this randomization approach aims to balance prognostic factors across treatment arms within a clinical study. 
It functions by evaluating the total imbalance of these factors each time a new patient is considered for the study. The minimization method computes the overall imbalance for each potential arm assignment of the new patient, considering factors like variance or other specified criteria. The patient is then assigned to the arm where their addition results in the smallest total imbalance. This assignment is not deterministic but is made with a predetermined probability, ensuring some level of randomness in arm allocation. This method is particularly useful in trials with multiple prognostic factors or in smaller studies where traditional randomization might fail to achieve balance. -## Assessment of covariate balance +# Assessment of covariate balance In the proposed approach to the assessment of randomization methods, the primary objective is to evaluate each method in terms of achieving balance in the specified covariates. The assessment of balance aims to determine whether the distributions of covariates are similarly balanced in each therapeutic group. Based on the literature, standardized mean differences (SMD) have been employed for assessing balance (@berger2021roadmap). The SMD method is one of the most commonly used statistics for assessing the balance of covariates, regardless of the unit of measurement. It is a statistical measure for comparing differences between two groups. The covariates in the examined case are expressed as binary variables. In the case of categorical variables, SMD is calculated using the following formula (@zhang2019balance): -\[ SMD = \frac{{p_1 - p_2}}{{\sqrt{\frac{{p_1 \cdot (1 - p_1) + p_2 \cdot (1 - p_2)}}{2}}}} \], +$$ SMD = \frac{{p_1 - p_2}}{{\sqrt{\frac{{p_1 \cdot (1 - p_1) + p_2 \cdot (1 - p_2)}}{2}}}} $$, where: -- \( p_1 \) is the proportion in the first arm, +- $p_1$ is the proportion in the first arm, -- \( p_2 \) is the proportion in the second arm. +- $p_2$ is the proportion in the second arm. 
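As a worked illustration of the formula above, assume hypothetical proportions $p_1 = 0.60$ and $p_2 = 0.40$ for a binary covariate (illustrative values only, not taken from the simulated data):

$$ SMD = \frac{0.60 - 0.40}{\sqrt{\frac{0.60 \cdot 0.40 + 0.40 \cdot 0.60}{2}}} = \frac{0.20}{\sqrt{0.24}} \approx 0.41 $$

An SMD of 0 would indicate identical proportions in the two arms; the larger the absolute value, the greater the imbalance.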
-## Definied number of patients and number of iterations +# Defined number of patients In this simulation, we are using a real use case - the planned FootCell study - non-commercial clinical research in the area of civilisation diseases - to guide our data generation process. For the FootCell study, it is anticipated that a total of 105 patients will be randomized into the trial. These patients will be equally divided among three research groups - Group A, Group B, and Group C - with each group comprising 35 patients. @@ -75,7 +81,7 @@ In this simulation, we are using a real use case - the planned FootCell study - n <- 105 ``` -## Defining parameters for Monte-Carlo simulation +# Defining parameters for Monte-Carlo simulation The distribution of parameters for individual covariates, which will subsequently be used to validate randomization methods, has been defined using the publication @mrozikiewicz2023allogenic on allogenic interventions.. @@ -83,17 +89,17 @@ The publication describes the effectiveness of comparing therapy using ADSC (Adi In the process of defining the study for randomization, the following covariates have been selected: -- **gender** [male/female], +- **gender** [male/female], -- **diabetes type** [type I/type II], +- **diabetes type** [type I/type II], -- **HbA1c** [up to 9/9 to 11] [%], +- **HbA1c** [up to 9/9 to 11] [%], -- **tpo2** [up to 50/above 50] [mmHg], +- **tpo2** [up to 50/above 50] [mmHg], -- **age** [up to 55/above 55] [years], +- **age** [up to 55/above 55] [years], -- **wound size** [up to 2/above 2] [cm\(^2\)]. +- **wound size** [up to 2/above 2] [cm$^2$]. In the case of the variables gender and diabetes type in the publication @mrozikiewicz2023allogenic, they were expressed in the form of frequencies. The remaining variables were presented in terms of measures of central tendency along with an indication of variability, as well as minimum and maximum values. 
To determine the parameters for the binary distribution, the truncated normal distribution available in the `truncnorm` package was utilized. The truncated normal distribution is often used in statistics and probability modeling when dealing with data that is constrained to a certain range. It is particularly useful when you want to model a random variable that cannot take values beyond certain limits (@burkardt2014truncated). @@ -137,9 +143,9 @@ data.frame( gt() ``` -## Generate data using Monte-Carlo simulations +# Generate data using Monte-Carlo simulations -Monte-Carlo simulations were used to accumulate the data. This method is designed to model variables based on defined parameters. Variables were defined using the `simstudy` package, utilizing the `defData` function (@goldfeld2020simstudy). As all variables specify proportions, `dist = 'binary'` was used to define the variables. Due to the likely association between the type of diabetes and age – meaning that the older the patient, the higher the probability of having type II diabetes – a relationship with diabetes was established when defining the `age` variable using a logit function `link = "logit"`. The proportions for gender and diabetes were defined by the researchers and were consistent with the literature @mrozikiewicz2023allogenic. +Monte-Carlo simulations were used to accumulate the data. This method is designed to model variables based on defined parameters. Variables were defined using the `simstudy` package, utilizing the `defData` function (@goldfeld2020simstudy). As all variables specify proportions, `dist = 'binary'` was used to define the variables. Due to the likely association between the type of diabetes and age -- meaning that the older the patient, the higher the probability of having type II diabetes -- a relationship with diabetes was established when defining the `age` variable using a logit function `link = "logit"`. 
The proportions for gender and diabetes were defined by the researchers and were consistent with the literature @mrozikiewicz2023allogenic. Using `genData` function from `simstudy` package, a data frame (**data**) was generated with an artificially adopted variable `arm`, which will be filled in by subsequent randomization methods in the arm allocation process for all `n` patients. @@ -191,18 +197,18 @@ head(data, 5) |> gt() ``` -## Minimization randomization +# Minimization randomization To generate appropriate research arms, a function called `minimize_results` was written, utilizing the `randomize_minimisation_pocock` function available within the `unbiased` package (@unbiased). The probability parameter was set at the level defined within the function (p = 0.85). In the case of minimization randomization, to verify which type of minimization (with equal weights or unequal weights) was used, three calls to the minimize_results function were prepared: -- **minimize_equal_weights** - each covariate weight takes a value equal to 1 divided by the number of covariates. In this case, the weight is 1/6, +- **minimize_equal_weights** - each covariate weight takes a value equal to 1 divided by the number of covariates. In this case, the weight is 1/6, -- **minimize_unequal_weights** - following the expert assessment by physicians, parameters with potentially significant impact on treatment outcomes (hba1c, tpo2, wound size) have been assigned a weight of 2. The remaining covariates have been assigned a weight of 1. +- **minimize_unequal_weights** - following the expert assessment by physicians, parameters with potentially significant impact on treatment outcomes (hba1c, tpo2, wound size) have been assigned a weight of 2. The remaining covariates have been assigned a weight of 1. 
-- **minimize_unequal_weights_3** - following the expert assessment by physicians, parameters with potentially significant impact on treatment outcomes (hba1c, tpo2, wound size) have been assigned a weight of 3. The remaining covariates have been assigned a weight of 1. +- **minimize_unequal_weights_3** - following the expert assessment by physicians, parameters with potentially significant impact on treatment outcomes (hba1c, tpo2, wound size) have been assigned a weight of 3. The remaining covariates have been assigned a weight of 1. The tables present information about allocations for the first 5 patients. - + ```{r, minimize-results} # drawing an arm for each patient minimize_results <- @@ -277,7 +283,7 @@ head(minimize_unequal_weights_3, 5) |> gt() ``` -The `statistic_table` function was developed to provide information on: the distribution of the number of patients across research arms, and the distribution of covariates across research arms, along with p-value information for statistical analyses used to compare proportions - chi^2, and the exact Fisher's test, typically used for small samples. +The `statistic_table` function was developed to provide information on: the distribution of the number of patients across research arms, and the distribution of covariates across research arms, along with p-value information for statistical analyses used to compare proportions - chi\^2, and the exact Fisher's test, typically used for small samples. The function relies on the use of the `tbl_summary` function available in the `gtsummary` package (@gtsummary). @@ -307,25 +313,25 @@ statistics_table <- The table presents a statistical summary of results for the first iteration for: -- **Minimization with all weights equal to 1/6**. +- **Minimization with all weights equal to 1/6**. ```{r, chi2-1, tab.cap = "Summary of proportion test for minimization randomization with equal weights"} statistics_table(minimize_equal_weights) ``` -- **Minimization with weights 2:1**. 
+- **Minimization with weights 2:1**. ```{r, chi2-2, tab.cap = "Summary of proportion test for minimization randomization with equal weights"} statistics_table(minimize_unequal_weights) ``` -- **Minimization with weights 3:1**. +- **Minimization with weights 3:1**. ```{r, chi2-3, tab.cap = "Summary of proportion test for minimization randomization with equal weights"} statistics_table(minimize_unequal_weights_3) ``` -## Simple randomization +# Simple randomization In the next step, appropriate arms were generated for patients using simple randomization, available through the `unbiased` package - the `randomize_simple` function (@unbiased). The `simple_results` function was called within `simple_data`, considering the initial assumption of assigning patients to three arms in a 1:1:1 ratio. @@ -362,9 +368,9 @@ head(simple_data, 5) |> statistics_table(simple_data) ``` -## Block randomization +# Block randomization -Block randomization, as opposed to minimization and simple randomization methods, was developed based on the `rbprPar` function available in the `randomizeR` package (@randomizeR). Using this, the `block_rand` function was created, which, based on the defined number of patients, arms, and a list of stratifying factors, generates a randomization list with a length equal to the number of patients multiplied by the product of categories in each covariate. In the case of the specified data in the document, for one iteration, it amounts to **105 * 2^6 = 6720 rows**. This ensures that there is an appropriate number of randomisation codes for each opportunity. In the case of equal characteristics, it is certain that there are the right number of codes for the defined `n` patients. +Block randomization, as opposed to minimization and simple randomization methods, was developed based on the `rbprPar` function available in the `randomizeR` package (@randomizeR). 
Using this, the `block_rand` function was created, which, based on the defined number of patients, arms, and a list of stratifying factors, generates a randomization list with a length equal to the number of patients multiplied by the product of categories in each covariate. In the case of the specified data in the document, for one iteration, it amounts to **105 \* 2\^6 = 6720 rows**. This ensures that there is an appropriate number of randomisation codes for each opportunity. In the case of equal characteristics, it is certain that there are the right number of codes for the defined `n` patients. Based on the `block_rand` function, it is possible to generate a randomisation list, based on which patients will be allocated, with characteristics from the output `data` frame. Due to the 3 arms and the need to blind the allocation of consecutive patients, block sizes 3,6 and 9 were used for the calculations. @@ -462,7 +468,7 @@ head(block_data, 5) |> statistics_table(block_data) ``` -## Generate 1000 simulations +# Generate 1000 simulations We have performed 1000 iterations of data generation with parameters defined above. The number of iterations indicates the number of iterations included in the Monte-Carlo simulations to accumulate data for the given parameters. This allowed for the generation of data 1000 times for 105 patients to more efficiently assess the effect of randomization methods in the context of covariate balance. @@ -480,7 +486,7 @@ These data were assigned to the variable `sim_data` based on the data stored in sim_data <- readRDS("1000_sim_data.Rds") ``` -## Check balance using smd test +# Check balance using smd test In order to select the test and define the precision at a specified level, above which we assume no imbalance, a literature analysis was conducted based on publications such as @lee2021estimating, @austin2009balance, @doah2021impact, @brown2020novel, @nguyen2017double, @sanchez2003effect, @lee2022propensity, @berger2021roadmap. 
@@ -490,7 +496,7 @@ In the literature analysis, the precision level ranged between 0.1-0.2. For smal In the analyzed example, due to the sample size of 105 patients, a threshold of 0.2 for the SMD test was adopted. -A function called `smd_covariants_data` was written to generate frames that produce the SMD test for each covariate in each iteration, utilizing the `CreateTableOne` function available in the `tableone` package (@tableone). In cases where the test result is <0.001, a value of 0 was assigned. +A function called `smd_covariants_data` was written to generate frames that produce the SMD test for each covariate in each iteration, utilizing the `CreateTableOne` function available in the `tableone` package (@tableone). In cases where the test result is \<0.001, a value of 0 was assigned. The results for each randomization method were stored in the `cov_balance_data`. @@ -556,7 +562,7 @@ cov_balance_data <- Below are the results of the SMD test presented in the form of boxplot and violin plot, depicting the outcomes for each randomization method. The red dashed line indicates the adopted precision threshold. -- **Boxplot of the combined results** +- **Boxplot of the combined results** ```{r, boxplot, fig.cap= "Summary average smd in each randomization methods", warning=FALSE, fig.width=9, fig.height=6} # boxplot @@ -571,7 +577,7 @@ cov_balance_data |> theme_bw() ``` -- **Violin plot** +- **Violin plot** ```{r, violinplot, fig.cap= "Summary smd in each randomization methods in each covariants", warning = FALSE, fig.width=9, fig.height=6} # violin plot @@ -588,7 +594,7 @@ cov_balance_data |> theme(axis.text = element_text(angle = 45, vjust = 0.5, hjust = 1)) ``` -- **Summary table of success** +- **Summary table of success** Based on the specified precision threshold of 0.2, a function defining randomization success, named `success_power`, was developed. 
If the SMD test value for each covariate in a given iteration is above 0.2, the function defines the analysis data as 'failure' - 0; otherwise, it is defined as 'success' - 1. @@ -628,9 +634,9 @@ success_power(cov_balance_data) |> gt() ``` -## Conclusion +# Conclusion -Considering all three randomization methods: minimization, block randomization, and simple randomization, minimization performs the best in terms of covariate balance. Simple randomization has a significant drawback, as patient allocation to arms occurs randomly with equal probability. This leads to an imbalance in both the number of patients and covariate balance, which is also random. This is particularly the case with small samples. Balancing the number of patients is possible for larger samples for n > 200. +Considering all three randomization methods: minimization, block randomization, and simple randomization, minimization performs the best in terms of covariate balance. Simple randomization has a significant drawback, as patient allocation to arms occurs randomly with equal probability. This leads to an imbalance in both the number of patients and covariate balance, which is also random. This is particularly the case with small samples. Balancing the number of patients is possible for larger samples for n \> 200. On the other hand, block randomization performs very well in balancing the number of patients in groups in a specified allocation ratio. However, compared to adaptive randomisation using the minimisation method, block randomisation has a lower probability in terms of balancing the co-variables. @@ -640,4 +646,4 @@ Minimization method, provides the highest success power by ensuring balance acro --- nocite: '@*' -... +---