initialize package with std_lm #2

Closed
wants to merge 9 commits into from

clarkliming (Collaborator) commented Jul 5, 2023

Closes #1.
To illustrate the package structure, std_lm is included here; it is the same as lm but with a robust covariance matrix (with summary, the result is printed to the console).

This PR is not complete yet, but it provides a framework for how collaboration should happen.

clarkliming (Collaborator, Author) commented:

Hi team, I have added a basic framework for these analyses.

In general, there will be wrappers around lm and glm for the standardization methods, and a robust covariance will be used. The treatment variable has to be specified explicitly, and only two levels are allowed (otherwise we cannot do the standardization correctly).

Example:

std_lm(Sepal.Length ~ Species, data = subset(iris, Species != "virginica"), trt = "Species")

Another possibility is to use a special function in the formula:

std_lm(Sepal.Length ~ trt(Species), data = subset(iris, Species != "virginica"))
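
The standardization step behind calls like the ones above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `std_lm_sketch` is a hypothetical name, `Sepal.Width` is added as a covariate purely for the example, and the robust covariance step is left out.

```r
# Hypothetical sketch of standardization (g-computation) for a linear model.
# Not the package's std_lm; names and structure are assumptions.
std_lm_sketch <- function(formula, data, trt) {
  # Drop unused levels so that exactly the observed arms remain.
  data[[trt]] <- droplevels(as.factor(data[[trt]]))
  lv <- levels(data[[trt]])
  if (length(lv) != 2L) {
    stop("the treatment variable must have exactly two levels")
  }
  fit <- lm(formula, data = data)
  # For each arm: set everyone to that arm (counterfactual data),
  # predict, and average the predictions.
  means <- vapply(lv, function(l) {
    cf <- data
    cf[[trt]] <- factor(rep(l, nrow(cf)), levels = lv)
    mean(predict(fit, newdata = cf))
  }, numeric(1))
  list(fit = fit, means = means)
}

dat <- subset(iris, Species != "virginica")
res <- std_lm_sketch(Sepal.Length ~ Species + Sepal.Width,
                     data = dat, trt = "Species")
res$means
```

For a model without treatment-by-covariate interactions, the difference between the two standardized means equals the treatment coefficient of the fitted model.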

IPTW is not implemented yet, but should be similar.

Any ideas on the basic structure of the package?

xinzhn (Collaborator) commented Jul 13, 2023

Hi Liming,

For unconditional treatment effect of linear models, I suggest that we follow the recommendation in the FDA final guidance as quoted below.

  • Nominal standard errors are often the default method in most statistical software packages. Even if the model is incorrectly specified, they are acceptable in two arm trials with 1:1 randomization. However, in other settings, these standard errors can be inaccurate when the model is misspecified. Therefore, the Agency recommends that sponsors consider use of a robust standard error method such as the Huber-White “sandwich” standard error when the model does not include treatment by covariate interactions (Rosenblum and van der Laan 2009; Lin 2013). Other robust standard error methods proposed in the literature can also cover cases with interactions (Ye et al. 2022). An appropriate nonparametric bootstrap procedure can also be used (Efron and Tibshirani 1993).

Accordingly, for asymptotic standard error, it should provide:

  1. Nominal standard errors, and perhaps robust sandwich standard errors as well, for two arms with 1:1 randomization; we should also advise users not to include any treatment-by-covariate interactions, since the model without such interactions is optimal in this case [1].
  2. Robust sandwich standard errors (already implemented) for other cases when NOT including treatment-by-covariate interactions.
  3. The estimates from ANHECOVA and the associated robust standard error [2] when including treatment-by-covariate interactions.
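
To make the difference between nominal and sandwich standard errors concrete, here is a base R sketch of the simplest Huber-White variant (HC0, with no small-sample correction). The function name `hc0_vcov` and the example model are assumptions for illustration; the PR does not state which variant its robust covariance uses.

```r
# Hypothetical helper: HC0 "sandwich" covariance for an lm fit.
# Assumes an unweighted, full-rank linear model.
hc0_vcov <- function(fit) {
  X <- model.matrix(fit)        # n x p design matrix
  e <- residuals(fit)           # length-n residual vector
  bread <- solve(crossprod(X))  # (X'X)^{-1}
  meat <- crossprod(X * e)      # sum_i e_i^2 x_i x_i' (row i of X scaled by e_i)
  bread %*% meat %*% bread
}

dat <- subset(iris, Species != "virginica")
fit <- lm(Sepal.Length ~ Species + Sepal.Width, data = dat)

# Nominal vs. robust standard errors side by side.
cbind(
  nominal = sqrt(diag(vcov(fit))),
  robust_hc0 = sqrt(diag(hc0_vcov(fit)))
)
```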

For the input of std_lm, I suggest adding options for whether to 1) include treatment-by-covariate interactions in the regression model and 2) account for stratified randomization in the standard error estimation. To include those interactions, the simplest way is to fit a separate regression model to each treatment group [2][3].
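
Fitting a separate regression per treatment group, as suggested here, could look roughly like the following. This is a sketch under assumptions: `std_by_arm` is a hypothetical name, the formula deliberately omits the treatment variable, and no standard error is computed.

```r
# Hypothetical sketch: one outcome model per arm, each used to predict
# for the full sample, then averaged to get the arm's standardized mean.
std_by_arm <- function(formula, data, trt) {
  data[[trt]] <- droplevels(as.factor(data[[trt]]))
  lv <- levels(data[[trt]])
  vapply(lv, function(l) {
    # Fit on the subjects in arm `l` only.
    fit_l <- lm(formula, data = data[data[[trt]] == l, , drop = FALSE])
    # Predict for everyone, regardless of the arm they were in.
    mean(predict(fit_l, newdata = data))
  }, numeric(1))
}

dat <- subset(iris, Species != "virginica")
arm_means <- std_by_arm(Sepal.Length ~ Sepal.Width, data = dat, trt = "Species")
arm_means
```

The point estimates agree with those from a single model containing all treatment-by-covariate interactions, because both approaches yield the same arm-specific coefficients.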

For the output of std_lm, I suggest including a vector of mean outcomes for all treatment groups and its estimated covariance matrix. The unconditional treatment effect as a difference, ratio, or odds ratio (for binary outcomes) can then be obtained by another function that takes the output of std_lm, with the standard error computed using the delta method based on the estimated covariance matrix from std_lm. This saves the effort of developing separate functions for different summary measures. We can develop other estimators of unconditional treatment effects in a similar way, with a similar interface.
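
The delta-method step described here can be sketched as follows. The function name `delta_effect`, its arguments, and the numbers in the example are all made up for illustration; only the difference and ratio are shown, and an odds ratio would follow the same pattern with its own gradient.

```r
# Hypothetical sketch: treatment effect and delta-method standard error
# from a named vector of arm means and their covariance matrix.
delta_effect <- function(means, vcov, trt, ref,
                         type = c("difference", "ratio")) {
  type <- match.arg(type)
  m1 <- means[[trt]]
  m0 <- means[[ref]]
  # Gradient of the effect with respect to the vector of means.
  grad <- setNames(numeric(length(means)), names(means))
  if (type == "difference") {
    est <- m1 - m0
    grad[trt] <- 1
    grad[ref] <- -1
  } else {
    est <- m1 / m0
    grad[trt] <- 1 / m0
    grad[ref] <- -m1 / m0^2
  }
  se <- sqrt(drop(t(grad) %*% vcov %*% grad))
  c(estimate = est, se = se)
}

# Toy inputs, invented purely to exercise the function.
means <- c(A = 5, B = 6)
V <- matrix(c(0.04, 0.01, 0.01, 0.09), nrow = 2,
            dimnames = list(names(means), names(means)))
delta_effect(means, V, trt = "B", ref = "A", type = "difference")
delta_effect(means, V, trt = "B", ref = "A", type = "ratio")
```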

For IPTW, we may consider building on the PWS package, which uses the standard error estimator from the paper [4] cited by the FDA final guidance.
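
For orientation, the IPTW point estimate itself can be sketched in base R. This does not use the package mentioned above and does not compute the standard error from [4]; the variable names, the choice of `Sepal.Width` as the only covariate, and the use of `iris` (which is not a randomized trial) are assumptions for illustration only.

```r
# Hypothetical sketch of an IPTW point estimate with normalized weights.
dat <- droplevels(subset(iris, Species != "virginica"))
dat$z <- as.integer(dat$Species == "versicolor")  # 1 = "treated" arm

# Propensity score: estimated probability of treatment given the covariate.
ps_fit <- glm(z ~ Sepal.Width, family = binomial(), data = dat)
ps <- fitted(ps_fit)

# Inverse probability of treatment weights.
w <- ifelse(dat$z == 1, 1 / ps, 1 / (1 - ps))

# Weighted mean outcome in each arm and their difference.
mean_trt <- weighted.mean(dat$Sepal.Length[dat$z == 1], w[dat$z == 1])
mean_ref <- weighted.mean(dat$Sepal.Length[dat$z == 0], w[dat$z == 0])
c(treated = mean_trt, reference = mean_ref, difference = mean_trt - mean_ref)
```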

I look forward to hearing thoughts from you and others.

Footnotes

  1. Lin, W. (2013), “Agnostic notes on regression adjustments to experimental data: Reexamining Freedman’s critique,” The Annals of Applied Statistics, 7, 295–318. https://doi.org/10.1214/12-aoas583.

  2. Ting Ye, Jun Shao, Yanyao Yi & Qingyuan Zhao (2022) Toward Better Practice of Covariate Adjustment in Analyzing Randomized Clinical Trials, Journal of the American Statistical Association, DOI: 10.1080/01621459.2022.2049278

  3. Tsiatis, A.A., Davidian, M., Zhang, M. and Lu, X. (2008), Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Statist. Med., 27: 4658-4677. https://doi.org/10.1002/sim.3113

  4. Williamson, E.J., Forbes, A. and White, I.R. (2014), Variance reduction in randomised trials by inverse probability weighting using the propensity score. Statist. Med., 33: 721-737. https://doi.org/10.1002/sim.5991


# Implementation of standardization method for linear models

The function name is `lm_std`, and the arguments are quite similar to `lm`.
Collaborator commented:

General idea: any possibility that we start from lm result and then pipe this to a package function to get the covariate adjustment on top?

Collaborator (Author) commented:

It could be, but we would lose consistency with other methods. For standardization methods, we usually need to modify the data, create counterfactual treatment assignments, and predict the results (and sometimes check that the data has the correct structure, such as a binary treatment). For IPTW, we need to provide weights obtained from the probability of treatment (other methods do not use weights). In addition, a new method would still need a new regression interface. So, to keep the covariate adjustment methods consistent, it might be good to create wrappers like this.

Collaborator commented:

Cool, makes sense. Then a consistent std_ prefix would be nice.

bailliem commented Jul 24, 2023

I like the idea of the class of method coming first in the prefix, if that works for others, i.e.

  • std_<model/method>_ for standardisation
  • ipw_<model/method>_ for inverse weighting, etc.

This would follow the Morris et al. overview paper / classification of methods.

It feels intuitive, especially if using autocomplete i.e. searching within the method class...


> any possibility that we start from lm result and then pipe this to a package function

I like this idea a lot. I'm just getting up-to-speed trying to understand the package's interface goals - apologies if I'm over-simplifying things to make this work. Would something like this be feasible:

data %>%
  cov_adjust(by = "Species") %>%
  lm(Sepal.Length ~ Species)  # calls lm.covadj_spec with signature lm(<covadj_spec>, formula)

data %>%
  cov_adjust(by = "Species", weights = <weights>) %>%
  lm(Sepal.Length ~ Species)

I could see this being more comfortable if the basic parameters are reused across many methods.
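
One way to explore the dispatch idea is sketched below. Note that `stats::lm` is an ordinary function rather than an S3 generic, so a method named `lm.covadj_spec` would only be called if the package defined its own `lm` generic. The sketch sidesteps that with a hypothetical generic `fit_lm`; `cov_adjust`, `covadj_spec`, and `covadj_fit` are likewise assumed names, and the weights are stored but not used.

```r
# Hypothetical constructor: bundle the data with the adjustment settings.
cov_adjust <- function(data, by, weights = NULL) {
  structure(
    list(data = data, by = by, weights = weights),
    class = "covadj_spec"
  )
}

# Hypothetical generic, used instead of masking stats::lm.
fit_lm <- function(x, ...) UseMethod("fit_lm")

# Method for the specification object: fit the model on the stored data
# and carry the treatment variable along for later adjustment steps.
fit_lm.covadj_spec <- function(x, formula, ...) {
  fit <- stats::lm(formula, data = x$data)
  structure(list(fit = fit, by = x$by), class = "covadj_fit")
}

dat <- subset(iris, Species != "virginica")
spec <- cov_adjust(dat, by = "Species")
res <- fit_lm(spec, Sepal.Length ~ Species)
class(res)
```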

Collaborator commented:

Hm, interesting idea @dgkf! @clarkliming, what do you think?

Collaborator (Author) commented:

This sounds good but may lead to other confusion: a standardization method is not actually a linear model, and a standardized glm is not a glm; they focus on estimating the treatment effect. So using lm as the generic seems a little inappropriate, I think. But a similar grammar could be adopted to define the treatment variable (after all, the treatment effect is the key).

clarkliming (Collaborator, Author) commented:

Hi @xinzhn, there are some updates to the current design, following your suggestions:

  1. Checks are added to warn about treatment*covariate interaction terms (ANHECOVA is not implemented yet, so a warning is reported here).
  2. The original fit is stored, so you can still access the original data whenever you want.
  3. A generic called treatment_effect is added, which uses "vcov_method" for the robust standard error (use "constant" to obtain the nominal standard errors) and uses trt + ref to specify the treatment effect of trt compared to ref.
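
A check for treatment-by-covariate interaction terms, as in item 1, could be written along these lines. This is a sketch, not the code in the PR: `has_trt_interaction` is a hypothetical name, and it assumes the formula names its variables explicitly (no `.` shorthand).

```r
# Hypothetical helper: does the formula contain an interaction term
# that involves the treatment variable?
has_trt_interaction <- function(formula, trt) {
  tm <- terms(formula)
  fac <- attr(tm, "factors")  # rows: variables, columns: model terms
  ord <- attr(tm, "order")    # 1 = main effect, 2+ = interaction
  if (length(ord) == 0L || !trt %in% rownames(fac)) {
    return(FALSE)
  }
  any(ord > 1L & fac[trt, ] > 0)
}

f <- Sepal.Length ~ Species * Sepal.Width
if (has_trt_interaction(f, "Species")) {
  message("treatment-by-covariate interaction found in the formula")
}
has_trt_interaction(Sepal.Length ~ Species + Sepal.Width, "Species")
```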

@bailliem, the names are updated, and std_glm is also added in another branch. Other methods will be added later, but should follow the same naming conventions.

Successfully merging this pull request may close these issues.

design the package with standardize linear regression with robust covariance
5 participants