ENH: Add parametric.py module, add plot_regresion_profiles to plot.py, and update plot_als_comparison.py example #148
Open: earoy wants to merge 8 commits into yeatmanlab:main from earoy:setup-parametric-module-with-plotting
Commits:
- 0296cf7 Add parametric.py module, add plot_regresion_profiles to plot.py, and… (earoy)
- 8d9a1ce update formatting (earoy)
- 488c1f1 fix formatting errors (earoy)
- 80065d5 fix bug in parametric.py (earoy)
- 5c19fb0 fix typo on line 100 (earoy)
- c6d8609 fix docstring formatting issues (earoy)
- a43faac fix docstring formatting issues (earoy)
- 017f59f Pin matplotlib<3.9 (arokem)
New file: parametric.py

```python
"""Perform linear modeling at each node along the tract."""

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

from sklearn.impute import SimpleImputer
from statsmodels.api import OLS
from statsmodels.stats.multitest import multipletests


def node_wise_regression(
    afq_dataset,
    tract,
    metric,
    formula,
    group="group",
    lme=False,
    rand_eff="subjectID",
):
    """Model group differences using node-wise regression along the length of the tract.

    Returns a dictionary of beta-weights, confidence intervals, p-values, and
    rejection criteria based on multiple-comparison correction.

    Based on this example: https://github.com/yeatmanlab/AFQ-Insight/blob/main/examples/plot_als_comparison.py

    Parameters
    ----------
    afq_dataset: AFQDataset
        Loaded AFQDataset object

    tract: str
        String specifying the tract to model

    metric: str
        String specifying which diffusion metric to use as an outcome,
        e.g. 'fa'

    formula: str
        An R-style formula <https://www.statsmodels.org/dev/example_formulas.html>
        specifying the regression model to fit at each node. For OLS this is
        the full model; for linear mixed-effects models it specifies only the
        fixed effects, and the random-effect grouping is set by `rand_eff`

    group: str, default='group'
        Name assigned to the first target column of the AFQDataset. The formula
        should refer to the grouping variable by this name, and it is used to
        select the group-effect terms from each fitted model

    lme: bool, default=False
        Whether to fit a linear mixed-effects model instead of OLS

    rand_eff: str, default='subjectID'
        String specifying the random-effect grouping structure for linear
        mixed-effects models. If using anything other than the default value,
        this column must be present in the 'target_cols' of the AFQDataset object

    Returns
    -------
    tract_dict: dict
        A dictionary with the following key-value pairs:

        {'tract': tract,
         'reference_coefs': coefs_default,
         'group_coefs': coefs_treat,
         'reference_CI': cis_default,
         'group_CI': cis_treat,
         'pvals': pvals,
         'reject_idx': reject_idx,
         'model_fits': fits}

        tract: str
            The tract described by this dictionary

        reference_coefs: np.ndarray of float, shape (n_nodes,)
            Intercept beta-weights, i.e. the estimated diffusion metric for the
            reference group at each node along the tract

        group_coefs: np.ndarray of float, shape (n_nodes,)
            Beta-weights of the group effect on the diffusion metric at each
            node along the tract

        reference_CI: np.ndarray, shape (n_nodes, 2)
            Lower and upper bounds of the 95% confidence interval around the
            reference-category beta-weight at each node

        group_CI: np.ndarray, shape (n_nodes, 2)
            Lower and upper bounds of the 95% confidence interval around the
            group-effect beta-weight at each node

        pvals: np.ndarray of float, shape (n_nodes,)
            Uncorrected p-values testing whether the group-effect beta-weight
            differs from 0 at each node

        reject_idx: tuple of np.ndarray
            Output of np.where: the indices of nodes where the null hypothesis
            is rejected after Benjamini-Hochberg FDR correction (alpha=0.05)

        model_fits: dict
            The fitted statsmodels results object for each node, keyed by the
            node's column label
    """
    # Replace missing values with the median of each feature
    X = SimpleImputer(strategy="median").fit_transform(afq_dataset.X)
    # Rename the first target column so the formula can refer to it by `group`
    # (note: this modifies afq_dataset.target_cols in place)
    afq_dataset.target_cols[0] = group

    # Keep only the columns (nodes) for the requested tract and metric
    tract_data = (
        pd.DataFrame(columns=afq_dataset.feature_names, data=X)
        .filter(like=tract)
        .filter(like=metric)
    )

    n_nodes = tract_data.shape[-1]
    pvals = np.zeros(n_nodes)
    coefs_default = np.zeros(n_nodes)
    coefs_treat = np.zeros(n_nodes)
    cis_default = np.zeros((n_nodes, 2))
    cis_treat = np.zeros((n_nodes, 2))
    fits = {}

    # Loop through each node and fit a model
    for ii, column in enumerate(tract_data.columns):
        # Combine the targets with this node's metric values
        this = pd.DataFrame(afq_dataset.y, columns=afq_dataset.target_cols)
        this[metric] = tract_data[column]

        if lme:
            # Fit a linear mixed-effects model; with the default random effect,
            # add the subject IDs as the grouping column
            if rand_eff == "subjectID":
                this["subjectID"] = afq_dataset.subjects
            fit = smf.mixedlm(formula, this, groups=rand_eff).fit()
        else:
            # Fit an OLS model
            fit = OLS.from_formula(formula, this).fit()
        fits[column] = fit

        # Pull out coefficients, CIs, and p-values; when several terms match
        # `group`, the first match is used throughout
        ci = fit.conf_int(alpha=0.05)
        coefs_default[ii] = fit.params.filter(regex="Intercept", axis=0).iloc[0]
        coefs_treat[ii] = fit.params.filter(regex=group, axis=0).iloc[0]
        cis_default[ii] = ci.filter(regex="Intercept", axis=0).iloc[0].to_numpy()
        cis_treat[ii] = ci.filter(regex=group, axis=0).iloc[0].to_numpy()
        pvals[ii] = fit.pvalues.filter(regex=group, axis=0).iloc[0]

    # Correct p-values for multiple comparisons (Benjamini-Hochberg FDR)
    reject, pval_corrected, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
    reject_idx = np.where(reject)

    tract_dict = {
        "tract": tract,
        "reference_coefs": coefs_default,
        "group_coefs": coefs_treat,
        "reference_CI": cis_default,
        "group_CI": cis_treat,
        "pvals": pvals,
        "reject_idx": reject_idx,
        "model_fits": fits,
    }

    return tract_dict
```
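A minimal, self-contained sketch of the same node-wise idea on synthetic data, so the technique can be tried without an AFQDataset: fit one OLS model per node, collect the group coefficient, its 95% CI, and its p-value, then apply Benjamini-Hochberg FDR correction with `multipletests(..., method="fdr_bh")`. The simulated data, the integer 0/1 group coding, and the `C(group)[T.1]` term name (the formula interface's default treatment-coding label for level 1) are assumptions of this sketch, not part of the PR.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

# Hypothetical synthetic data: 40 subjects x 20 nodes, with a simulated
# group difference at nodes 5-9 only
rng = np.random.default_rng(0)
n_subjects, n_nodes = 40, 20
group = np.repeat([0, 1], n_subjects // 2)
fa = rng.normal(0.45, 0.02, size=(n_subjects, n_nodes))
fa[group == 1, 5:10] -= 0.05  # lower FA in group 1 at nodes 5-9

coefs = np.zeros(n_nodes)
cis = np.zeros((n_nodes, 2))
pvals = np.zeros(n_nodes)
term = "C(group)[T.1]"  # assumed label of the group-1 vs group-0 contrast

for node in range(n_nodes):
    # One small DataFrame per node: the outcome plus the group label
    data = pd.DataFrame({"fa": fa[:, node], "group": group})
    fit = smf.ols("fa ~ C(group)", data).fit()
    coefs[node] = fit.params[term]
    cis[node] = fit.conf_int(alpha=0.05).loc[term].to_numpy()
    pvals[node] = fit.pvalues[term]

# Benjamini-Hochberg correction across all nodes of the tract
reject, pvals_corrected, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
reject_idx = np.where(reject)
print(reject_idx[0])
```

Because the group effect is simulated only at nodes 5-9, those nodes should appear in `reject_idx[0]`; an occasional null node passing the threshold is consistent with controlling the false discovery rate at 5%.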
Quick Q for now, but no time for a full review yet: Is it possible to tell from the formula whether this is an lme or not?
Meaning, is this argument redundant with information already provided in the formula?
I did some digging through the statsmodels documentation, and it looks like for mixed-effects models the LME function takes an R-style formula for the fixed effects (identical to the OLS formula) plus a groups parameter to specify the random effects, so I don't think we can tell from the formula. One thought, though, is to have the user pass in their model formula, which we could then parse to determine whether it's OLS or LME, and then populate the call to the statsmodels LME function with the corresponding random-effects structure if needed.
I'd avoid creating our own "domain-specific language": addressing all the possible corner cases may become pretty hairy, so maybe asking the user to be explicit about it (as you already do here) is best.
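One way to keep the choice explicit, as suggested here, is a small dispatch helper that takes the model type as a flag and fails loudly when a mixed model is requested without a grouping column. `fit_node_model` is a hypothetical name used only for illustration; `smf.ols` is the same constructor as the `OLS.from_formula` call used in the PR.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_node_model(formula, data, lme=False, groups=None):
    """Fit one node's model, choosing the model type from an explicit flag.

    Hypothetical helper: `lme` and `groups` come from the caller instead of
    being parsed out of the formula string.
    """
    if lme:
        if groups is None:
            raise ValueError("groups must be given when lme=True")
        # Mixed model: formula holds the fixed effects, groups the random effect
        return smf.mixedlm(formula, data, groups=groups).fit()
    # Plain OLS: the formula is the whole model
    return smf.ols(formula, data).fit()


# Hypothetical data for one node
rng = np.random.default_rng(2)
data = pd.DataFrame({
    "fa": rng.normal(0.45, 0.02, 40),
    "group": np.tile([0, 1], 20),
    "subjectID": np.repeat(np.arange(20), 2),
})
ols_fit = fit_node_model("fa ~ group", data)
```

Requiring `groups` whenever `lme=True` surfaces a missing random-effect specification at call time rather than relying on a default column that may not exist.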