-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
642 move the removal of filter qa to imputation #180
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my own comments!
src/imputation/imputation_helpers.py
Outdated
def tidy_imputation_dataframe( | ||
df: pd.DataFrame, | ||
config: Dict, | ||
logger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging.Logger
I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good and the pipeline runs fine.
src/imputation/imputation_helpers.py
Outdated
def tidy_imputation_dataframe( | ||
df: pd.DataFrame, | ||
config: Dict, | ||
logger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging.Logger
I think
src/imputation/imputation_helpers.py
Outdated
def tidy_imputation_dataframe( | ||
df: pd.DataFrame, | ||
config: Dict, | ||
logger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
# filter estimated_df for records not included in outputs | ||
filtered_output_df = estimated_df.copy().loc[~to_keep] | ||
outputs_df = estimated_df.copy().loc[no_rnd_spenders_filter] | ||
tau_outputs_df = weighted_df.copy().loc[no_rnd_spenders_filter] | ||
|
||
if ni_full_responses is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is not part of you ticket Anne, but I would recommend moving this NI processing to another function.
@@ -66,19 +53,10 @@ def form_output_prep( | |||
# outputs_df = pd.concat([outputs_df, ni_full_responses]) | |||
tau_outputs_df = pd.concat([tau_outputs_df, ni_full_responses]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line removes the need for a logic gate before hand.
tau_outputs_df = pd.concat([tau_outputs_df, ni_full_responses]) if ni_full_responses is not None else tau_outputs_df
Then we won't need the conditional block and makes the code more readable and computationally efficient.
Again, I know this was not part of your ticket but we may as well generate nice to have code-tidying and efficiency tickets while I am looking at the code.
Pull Request submission
Part 1: Remove _prev cols from full_estimation output
At the end of the imputation module, after outputting the imputation QA csv, we drop many of the imputation QA columns, including the _ imputed cols, but we don't drop the _ prev columns and _ link columns used for MoR.
This should be done so that the full_estimation dataframe isn't so large (and doesn't take so long to output).
Part 2: Move the removal of "no_imputation" and "no_mean_found" from outputs
This should now go at the end of imputation
Closes or fixes
Closes #642
Code
Documentation
Any new code includes all the following forms of documentation:
Args
andreturns
for all major functionsData
Testing
Peer Review Section
environment.yaml
Final approval (post-review)
The author has responded to my review and made changes to my satisfaction.
Review comments
Insert detailed comments here!
These might include, but not exclusively:
that it is likely to interact with?)
works correctly?)
Your suggestions should be tailored to the code that you are reviewing.
Be critical and clear, but not mean. Ask questions and set actions.