642 move the removal of filter qa to imputation #180

Merged (4 commits) on Jan 15, 2024

@AnneONS (Collaborator) commented Jan 11, 2024

Pull Request submission

Part 1: Remove _prev cols from full_estimation output

At the end of the imputation module, after outputting the imputation QA CSV, we drop many of the imputation QA columns, including the _imputed cols, but we don't drop the _prev and _link columns used for MoR.

These should also be dropped so that the full_estimation dataframe isn't so large (and doesn't take so long to output).
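A minimal sketch of the Part 1 change, assuming the QA columns can be identified purely by their suffix. The helper name `drop_qa_columns`, the suffix tuple, and the example column names are illustrative and are not taken from the pipeline code.

```python
import pandas as pd

# Assumed suffixes, based on the column families named in the description.
QA_SUFFIXES = ("_imputed", "_prev", "_link")


def drop_qa_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns ending in a QA suffix."""
    # str.endswith accepts a tuple, so one pass covers all three suffixes.
    to_drop = [col for col in df.columns if col.endswith(QA_SUFFIXES)]
    return df.drop(columns=to_drop)


# Hypothetical example data: one value column plus its QA companions.
example = pd.DataFrame(
    {
        "reference": [1, 2],
        "211": [10.0, 20.0],
        "211_prev": [9.0, 18.0],
        "211_link": [1.1, 1.1],
        "211_imputed": [10.0, 20.0],
    }
)
slim = drop_qa_columns(example)
```

Because `DataFrame.drop` returns a new dataframe, the input is left untouched and the QA output can still be written from it beforehand.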

Part 2: Move the removal of "no_imputation" and "no_mean_found" out of outputs

This removal should now happen at the end of the imputation module.
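The Part 2 filter might look roughly like the sketch below. It assumes that "no_imputation" and "no_mean_found" are values of a single marker column; the column name `imp_marker`, the helper name, and the other marker values in the example are assumptions made for illustration.

```python
import pandas as pd

# Assumed column name; the two excluded values come from the PR description.
MARKER_COL = "imp_marker"
EXCLUDED_MARKERS = ["no_imputation", "no_mean_found"]


def remove_unimputed_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows whose marker is not in EXCLUDED_MARKERS."""
    keep = ~df[MARKER_COL].isin(EXCLUDED_MARKERS)
    return df.loc[keep].reset_index(drop=True)


# Hypothetical example: "R" and "TMI" stand in for markers that are kept.
records = pd.DataFrame(
    {
        "reference": [1, 2, 3, 4],
        "imp_marker": ["R", "TMI", "no_imputation", "no_mean_found"],
    }
)
kept = remove_unimputed_rows(records)
```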

Closes or fixes

  • Detail the ticket(s) you are closing with this PR
    Closes #642

Code

  • Code runs The code runs on my machine and/or CDSW
  • Conflicts resolved There are no conflicts (I have performed a rebase if necessary)
  • Requirements My/our code functions according to the requirements of the ticket
  • Dependencies I have updated the environment yaml so it includes any new libraries I have used
  • Configuration file updated any high level parameters that the user may interact with have been put into the config file (and imported to the script)
  • Clean Code
    • Code is as PEP 8 compliant as I can humanly make it
    • Code passes flake8 linting check
    • Code adheres to DRY
  • Type hints All new functions have type hints

Documentation

Any new code includes all the following forms of documentation:

  • Function Documentation Docstrings within the function(s)/methods have been created
    • Includes Args and returns for all major functions
    • The docstring details data types
  • Updated Documentation: User and/or developer working doc has been updated

Data

  • All data needed to run this script is available in Dev/Test
  • All data is excluded from this pull request
  • Secrets checker pre-commit passes

Testing

  • Unit tests Unit tests have been created and are passing or a new ticket to create tests has been created

Peer Review Section

  • All requirements install from (updated) environment.yaml
  • Documentation has been created and is clear - check the working document
  • Docstrings (Google format) have been created and accurately describe the function's functionality
  • Unit tests pass, or if not present a new ticket to create tests has been created
  • Code runs The code runs on reviewer's machine and/or CDSW

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction.

  • I recommend merging this request.

Review comments

Insert detailed comments here!

These might include, but not exclusively:

  • bugs that need fixing (does it work as expected? and does it work with other code
    that it is likely to interact with?)
  • alternative methods (could it be written more efficiently or with more clarity?)
  • documentation improvements (does the documentation reflect how the code actually works?)
  • additional tests that should be implemented (do the tests effectively assure that it
    works correctly?)
  • code style improvements (could the code be written more clearly?)
  • Do the changes represent a change in functionality so the version number should increase? Start a discussion if so.
  • As a reviewer, you can generate the same outputs by running the code

Your suggestions should be tailored to the code that you are reviewing.
Be critical and clear, but not mean. Ask questions and set actions.

github-actions bot commented Jan 11, 2024

Percentage Coverage for this PR

Detailed Coverage Report
File | Stmts | Miss | Cover | Missing
src
   __init__.py | 0 | 0 | 100% |
src/aggregation
   __init__.py | 0 | 0 | 100% |
src/construction
   __init__.py | 0 | 0 | 100% |
   construction.py | 48 | 48 | 0% | 2–4, 6–7, 9, 12, 39–42, 44–46, 49–51, 54–63, 66, 68, 71, 74, 77–78, 81–83, 86–88, 91–92, 95, 103, 111–112, 114, 118, 120
   old_construction.py | 113 | 113 | 0% | 3–7, 10, 13, 41–45, 48–50, 53–55, 60, 63, 65, 68, 74–75, 88–89, 92, 101, 103–104, 107, 117, 120, 124, 132, 135–136, 138, 140–141, 144, 149, 156, 164, 167, 171–174, 176–177, 180, 183–187, 192, 199–202, 204, 207, 210–213, 216–217, 220–222, 224–229, 233–239, 244–245, 248–249, 252, 254–255, 257, 259, 263–264, 267, 270, 275, 278–280, 282, 286, 289, 292, 295–297, 301, 304, 306
src/estimation
   __init__.py | 0 | 0 | 100% |
   apply_weights.py | 16 | 16 | 0% | 2–5, 7, 10, 24–29, 32–34, 36
   calculate_weights.py | 37 | 37 | 0% | 1–3, 6, 9, 23–25, 28, 30, 33, 64–66, 69, 72, 75–78, 80–81, 85, 96, 99, 102, 105, 108, 111, 114–115, 121–122, 124, 127, 139–140
   cellno_mapper.py | 7 | 0 | 100% |
src/imputation
   MoR.py | 90 | 90 | 0% | 2–4, 6–7, 13–14, 17, 35, 38, 40–41, 43, 47–48, 54, 63–66, 70–71, 73–75, 77, 79, 82, 84, 87, 112, 117–120, 122, 125, 131–136, 138, 141, 147–149, 152, 161, 165, 171, 178, 188–190, 194, 197, 206–207, 210, 218, 221, 229, 231, 233, 235–236, 243–244, 249–250, 253, 264, 270, 273, 276, 278–279, 281, 284, 287–289, 291, 294, 296, 300, 303–304
   __init__.py | 0 | 0 | 100% |
   apportionment.py | 36 | 10 | 72% | 124, 126, 141, 143, 145, 157–159, 162, 164
   expansion_imputation.py | 39 | 32 | 17% | 21, 25–26, 28–29, 32, 35, 38–40, 44, 47, 49, 52–53, 55, 58, 61, 81–83, 87, 90, 93, 96, 102, 107, 110, 112, 118, 121, 125
   imputation_helpers.py | 77 | 39 | 49% | 25–27, 29–30, 32, 34, 141–142, 144, 178, 180, 182–183, 185, 189, 195, 198, 202, 205–206, 208, 234, 237, 240, 246–248, 251–252, 254–255, 258–259, 262–264, 270, 272
   impute_civ_def.py | 94 | 48 | 48% | 134–136, 166, 168–169, 172–173, 176–178, 183–184, 186, 189, 191–194, 196, 198, 203–204, 206, 209, 211–213, 216, 218–219, 221–222, 224–225, 238, 241–245, 247–248, 251–252, 254–255, 257
   manual_imputation.py | 19 | 19 | 0% | 1–2, 4, 6, 9–10, 28–29, 33, 35, 37, 44, 59, 61–62, 64–65, 67–68
   sf_expansion.py | 65 | 65 | 0% | 2–4, 6–7, 9, 11–12, 15, 24, 26, 29, 32–33, 36–37, 40, 43–44, 47, 49, 56–57, 61, 64, 69–70, 73, 75, 78–79, 81, 83, 86, 90, 92, 96, 104, 107, 111, 113–114, 116, 118, 121, 124, 133, 136, 139, 148, 151, 153, 159, 165, 168–169, 174, 178–179, 183, 186, 194, 196, 200, 202
   short_to_long.py | 21 | 21 | 0% | 1, 3–4, 7, 20, 23, 25, 32–33, 35, 37–38, 40–41, 43–45, 47, 49, 53, 55
   tmi_imputation.py | 194 | 155 | 20% | 43, 46, 48, 52, 54, 105, 119, 121–122, 124–128, 130–131, 134, 140, 143–144, 147, 150, 152, 157, 162, 167, 180, 183–184, 186, 189–190, 211–213, 215, 217, 220, 223–225, 227–228, 231, 234, 241–244, 246, 270, 273, 275, 278, 282, 286, 307–308, 311, 314–315, 318, 321–322, 324–325, 327, 329, 332, 334, 336, 339, 342–343, 345, 347–349, 351, 358, 360, 367, 369–370, 372–373, 376, 378, 380, 382, 385, 388–389, 392, 394, 396, 401, 407, 413, 419, 423, 444, 447, 450, 452–453, 455–456, 458–459, 462, 464–465, 468, 471, 473–474, 490, 492, 495–497, 499, 501, 503–504, 507, 509–510, 513, 515, 517–518, 538, 542, 545–546, 549–550, 553, 556, 558, 561, 563, 568, 570, 577–578, 581, 584–585, 587, 590, 592–593
src/northern_ireland
   __init__.py | 0 | 0 | 100% |
   ni_headcount_fte.py | 29 | 29 | 0% | 2–4, 6, 8, 11, 25, 27, 29, 31, 33, 35, 37, 40, 55, 57–59, 61, 63–64, 66–67, 69, 72, 81, 83–84, 86
   ni_staging.py | 36 | 36 | 0% | 3–6, 8, 10, 13, 21–22, 25, 27–29, 31, 34, 40, 44, 46, 49, 80–81, 84, 89, 93, 96, 99–107, 109, 111
src/outlier_detection
   __init__.py | 0 | 0 | 100% |
   auto_outliers.py | 83 | 39 | 53% | 23–24, 26–27, 31–36, 38, 40, 43, 70, 74, 116, 190, 192, 194–195, 197, 201, 234, 237, 240, 242, 245, 249, 252, 255, 257–258, 260, 262, 272–275, 277
   manual_outliers.py | 16 | 0 | 100% |
src/outputs
   __init__.py | 0 | 0 | 100% |
   export_files.py | 101 | 101 | 0% | 5–11, 13–14, 18, 24, 42, 44, 51, 56, 63, 68, 71, 98, 104, 107, 114, 116, 119, 124–125, 128–132, 135–136, 139, 151–153, 155, 158, 169, 171–172, 174, 177, 195, 198, 201–202, 209, 213–214, 217–219, 222, 224, 226–235, 239, 241–250, 255–256, 258, 261–263, 266, 269, 272, 286, 289–290, 298, 301, 303, 305, 315, 317–319, 328, 330, 333–334
   form_output_prep.py | 20 | 20 | 0% | 1–3, 6, 32–33, 36–37, 39, 41, 44–49, 54, 56, 60, 62
   gb_sas.py | 28 | 28 | 0% | 2–5, 7–10, 12, 15, 38–40, 43, 48, 51, 61, 65, 68, 71, 74, 79, 82–84, 87–89
   intram_by_civil_defence.py | 26 | 26 | 0% | 2–6, 8, 11, 14, 34–36, 38–39, 42–44, 46, 49, 54, 57, 60–62, 65–67
   intram_by_itl1.py | 38 | 38 | 0% | 2–5, 7–8, 10, 13, 36–38, 41, 44–47, 50–52, 55–56, 59–62, 65–66, 69, 72, 75, 78–82, 87–89
   intram_by_pg.py | 27 | 27 | 0% | 2–5, 7, 10, 13, 31–33, 36–38, 40, 43–45, 48, 51, 54, 57–60, 65–67
   intram_by_sic.py | 36 | 36 | 0% | 2–5, 7, 10, 13, 32–34, 36, 39–40, 43–45, 47, 50–52, 57, 73–77, 79, 82, 85, 88, 91, 94–95, 98–100
   long_form.py | 22 | 22 | 0% | 2–5, 7–9, 11, 14, 33–35, 38, 41, 44, 47, 50–52, 54–56
   manifest_output.py | 78 | 78 | 0% | 1–4, 8, 11–12, 15, 33, 48–51, 54–55, 59–60, 65–66, 68, 71–75, 78–84, 86, 104–105, 112, 114–115, 122, 125, 127, 129, 131, 135, 145, 150–151, 157, 160–161, 163–164, 172–175, 182, 189, 191, 196, 198–200, 202–203, 205–206, 208–211, 213, 216, 218, 224–225, 228–229
   map_output_cols.py | 58 | 50 | 13% | 21–22, 24, 27–28, 30–31, 34, 52–53, 56, 59, 62, 65, 67, 69, 71–72, 89, 99, 102, 107, 110–111, 114, 116, 131–132, 134, 137, 139, 158–159, 162, 165, 168, 171, 173, 175, 177, 179–180, 199, 201, 203–204, 207–208, 211–212
   ni_sas.py | 25 | 25 | 0% | 2–9, 11, 14, 38–40, 45, 55, 57, 60, 63, 68, 71–73, 76–78
   outputs_helpers.py | 23 | 10 | 56% | 46, 53–55, 79–81, 84, 87, 89
   short_form.py | 40 | 19 | 52% | 78, 85, 87, 110–112, 115, 118, 121, 124, 127, 130, 133, 136–138, 140–142
   status_filtered.py | 12 | 6 | 50% | 29–31, 33–35
   tau.py | 30 | 30 | 0% | 2–9, 11, 14, 37–39, 42, 45, 50, 53, 63, 66, 69, 72, 75, 78, 83, 86–88, 91–93
   total_fte.py | 14 | 14 | 0% | 2–5, 8, 11, 24–25, 27–28, 34, 39–41
src/site_apportionment
   __init__.py | 0 | 0 | 100% |
   site_apportionment.py | 69 | 69 | 0% | 1–4, 6, 8, 11–19, 22–23, 26, 39, 42–49, 52, 71–73, 76, 79, 84, 87, 90, 92, 95, 122, 125, 128, 132, 135, 138, 141–142, 145–146, 149–151, 154, 157, 160–161, 164, 167, 170–173, 178, 181, 184–185, 188, 191, 194, 198
src/staging
   __init__.py | 0 | 0 | 100% |
   history_loader.py | 32 | 2 | 93% | 42, 54
   pg_conversion.py | 53 | 31 | 41% | 33, 35–37, 40, 42–45, 47–48, 53–55, 57, 62, 64, 100, 108, 111, 151, 153–154, 157, 160, 163, 166, 169–171, 173
   spp_parser.py | 14 | 0 | 100% |
   spp_snapshot_processing.py | 34 | 0 | 100% |
   staging_helpers.py | 144 | 110 | 23% | 51–53, 55–57, 59, 69, 72, 97–103, 120, 123, 132–134, 137, 140–143, 145, 147, 166–167, 169–170, 172, 211, 214, 217, 220, 223, 226–227, 230, 233, 235, 237, 240, 243, 258–260, 262–264, 266–268, 272–273, 277, 279, 282, 284, 306–307, 311, 316–318, 340, 343, 345, 348, 353–355, 358–359, 362, 364, 369, 375, 398–399, 402, 407–408, 411, 414, 418, 425, 456–459, 461–462, 465–466, 502, 505–508, 511, 514, 517–520, 523, 525
   validation.py | 223 | 66 | 70% | 17–18, 73, 206, 208, 308–309, 336, 339, 386–387, 398, 422, 436–437, 443, 447, 455–456, 462–463, 502–503, 510, 512, 515, 517, 538, 540–541, 544–545, 548–550, 552, 554, 557–558, 560, 562–563, 634, 636–637, 640–641, 644–645, 648, 651–652, 655, 657, 659–660, 673, 675–681, 684, 687
src/utils
   __init__.py | 0 | 0 | 100% |
   helpers.py | 17 | 5 | 70% | 14–15, 19–20, 22
   local_file_mods.py | 105 | 44 | 58% | 33–38, 83, 133–135, 183–184, 195–199, 210–211, 222, 233, 244–245, 247, 258–259, 270–271, 280, 288, 299, 301–302, 304–305, 309–311, 315, 329–330, 333–334, 336
TOTAL | 2289 | 1654 | 27% |

Summary of tests

Tests | Skipped | Failures | Errors | Time
50 | 0 💤 | 0 ❌ | 0 🔥 | 1.218s ⏱️

@AnneONS (Collaborator, Author) left a comment

my own comments!

src/imputation/imputation_helpers.py (outdated, resolved)
def tidy_imputation_dataframe(
    df: pd.DataFrame,
    config: Dict,
    logger,

Collaborator Author

what object?

Contributor

logging.Logger I think

Contributor

changed
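For reference, the type hint agreed in this thread could be written as in the sketch below. Only the first three parameters are visible in the diff excerpt, so the full signature, the return type, and the placeholder body are assumptions rather than the merged code.

```python
import logging
from typing import Dict

import pandas as pd


def tidy_imputation_dataframe(
    df: pd.DataFrame,
    config: Dict,
    logger: logging.Logger,
) -> pd.DataFrame:
    """Signature sketch only; the body is a placeholder."""
    # logging.Logger is the class returned by logging.getLogger().
    logger.info("Tidying imputation dataframe with %d rows", len(df))
    return df


# Usage with a standard-library logger (the logger name is arbitrary).
tidied = tidy_imputation_dataframe(
    pd.DataFrame({"reference": [1, 2]}), {}, logging.getLogger("imputation")
)
```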

@jwestw (Contributor) left a comment

This all looks good and the pipeline runs fine.

# filter estimated_df for records not included in outputs
filtered_output_df = estimated_df.copy().loc[~to_keep]
outputs_df = estimated_df.copy().loc[no_rnd_spenders_filter]
tau_outputs_df = weighted_df.copy().loc[no_rnd_spenders_filter]

if ni_full_responses is not None:
Contributor

I know this is not part of your ticket, Anne, but I would recommend moving this NI processing to another function.

@@ -66,19 +53,10 @@ def form_output_prep(
# outputs_df = pd.concat([outputs_df, ni_full_responses])
tau_outputs_df = pd.concat([tau_outputs_df, ni_full_responses])
Contributor

This line removes the need for a separate conditional block beforehand:

tau_outputs_df = pd.concat([tau_outputs_df, ni_full_responses]) if ni_full_responses is not None else tau_outputs_df

Then we won't need the if block, which makes the code more compact (the condition is still evaluated, so any performance difference should be negligible).

Again, I know this was not part of your ticket, but we may as well generate nice-to-have code-tidying and efficiency tickets while I am looking at the code.
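Combined with the earlier suggestion to move the NI processing into its own function, the one-line conditional could be wrapped as in the sketch below. The helper name `append_ni_responses` and the example data are hypothetical; only the conditional expression itself comes from the review comment.

```python
from typing import Optional

import pandas as pd


def append_ni_responses(
    tau_outputs_df: pd.DataFrame,
    ni_full_responses: Optional[pd.DataFrame],
) -> pd.DataFrame:
    """Append NI responses when they are supplied, otherwise return the input."""
    return (
        pd.concat([tau_outputs_df, ni_full_responses])
        if ni_full_responses is not None
        else tau_outputs_df
    )


# Hypothetical example data.
base = pd.DataFrame({"reference": [1, 2]})
ni = pd.DataFrame({"reference": [3]})
with_ni = append_ni_responses(base, ni)
without_ni = append_ni_responses(base, None)
```

Note that `pd.concat` keeps each frame's original index by default, so a caller that needs a fresh index would have to reset it afterwards.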

src/outlier_detection/auto_outliers.py (resolved)
@AnneONS merged commit 280c4dc into develop on Jan 15, 2024
1 check passed