642 move the removal of filter qa to imputation #180

AnneONS · 2024-01-11T10:14:58Z

Pull Request submission

Part 1: Remove _prev cols from full_estimation output

At the end of the imputation module, after the imputation QA CSV is output, we drop many of the imputation QA columns, including the _imputed columns, but we do not drop the _prev and _link columns used for MoR.

These columns should also be dropped so that the full_estimation dataframe is smaller and takes less time to output.
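A minimal pandas sketch of the Part 1 change, assuming the MoR working columns can be identified by their "_prev" and "_link" name suffixes. The helper name drop_mor_cols and the example column names are hypothetical, not taken from the PR's code:

```python
import pandas as pd

# Hypothetical suffixes identifying the MoR working columns described above.
MOR_SUFFIXES = ("_prev", "_link")


def drop_mor_cols(df: pd.DataFrame, suffixes=MOR_SUFFIXES) -> pd.DataFrame:
    """Drop columns whose names end in any of the given suffixes.

    Args:
        df (pd.DataFrame): The imputed dataframe.
        suffixes (tuple of str): Column-name suffixes to remove.

    Returns:
        pd.DataFrame: A new dataframe without the matching columns.
    """
    # str() guards against non-string column labels; endswith accepts a tuple.
    to_drop = [col for col in df.columns if str(col).endswith(suffixes)]
    return df.drop(columns=to_drop)


# Illustrative data only.
df = pd.DataFrame({"reference": [1], "211": [1.0], "211_prev": [0.5], "211_link": [2.0]})
print(list(drop_mor_cols(df).columns))  # ['reference', '211']
```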

Part 2: Move the removal of "no_imputation" and "no_mean_found" records out of the outputs module

This filtering should now happen at the end of imputation.
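The Part 2 filter could look something like the sketch below. It assumes each record carries its imputation marker in a column called "imp_marker" (a hypothetical name here) and keeps every row whose marker is not "no_imputation" or "no_mean_found"; the function name remove_unimputed is also hypothetical:

```python
import pandas as pd

# Hypothetical name for the column holding each record's imputation marker.
MARKER_COL = "imp_marker"
EXCLUDED_MARKERS = ["no_imputation", "no_mean_found"]


def remove_unimputed(df: pd.DataFrame, marker_col: str = MARKER_COL) -> pd.DataFrame:
    """Return only the rows whose marker is not in EXCLUDED_MARKERS.

    Args:
        df (pd.DataFrame): The imputed dataframe.
        marker_col (str): Name of the imputation-marker column.

    Returns:
        pd.DataFrame: A filtered copy of df.
    """
    keep = ~df[marker_col].isin(EXCLUDED_MARKERS)
    return df.loc[keep].copy()


# Illustrative marker values only.
df = pd.DataFrame(
    {"reference": [1, 2, 3, 4],
     "imp_marker": ["TMI", "no_imputation", "MoR", "no_mean_found"]}
)
print(remove_unimputed(df)["reference"].tolist())  # [1, 3]
```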

Closes or fixes

Detail the ticket(s) you are closing with this PR
Closes #642

Code

Documentation

Any new code includes all the following forms of documentation:

Function documentation: docstrings within the function(s)/methods have been created
- Includes Args and returns for all major functions
- The docstring details data types
Updated documentation: the user and/or developer working doc has been updated

Data

All data needed to run this script is available in Dev/Test
All data is excluded from this pull request
Secrets checker pre-commit passes

Testing

Unit tests: unit tests have been created and are passing, or a new ticket to create tests has been created

Peer Review Section

All requirements install from (updated) environment.yaml
Documentation has been created and is clear - check the working document
Docstrings (Google format) have been created and accurately describe the function's functionality
Unit tests pass, or if not present a new ticket to create tests has been created
Code runs: the code runs on the reviewer's machine and/or CDSW

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction.

I recommend merging this request.

Review comments

Insert detailed comments here!

These might include, but not exclusively:

bugs that need fixing (does it work as expected, and does it work with other code that it is likely to interact with?)
alternative methods (could it be written more efficiently or with more clarity?)
documentation improvements (does the documentation reflect how the code actually works?)
additional tests that should be implemented (do the tests effectively assure that it works correctly?)
code style improvements (could the code be written more clearly?)
Do the changes represent a change in functionality, such that the version number should increase? If so, start a discussion.
As a reviewer, you can generate the same outputs by running the code

Your suggestions should be tailored to the code that you are reviewing.
Be critical and clear, but not mean. Ask questions and set actions.

github-actions · 2024-01-11T10:15:37Z

Detailed Coverage Report

File	Stmts	Miss	Cover	Missing
src
__init__.py	0	0	100%
src/aggregation
__init__.py	0	0	100%
src/construction
__init__.py	0	0	100%
construction.py	48	48	0%	2–4, 6–7, 9, 12, 39–42, 44–46, 49–51, 54–63, 66, 68, 71, 74, 77–78, 81–83, 86–88, 91–92, 95, 103, 111–112, 114, 118, 120
old_construction.py	113	113	0%	3–7, 10, 13, 41–45, 48–50, 53–55, 60, 63, 65, 68, 74–75, 88–89, 92, 101, 103–104, 107, 117, 120, 124, 132, 135–136, 138, 140–141, 144, 149, 156, 164, 167, 171–174, 176–177, 180, 183–187, 192, 199–202, 204, 207, 210–213, 216–217, 220–222, 224–229, 233–239, 244–245, 248–249, 252, 254–255, 257, 259, 263–264, 267, 270, 275, 278–280, 282, 286, 289, 292, 295–297, 301, 304, 306
src/estimation
__init__.py	0	0	100%
apply_weights.py	16	16	0%	2–5, 7, 10, 24–29, 32–34, 36
calculate_weights.py	37	37	0%	1–3, 6, 9, 23–25, 28, 30, 33, 64–66, 69, 72, 75–78, 80–81, 85, 96, 99, 102, 105, 108, 111, 114–115, 121–122, 124, 127, 139–140
cellno_mapper.py	7	0	100%
src/imputation
MoR.py	90	90	0%	2–4, 6–7, 13–14, 17, 35, 38, 40–41, 43, 47–48, 54, 63–66, 70–71, 73–75, 77, 79, 82, 84, 87, 112, 117–120, 122, 125, 131–136, 138, 141, 147–149, 152, 161, 165, 171, 178, 188–190, 194, 197, 206–207, 210, 218, 221, 229, 231, 233, 235–236, 243–244, 249–250, 253, 264, 270, 273, 276, 278–279, 281, 284, 287–289, 291, 294, 296, 300, 303–304
__init__.py	0	0	100%
apportionment.py	36	10	72%	124, 126, 141, 143, 145, 157–159, 162, 164
expansion_imputation.py	39	32	17%	21, 25–26, 28–29, 32, 35, 38–40, 44, 47, 49, 52–53, 55, 58, 61, 81–83, 87, 90, 93, 96, 102, 107, 110, 112, 118, 121, 125
imputation_helpers.py	77	39	49%	25–27, 29–30, 32, 34, 141–142, 144, 178, 180, 182–183, 185, 189, 195, 198, 202, 205–206, 208, 234, 237, 240, 246–248, 251–252, 254–255, 258–259, 262–264, 270, 272
impute_civ_def.py	94	48	48%	134–136, 166, 168–169, 172–173, 176–178, 183–184, 186, 189, 191–194, 196, 198, 203–204, 206, 209, 211–213, 216, 218–219, 221–222, 224–225, 238, 241–245, 247–248, 251–252, 254–255, 257
manual_imputation.py	19	19	0%	1–2, 4, 6, 9–10, 28–29, 33, 35, 37, 44, 59, 61–62, 64–65, 67–68
sf_expansion.py	65	65	0%	2–4, 6–7, 9, 11–12, 15, 24, 26, 29, 32–33, 36–37, 40, 43–44, 47, 49, 56–57, 61, 64, 69–70, 73, 75, 78–79, 81, 83, 86, 90, 92, 96, 104, 107, 111, 113–114, 116, 118, 121, 124, 133, 136, 139, 148, 151, 153, 159, 165, 168–169, 174, 178–179, 183, 186, 194, 196, 200, 202
short_to_long.py	21	21	0%	1, 3–4, 7, 20, 23, 25, 32–33, 35, 37–38, 40–41, 43–45, 47, 49, 53, 55
tmi_imputation.py	194	155	20%	43, 46, 48, 52, 54, 105, 119, 121–122, 124–128, 130–131, 134, 140, 143–144, 147, 150, 152, 157, 162, 167, 180, 183–184, 186, 189–190, 211–213, 215, 217, 220, 223–225, 227–228, 231, 234, 241–244, 246, 270, 273, 275, 278, 282, 286, 307–308, 311, 314–315, 318, 321–322, 324–325, 327, 329, 332, 334, 336, 339, 342–343, 345, 347–349, 351, 358, 360, 367, 369–370, 372–373, 376, 378, 380, 382, 385, 388–389, 392, 394, 396, 401, 407, 413, 419, 423, 444, 447, 450, 452–453, 455–456, 458–459, 462, 464–465, 468, 471, 473–474, 490, 492, 495–497, 499, 501, 503–504, 507, 509–510, 513, 515, 517–518, 538, 542, 545–546, 549–550, 553, 556, 558, 561, 563, 568, 570, 577–578, 581, 584–585, 587, 590, 592–593
src/northern_ireland
__init__.py	0	0	100%
ni_headcount_fte.py	29	29	0%	2–4, 6, 8, 11, 25, 27, 29, 31, 33, 35, 37, 40, 55, 57–59, 61, 63–64, 66–67, 69, 72, 81, 83–84, 86
ni_staging.py	36	36	0%	3–6, 8, 10, 13, 21–22, 25, 27–29, 31, 34, 40, 44, 46, 49, 80–81, 84, 89, 93, 96, 99–107, 109, 111
src/outlier_detection
__init__.py	0	0	100%
auto_outliers.py	83	39	53%	23–24, 26–27, 31–36, 38, 40, 43, 70, 74, 116, 190, 192, 194–195, 197, 201, 234, 237, 240, 242, 245, 249, 252, 255, 257–258, 260, 262, 272–275, 277
manual_outliers.py	16	0	100%
src/outputs
__init__.py	0	0	100%
export_files.py	101	101	0%	5–11, 13–14, 18, 24, 42, 44, 51, 56, 63, 68, 71, 98, 104, 107, 114, 116, 119, 124–125, 128–132, 135–136, 139, 151–153, 155, 158, 169, 171–172, 174, 177, 195, 198, 201–202, 209, 213–214, 217–219, 222, 224, 226–235, 239, 241–250, 255–256, 258, 261–263, 266, 269, 272, 286, 289–290, 298, 301, 303, 305, 315, 317–319, 328, 330, 333–334
form_output_prep.py	20	20	0%	1–3, 6, 32–33, 36–37, 39, 41, 44–49, 54, 56, 60, 62
gb_sas.py	28	28	0%	2–5, 7–10, 12, 15, 38–40, 43, 48, 51, 61, 65, 68, 71, 74, 79, 82–84, 87–89
intram_by_civil_defence.py	26	26	0%	2–6, 8, 11, 14, 34–36, 38–39, 42–44, 46, 49, 54, 57, 60–62, 65–67
intram_by_itl1.py	38	38	0%	2–5, 7–8, 10, 13, 36–38, 41, 44–47, 50–52, 55–56, 59–62, 65–66, 69, 72, 75, 78–82, 87–89
intram_by_pg.py	27	27	0%	2–5, 7, 10, 13, 31–33, 36–38, 40, 43–45, 48, 51, 54, 57–60, 65–67
intram_by_sic.py	36	36	0%	2–5, 7, 10, 13, 32–34, 36, 39–40, 43–45, 47, 50–52, 57, 73–77, 79, 82, 85, 88, 91, 94–95, 98–100
long_form.py	22	22	0%	2–5, 7–9, 11, 14, 33–35, 38, 41, 44, 47, 50–52, 54–56
manifest_output.py	78	78	0%	1–4, 8, 11–12, 15, 33, 48–51, 54–55, 59–60, 65–66, 68, 71–75, 78–84, 86, 104–105, 112, 114–115, 122, 125, 127, 129, 131, 135, 145, 150–151, 157, 160–161, 163–164, 172–175, 182, 189, 191, 196, 198–200, 202–203, 205–206, 208–211, 213, 216, 218, 224–225, 228–229
map_output_cols.py	58	50	13%	21–22, 24, 27–28, 30–31, 34, 52–53, 56, 59, 62, 65, 67, 69, 71–72, 89, 99, 102, 107, 110–111, 114, 116, 131–132, 134, 137, 139, 158–159, 162, 165, 168, 171, 173, 175, 177, 179–180, 199, 201, 203–204, 207–208, 211–212
ni_sas.py	25	25	0%	2–9, 11, 14, 38–40, 45, 55, 57, 60, 63, 68, 71–73, 76–78
outputs_helpers.py	23	10	56%	46, 53–55, 79–81, 84, 87, 89
short_form.py	40	19	52%	78, 85, 87, 110–112, 115, 118, 121, 124, 127, 130, 133, 136–138, 140–142
status_filtered.py	12	6	50%	29–31, 33–35
tau.py	30	30	0%	2–9, 11, 14, 37–39, 42, 45, 50, 53, 63, 66, 69, 72, 75, 78, 83, 86–88, 91–93
total_fte.py	14	14	0%	2–5, 8, 11, 24–25, 27–28, 34, 39–41
src/site_apportionment
__init__.py	0	0	100%
site_apportionment.py	69	69	0%	1–4, 6, 8, 11–19, 22–23, 26, 39, 42–49, 52, 71–73, 76, 79, 84, 87, 90, 92, 95, 122, 125, 128, 132, 135, 138, 141–142, 145–146, 149–151, 154, 157, 160–161, 164, 167, 170–173, 178, 181, 184–185, 188, 191, 194, 198
src/staging
__init__.py	0	0	100%
history_loader.py	32	2	93%	42, 54
pg_conversion.py	53	31	41%	33, 35–37, 40, 42–45, 47–48, 53–55, 57, 62, 64, 100, 108, 111, 151, 153–154, 157, 160, 163, 166, 169–171, 173
spp_parser.py	14	0	100%
spp_snapshot_processing.py	34	0	100%
staging_helpers.py	144	110	23%	51–53, 55–57, 59, 69, 72, 97–103, 120, 123, 132–134, 137, 140–143, 145, 147, 166–167, 169–170, 172, 211, 214, 217, 220, 223, 226–227, 230, 233, 235, 237, 240, 243, 258–260, 262–264, 266–268, 272–273, 277, 279, 282, 284, 306–307, 311, 316–318, 340, 343, 345, 348, 353–355, 358–359, 362, 364, 369, 375, 398–399, 402, 407–408, 411, 414, 418, 425, 456–459, 461–462, 465–466, 502, 505–508, 511, 514, 517–520, 523, 525
validation.py	223	66	70%	17–18, 73, 206, 208, 308–309, 336, 339, 386–387, 398, 422, 436–437, 443, 447, 455–456, 462–463, 502–503, 510, 512, 515, 517, 538, 540–541, 544–545, 548–550, 552, 554, 557–558, 560, 562–563, 634, 636–637, 640–641, 644–645, 648, 651–652, 655, 657, 659–660, 673, 675–681, 684, 687
src/utils
__init__.py	0	0	100%
helpers.py	17	5	70%	14–15, 19–20, 22
local_file_mods.py	105	44	58%	33–38, 83, 133–135, 183–184, 195–199, 210–211, 222, 233, 244–245, 247, 258–259, 270–271, 280, 288, 299, 301–302, 304–305, 309–311, 315, 329–330, 333–334, 336
TOTAL	2289	1654	27%

Summary of tests

Tests	Skipped	Failures	Errors	Time
50	0 💤	0 ❌	0 🔥	1.218s ⏱️

src/imputation/imputation_helpers.py

AnneONS

my own comments!

src/imputation/imputation_helpers.py

AnneONS · 2024-01-11T15:08:34Z


+def tidy_imputation_dataframe(
+        df: pd.DataFrame,
+        config: Dict,
+        logger,


what object?


jwestw

This all looks good and the pipeline runs fine.

jwestw · 2024-01-11T15:33:15Z


logging.Logger I think
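Following the reply above, the logger parameter can be annotated as logging.Logger so the type hint and the Google-format docstring agree. Only the signature comes from the diff; the body and the "qa_only_cols" config key are hypothetical placeholders, not the PR's implementation:

```python
import logging
from typing import Any, Dict, List

import pandas as pd


def tidy_imputation_dataframe(
    df: pd.DataFrame,
    config: Dict[str, Any],
    logger: logging.Logger,
) -> pd.DataFrame:
    """Drop QA-only columns at the end of imputation (hypothetical body).

    Args:
        df (pd.DataFrame): The imputed dataframe.
        config (Dict[str, Any]): Pipeline config. The "qa_only_cols" key is
            hypothetical and used here for illustration only.
        logger (logging.Logger): Logger for progress messages.

    Returns:
        pd.DataFrame: df without the QA-only columns.
    """
    # Only drop configured columns that actually exist in df.
    to_drop: List[str] = [c for c in config.get("qa_only_cols", []) if c in df.columns]
    logger.info("Dropping %d QA-only columns", len(to_drop))
    return df.drop(columns=to_drop)


out = tidy_imputation_dataframe(
    pd.DataFrame({"a": [1], "211_prev": [2]}),
    {"qa_only_cols": ["211_prev", "not_present"]},
    logging.getLogger(__name__),
)
print(list(out.columns))  # ['a']
```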

src/imputation/imputation_helpers.py

jwestw · 2024-01-11T15:38:40Z


+def tidy_imputation_dataframe(
+        df: pd.DataFrame,
+        config: Dict,
+        logger,


jwestw · 2024-01-11T16:09:34Z

src/outputs/form_output_prep.py

-    # filter estimated_df for records not included in outputs
-    filtered_output_df = estimated_df.copy().loc[~to_keep]
+    outputs_df = estimated_df.copy().loc[no_rnd_spenders_filter]
+    tau_outputs_df = weighted_df.copy().loc[no_rnd_spenders_filter]

    if ni_full_responses is not None:


I know this is not part of your ticket, Anne, but I would recommend moving this NI processing to another function.

jwestw · 2024-01-11T16:11:32Z

src/outputs/form_output_prep.py

@@ -66,19 +53,10 @@ def form_output_prep(
        # outputs_df = pd.concat([outputs_df, ni_full_responses])
        tau_outputs_df = pd.concat([tau_outputs_df, ni_full_responses])


This line removes the need for a logic gate beforehand:

tau_outputs_df = pd.concat([tau_outputs_df, ni_full_responses]) if ni_full_responses is not None else tau_outputs_df

Then we won't need the conditional block, which makes the code more readable and computationally efficient.

Again, I know this was not part of your ticket, but we may as well generate nice-to-have code-tidying and efficiency tickets while I am looking at the code.
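A sketch combining both review suggestions: the NI step moves into its own function, and the conditional expression replaces the if block. The helper name append_ni_responses is hypothetical:

```python
from typing import Optional

import pandas as pd


def append_ni_responses(
    tau_outputs_df: pd.DataFrame,
    ni_full_responses: Optional[pd.DataFrame],
) -> pd.DataFrame:
    """Append NI responses when present; otherwise return the input unchanged.

    Args:
        tau_outputs_df (pd.DataFrame): GB data prepared for the TAU outputs.
        ni_full_responses (Optional[pd.DataFrame]): NI data, or None if absent.

    Returns:
        pd.DataFrame: The combined dataframe.
    """
    return (
        pd.concat([tau_outputs_df, ni_full_responses])
        if ni_full_responses is not None
        else tau_outputs_df
    )


gb = pd.DataFrame({"reference": [1, 2]})
ni = pd.DataFrame({"reference": [3]})
print(len(append_ni_responses(gb, ni)))    # 3
print(len(append_ni_responses(gb, None)))  # 2
```

Both the if block and the conditional expression evaluate the same "is not None" check once, so the gain is mainly in readability.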

src/outlier_detection/auto_outliers.py

move the removal of filter qa to imputation

c215a84

AnneONS added 2 commits January 11, 2024 10:36

updated the filtered_qa_schema toml

2a42dcb

correct the cols to be dropped at the end of imputation

638e5c1

AnneONS commented Jan 11, 2024

src/imputation/imputation_helpers.py (outdated, resolved)

AnneONS commented Jan 11, 2024


Changes requested during joint review

799802f

jwestw approved these changes Jan 11, 2024


AnneONS merged commit 280c4dc into develop Jan 15, 2024
1 check passed
