Deprecate `get_most_severe_consequence_for_summary` in favor of more flexible `get_most_severe_csq_from_multiple_csq_lists` #714

jkgoodrich · 2024-06-25T19:20:18Z

Also adds:

- `filter_to_most_severe_consequences`, which is used by `get_most_severe_csq_from_multiple_csq_lists`.
- `loftee_labels` and `no_lof_flags` parameters on `filter_vep_transcript_csqs_expr`, for filtering by LOFTEE labels and flags.

Depends on #713

updates_to_get_most_severe_consequence_for_summary.html.zip: testing to confirm that the same results are returned for `get_most_severe_consequence_for_summary`.
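
As a rough illustration of what filtering to the most severe consequence involves, here is a minimal plain-Python sketch. It does not use Hail and is not the implementation in `gnomad/utils/vep.py`; `EXAMPLE_CSQ_ORDER`, `most_severe_term`, and `filter_to_most_severe` are hypothetical names, and the severity ordering is an abbreviated stand-in for the full VEP ordering.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, abbreviated severity ordering (most severe first).
EXAMPLE_CSQ_ORDER: List[str] = [
    "stop_gained",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
]


def most_severe_term(csqs: List[Dict], order: List[str]) -> Optional[str]:
    """Return the most severe term seen in any consequence, or None."""
    observed = {term for csq in csqs for term in csq["consequence_terms"]}
    for term in order:
        if term in observed:
            return term
    return None


def filter_to_most_severe(
    csqs: List[Dict], order: List[str]
) -> Tuple[Optional[str], List[Dict]]:
    """Keep only the consequences that carry the most severe term."""
    ms = most_severe_term(csqs, order)
    if ms is None:
        return None, []
    return ms, [c for c in csqs if ms in c["consequence_terms"]]


example = [
    {"transcript_id": "T1", "consequence_terms": ["intron_variant"]},
    {"transcript_id": "T2", "consequence_terms": ["missense_variant"]},
    {"transcript_id": "T3", "consequence_terms": ["missense_variant", "intron_variant"]},
]
ms, kept = filter_to_most_severe(example, EXAMPLE_CSQ_ORDER)
```

In this sketch, both transcripts that carry the most severe term are retained, which mirrors the idea of returning every consequence that matches the most severe one rather than a single arbitrary pick.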

tests/utils/test_vep.py

gnomad/utils/vep.py

klaricch · 2024-09-30T15:40:13Z

gnomad/utils/vep.py

+
+        - most_severe_consequence: Most severe consequence for variant.
+        - lof: Whether the variant is a loss-of-function variant.
+        - no_lof_flags: Whether the variant has any LOFTEE flags (True if no flags).


This returns either True or None, right? Why not True or False?

jkgoodrich (2024-10-22): I made some changes, so I'm not sure what the result was before, but now in my test on 2 partitions of the exomes result HT, I get {False: 21, True: 580, None: 78785}. Let me know if you are still seeing no False values.
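
One way a three-valued result like the one above could arise is sketched below in plain Python. This is an assumption for illustration, not a confirmed description of the PR's code: the flag is left missing (`None`) when there is no LOFTEE annotation at all, and both a missing and an empty `lof_flags` count as "no flags". The function name and the sample rows are hypothetical, and `"SINGLE_EXON"` is used only as an example flag string.

```python
from collections import Counter
from typing import Optional


def no_lof_flags(lof: Optional[str], lof_flags: Optional[str]) -> Optional[bool]:
    """Return None without a LOFTEE call, else True when no flags are set."""
    if lof is None:
        # Assumption: no LOFTEE annotation means the field stays missing.
        return None
    # Both a missing flag field and an empty string count as "no flags".
    return lof_flags is None or lof_flags == ""


# Hypothetical (lof, lof_flags) pairs.
rows = [
    ("HC", None),
    ("HC", ""),
    ("HC", "SINGLE_EXON"),
    (None, None),
    (None, None),
]
counts = Counter(no_lof_flags(lof, flags) for lof, flags in rows)
```

Under these assumptions, a large `None` count simply reflects variants without any LOFTEE annotation, while `False` appears only for annotated variants that carry at least one flag.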

klaricch · 2024-09-30T17:03:46Z

tests/utils/test_vep.py

+    )
+
+    # Test the function
+    result = get_most_severe_csq_from_multiple_csq_lists(vep_expr)


I also tested with prioritize_loftee_no_flags set to True and to False. Does it make sense that I got the same result regardless of how this param was set?

jkgoodrich (2024-10-22): With the current vep_expr, yes. I added some changes to it, and a test for it.
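
To see why a prioritization parameter can have no visible effect on some inputs, consider the following plain-Python sketch. The selection rule here is an assumption made for illustration (prefer consequences with `lof == "HC"` and no flags, then fall back to severity order); it is not taken from the PR's code, and `pick_term`, `EXAMPLE_ORDER`, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Abbreviated, hypothetical severity ordering (most severe first).
EXAMPLE_ORDER = ["splice_donor_variant", "stop_gained", "missense_variant"]


def pick_term(csqs: List[Dict], prioritize_no_flags: bool) -> str:
    """Pick a consequence term, optionally preferring HC calls with no flags."""

    def sort_key(csq: Dict) -> Tuple[int, int]:
        hc_no_flags = csq.get("lof") == "HC" and not csq.get("lof_flags")
        # Preferred consequences sort first only when prioritization is on.
        preferred = 0 if (prioritize_no_flags and hc_no_flags) else 1
        return (preferred, EXAMPLE_ORDER.index(csq["term"]))

    return min(csqs, key=sort_key)["term"]


# The HC, no-flag consequence is already the most severe,
# so the parameter changes nothing.
insensitive = [
    {"term": "stop_gained", "lof": "HC", "lof_flags": None},
    {"term": "missense_variant", "lof": None, "lof_flags": None},
]

# The most severe consequence carries a flag,
# so the parameter changes the result.
sensitive = [
    {"term": "splice_donor_variant", "lof": "HC", "lof_flags": "SINGLE_EXON"},
    {"term": "stop_gained", "lof": "HC", "lof_flags": None},
]
```

A test input needs something like the second case, where the most severe consequence is flagged and a less severe one is not, before toggling the parameter can be observed in the output.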

klaricch · 2024-09-30T17:04:14Z

gnomad/utils/vep.py

    )

+    # Build the case expression to determine the most severe consequence.
+    ms_csq_expr = hl.case(missing_false=True)


Why use case expressions?

jkgoodrich (2024-10-22): Is there a reason not to? If you have a clearer solution, I'm happy to make modifications; just let me know what you're thinking.
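
For readers unfamiliar with the pattern, a case expression evaluates conditions in order and takes the first one that matches. The sketch below is a plain-Python analogue of that behavior, with a missing condition treated as false in the spirit of `missing_false=True`. It does not use Hail, and `first_match`, `has_items`, and the field names are hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple

Branch = Tuple[Callable[[dict], Optional[bool]], Callable[[dict], Any]]


def has_items(row: dict, field: str) -> Optional[bool]:
    """Return None if the field is missing, else whether it is non-empty."""
    items = row.get(field)
    return None if items is None else len(items) > 0


def first_match(row: dict, branches: List[Branch]) -> Any:
    """Return the value of the first branch whose condition is True."""
    for condition, value in branches:
        # A missing (None) condition is treated as False, not as an error.
        if condition(row) is True:
            return value(row)
    return None  # No branch matched, so the result stays missing.


# Hypothetical field names: one list of terms per consequence category.
branches: List[Branch] = [
    (lambda r: has_items(r, "transcript_terms"), lambda r: r["transcript_terms"][0]),
    (lambda r: has_items(r, "intergenic_terms"), lambda r: r["intergenic_terms"][0]),
]

with_transcript = {"transcript_terms": ["missense_variant"], "intergenic_terms": None}
intergenic_only = {"transcript_terms": None, "intergenic_terms": ["intergenic_variant"]}
neither = {"transcript_terms": None, "intergenic_terms": None}
```

The same logic could be written as nested conditionals; the chained form mainly keeps the priority order readable when the branches are built in a loop.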

klaricch · 2024-09-30T17:17:58Z

gnomad/utils/vep.py

+        )
+
+    # Initialize the lof struct with missing values.
+    lof_expr = hl.struct(lof=hl.missing(hl.tstr), no_lof_flags=hl.missing(hl.tbool))


So lof and no_lof_flags will always be None if prioritize_loftee/prioritize_loftee_no_flags are False, even if lof flags are present?

jkgoodrich (2024-10-22): I modified this a bit; let me know if the changes make sense.

… into jg/deprecate_get_most_severe_consequence_for_summary

…e of the tests

klaricch · 2024-11-13T17:41:47Z

tests/utils/test_vep.py

+    @pytest.mark.parametrize(
+        "prioritize_protein_coding, prioritize_loftee, prioritize_loftee_no_flags, additional_order_field, additional_order, expected_most_severe_csq, expected_polyphen_prediction",
+        [
+            (False, False, False, None, None, None, None),


why only set expected_most_severe_csq if it's "stop_gained"?

klaricch · 2024-11-13T17:42:37Z

tests/utils/test_vep.py

+                    else polyphen_prediction
+                ),
+            )
+


Suggested change

# Define csq, protein_coding, lof, no_lof_flags, and polyphen_prediction.

klaricch · 2024-11-13T17:51:46Z

tests/utils/test_vep.py

+            additional_order=additional_order,
+        )
+
+        expected_dict = hl.Struct(


how were the values for the expected dict decided?

klaricch · 2024-11-13T18:33:26Z

tests/utils/test_vep.py

+            + (["lof"] if prioritize_loftee else [])
+            + (
+                ["no_lof_flags"]
+                if prioritize_loftee_no_flags or prioritize_loftee


i find it confusing when no_lof_flags is True in cases where prioritize_loftee is True and prioritize_loftee_no_flags is False and there are flags present

klaricch · 2024-11-18T13:51:07Z

gnomad/utils/vep.py

@@ -777,86 +780,96 @@ def filter_to_most_severe_consequences(
    :return: ArrayExpression with of the consequences that match the most severe
        consequence.
    """
+    # Get the dtype of the csq_expr ArrayExpression elements


Suggested change

# Get the dtype of the csq_expr ArrayExpression elements

# Get the dtype of the csq_expr ArrayExpression elements.

klaricch · 2024-11-18T16:12:19Z

tests/utils/test_vep.py

+            (True, True, False, *polyphen_params, None, "possibly_damaging"),
+            (True, False, True, *polyphen_params, None, "possibly_damaging"),
+            (False, True, True, *polyphen_params, "stop_gained", None),
+            # Need to figure out class too large error


i also got this error

jkgoodrich added 21 commits May 7, 2024 21:25

Clean-up process_consequences and fix to handle tc.lof missingness correctly

c0682c5

Adds the following fixes to process_consequences:

- Allow lof_flags to be missing in addition to lof_flags == "" for lof == "HC" to have no flag penalty
- Pass `csq_order` to `add_most_severe_consequence_to_consequence`

6d9a0ea

Updates to get_most_severe_consequence_for_summary to make it possible to use in `default_generate_gene_lof_matrix`

6688282

Modify process_consequences to use `get_most_severe_consequence_for_summary`

d1c2f8a

Merge branch 'main' of https://github.com/broadinstitute/gnomad_methods into jg/fix_process_consequences

0e856a3

deprecate get_most_severe_consequence_for_summary

be6c4b3

Pull out prioritize_loftee_hc_no_flags from process_consequences

e40a74d

Make add_most_severe_csq_to_tc_within_vep_root and add_most_severe_consequence_to_consequence more flexible

2bf94f1

Clean-up vep consequence functions

f045046

Remove duplicate version of prioritize_loftee_hc_no_flags

7939af8

Add prioritize_loftee_no_flags to get_most_severe_csq_from_multiple_csq_lists and make use of it in process_consequences

c612203

Add tests for vep functions used in this PR

059b012

Merge branch 'main' of https://github.com/broadinstitute/gnomad_methods into jg/fix_process_consequences # Conflicts: # gnomad/utils/vep.py

e899b51

Move some of the parts in `get_most_severe_csq_from_multiple_csq_lists` to `filter_to_most_severe_consequences` and clean them up

aa67cc3

Small cleanup

d6f8d21

Fix the use of keep

cce80d1

Change to use get_most_severe_csq_from_multiple_csq_lists instead of `get_most_severe_consequence_for_summary`

9de076a

Merge branch 'jg/make_expr_version_of_filter_vep_transcript_csqs' of https://github.com/broadinstitute/gnomad_methods into jg/deprecate_get_most_severe_consequence_for_summary

eb4c97e

use filter_vep_transcript_csqs_expr for loftee filter

2b9aaed

Merge branch 'jg/make_expr_version_of_filter_vep_transcript_csqs' of https://github.com/broadinstitute/gnomad_methods into jg/deprecate_get_most_severe_consequence_for_summary

527615d

Use filter_vep_transcript_csqs_expr for protein coding filter

fdc41d6

jkgoodrich added the Changelog: new feature label Jun 25, 2024

jkgoodrich requested a review from klaricch June 25, 2024 19:20

jkgoodrich assigned klaricch and jkgoodrich Jun 25, 2024

jkgoodrich added 5 commits June 25, 2024 13:29

Remove process_consequence changes

6c4e340

Move vep tests into utils directory

7d7ff81

Fix tests

e1b542a

formatting of tests

b458199

Move POLYPHEN ORDER to a different PR

aca10f5

jkgoodrich added 3 commits June 25, 2024 15:16

Add extra newline

0688308

Remove unneeded f-string

9cab300

Change filter_vep_transcript_csqs_expr to set csq_expr as missing if it's empty after filtering

e4d5cef

Base automatically changed from jg/make_expr_version_of_filter_vep_transcript_csqs to main July 9, 2024 15:54

klaricch requested changes Sep 30, 2024

jkgoodrich added 9 commits October 18, 2024 11:32

Merge branch 'main' of https://github.com/broadinstitute/gnomad_methods into jg/deprecate_get_most_severe_consequence_for_summary

a46d11e

Address reviewer comments

a67df82

format

aa9df9a

Small docstring change

5ecdeac

Add check for is_tc

e1c9ef7

Change docstring default to correct value

fbda89b

Change to get around pylint error

8ab6fb4

Clean up test_vep.py to add additional tests and improve the structure of the tests

141fecf

Fix incorrect return types

8e25621

jkgoodrich requested a review from klaricch October 24, 2024 13:19

klaricch requested changes Nov 18, 2024
