Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve unreachable code analysis #302

Merged
merged 11 commits into from
Nov 24, 2024
Merged

Conversation

kreathon
Copy link
Contributor

Motivation

def f():
    try:
        return
    except:
        raise Exception()
    print("Unreachable")

At the moment, the print statement is not identified as unreachable.

Implementation

I added a data structure (no_fall_through_nodes) that stores ast nodes that do not allow a fall through. During the ast traversal continue, break, raise and return are added into the this data structure. For every control flow statement / object (for, try, while, with, but also module and functiondef), we check if any of the statement in the body is in no_fall_through_nodes. If that is the case, we report the next statement (if there is any) and add the current node also into no_fall_through_nodes. To make this work generic_visit (means visiting children) is now executed before visiting the current node.

The algorithm was added into the existing Vulture object with the generic_visit (which handles recursion). I think a cleaner implementation could implement a separate node visitor (that handles its own recursion), but I was not sure if the project is open for that.

The old error reporting message is reused (e.g. unreachable code after 'try'). I am not sure if this is the best message or if all unreachable messages should be simplified to something like unreachable code.

Limitations

  • does not support match statements
  • does not support with statements (context manager can suppress exceptions)

Related Issue

This PR is addressing the issue #270.

Checklist:

  • I have updated the documentation in the README.md file or my changes don't require an update.
  • I have added an entry in CHANGELOG.md.
  • I have added or adapted tests to cover my changes.
  • I have run tox -e fix-style to format my code and checked the result with tox -e style.

@codecov-commenter
Copy link

codecov-commenter commented Feb 17, 2023

Codecov Report

Merging #302 (a59eac4) into main (fc23f33) will increase coverage by 0.06%.
Report is 13 commits behind head on main.
The diff coverage is 100.00%.

❗ Current head a59eac4 differs from pull request most recent head 8d9fcf4. Consider uploading reports for the commit 8d9fcf4 to get more accurate results

@@            Coverage Diff             @@
##             main     #302      +/-   ##
==========================================
+ Coverage   98.96%   99.03%   +0.06%     
==========================================
  Files          21       21              
  Lines         679      727      +48     
==========================================
+ Hits          672      720      +48     
  Misses          7        7              
Files Changed Coverage Δ
vulture/core.py 98.75% <100.00%> (+0.21%) ⬆️
vulture/whitelists/ast_whitelist.py 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Owner

@jendrikseipp jendrikseipp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this PR! I like the algorithm and that it nicely generalizes the earlier solution. However, I'd like to propose a different code organization, so that the core module doesn't grow too large:

  • Add a reachability.py module with a Reachability class.
  • This class stores the no-fall-through-nodes, allows to reset them, has a visit(node) method that checks the node's type and depending on the type marks nodes as no-fall-through or checks whether their body is fully executable.
  • The Vulture class has a Reachability member and calls its visit(node) function, and retrieves the unreachable code from Reachability before switching to the next module.

This goes in the direction of adding a second NodeVisitor class, but avoids traversing the AST twice. Or do you think a second NodeVisitor class (with all of these visitor_* functions) would be a better alternative?

vulture/core.py Outdated
@@ -225,6 +227,10 @@ def scan(self, code, filename=""):
self.noqa_lines = noqa.parse_noqa(self.code)
self.filename = filename

# We can reset the fall_through_nodes for every module to reduce
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# We can reset the fall_through_nodes for every module to reduce
# Reset the no-fall-through-nodes for every module to reduce memory usage.

vulture/core.py Show resolved Hide resolved
vulture/core.py Outdated

def _can_fall_through_statements_analysis(self, statements):
"""Report unreachable statements.
Returns True if we cannot fall though the list of statements
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Returns True if we cannot fall though the list of statements
Return True if we can execute the full list of statements.

@kreathon
Copy link
Contributor Author

kreathon commented Sep 3, 2023

This goes in the direction of adding a second NodeVisitor class, but avoids traversing the AST twice. Or do you think a second NodeVisitor class (with all of these visitor_* functions) would be a better alternative?

I do not see a clear advantage of one of the solutions and it probably does not matter too much right now (if the reachability is in a second module it should be really easy to switch between them anyway).

I will update the code according to your proposed code organization.

@kreathon
Copy link
Contributor Author

kreathon commented Sep 3, 2023

I am not sure if passing _define as report into Reachaiblity is the cleanest solution, but I did not want to create a new datastructure that passes the information back to the Vulture object. What do you think?

@kreathon
Copy link
Contributor Author

kreathon commented Sep 5, 2023

I had a look at it again, and I think it is cleaner to also move the _handle_conditional_node into the reachablitly.py (such that the module handles the entire self.unreachable_code LoggingList data structure).

I will propose the change as soon as I find time for it.

@kreathon
Copy link
Contributor Author

kreathon commented Sep 9, 2023

I updated

  • check_unreachable for checking multiple unreachable code segments (alternative would be to add another method for that or assert that there is only a single unreachable code segment separately)
  • move some tests from test_conditions.py into test_reachability.py
  • merge handle_condition_node() into Reachability to get more powerful analysis results. There are still cases that are not handled (we need to distinguish between raise/return and break/continue for that).

For example.

while True:
    raise Exception()
print(b)

I could handle this in another PR.

Copy link
Owner

@jendrikseipp jendrikseipp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flew under my radar, sorry. I had three minor comments, but took care of them myself now. Thanks for your work on this, it not only generalizes the code, but also makes it nicer!

@@ -11,6 +12,7 @@
from vulture import utils
from vulture.config import InputError, make_config
from vulture.utils import ExitCode
from vulture.reachability import Reachability
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep these in alphabetical order.

assert len(v.unreachable_code) == 1
item = v.unreachable_code[0]
assert item.first_lineno == lineno
def check_unreachable(v, lineno, size, name, multiple=False):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing a Boolean in this way is a common code smell suggesting that we rather want two separate functions check_single_unreachable and check_multiple_unreachables.

self._report = report
self._no_fall_through_nodes = set()

def visit(self, node):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment: # All children of the node have already been visited.

@jendrikseipp jendrikseipp merged commit 609f5f2 into jendrikseipp:main Nov 24, 2024
19 checks passed
@jendrikseipp
Copy link
Owner

There are still cases that are not handled (we need to distinguish between raise/return and break/continue for that).

For example.

while True:
    raise Exception()
print(b)

I could handle this in another PR.

Feel free to do so, if you're still interested :)

@kreathon
Copy link
Contributor Author

kreathon commented Nov 24, 2024

This flew under my radar, sorry. I had three minor comments, but took care of them myself now. Thanks for your work on this, it not only generalizes the code, but also makes it nicer!

I actually thought recently about this PR again and if I should rebase and give it another go 😅

Feel free to do so, if you're still interested :)

Sure, I will have a look again 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants