-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix getProcessedModule() for package #262
Fix getProcessedModule() for package #262
Conversation
Codecov Report
@@ Coverage Diff @@
## master #262 +/- ##
==========================================
+ Coverage 81.49% 81.51% +0.02%
==========================================
Files 25 25
Lines 3701 3716 +15
Branches 780 781 +1
==========================================
+ Hits 3016 3029 +13
- Misses 472 474 +2
Partials 213 213
Continue to review full report at Codecov.
|
This PR causes some regressions when processing Twisted:
So I'll have to dig deeper. |
It looks like the handling of
The name resolving is done using the From running some tests on Python 3.8, I think it should work like this:
Note that |
Perhaps the delegation happens in the wrong direction: if you Note that Since pydoctor is doing static analysis, we probably should treat all modules that can be imported as having been imported already, which means that submodules should be given priority over local names in the |
This provides mypy with enough information to find the problem where getProcessedModule() returns a Package, which lacks the 'all' attribute that visit_ImportFrom() needs.
When called with a package name, getProcessedModule() would force the __init__ module to be processed, but then returned the Package object instead of the Module object. This is contrary to what its name suggests and what visit_ImportFrom() expects. Closes twisted#259
The builtins don't exist at the module level, as can be seen when you attempt to import a builtin like `int` from an arbitrary module. A module does have a __builtins__ dictionary, but those names form a scope above the module level; they're not names in the module. Since the following `else:` case returns the same value, this removal will not affect the behavior in any way.
This method was rather difficult to read, in particular the code for handling "from module import *" and "from module import names", which had little overlap in functionality but was intertwined in the method body. There should be no changes in functionality.
This test case fails when run with the old implementation of getProcessedModule().
There were a lot of checks that were redundant or that were needlessly done inside the loop. The overwriting of the 'mod' variable was particularly nasty, since it would put a package there if we're importing from an __init__ module, and then following names would be looked up in the package instead.
The outcome won't change across iterations, so it is more efficient to do it in advance.
Allow specifying the parent name when parsing a module, such that the module's parent will be set correctly. This is necessary when we want to test name lookups that cross module boundaries.
This check got added without explanation in c17fe00. That commit was part of the Python 3 porting effort, but this particular change doesn't seem to be directly related to that. Maybe it helped unit tests pass at the time, but in that case it was likely masking a bug elsewhere rather than fixing things, since the parent's full name should never be equal to the child's full name.
Using dict.setdefault(), we can avoid computing the full name's place in the dictionary twice.
2a2c915
to
4b9ef9c
Compare
I think I got the implementation correct now, but I still don't have a unit test case that demonstrates the regression. |
4b9ef9c
to
7f94ad3
Compare
This new test fails because of a regression introduced when getProcessedModule() started returning the __init__ module.
This fixes a regression introduced when getProcessedModule() started returning the __init__ module. A name imported from a package can be a module/subpackage, or a name from the __init__ module. The former only works when we look up the name in the package, while the latter will work whether we look up in the package or the __init__ module, since the package will delegate to the __init__ module when necessary.
7f94ad3
to
6cb337f
Compare
I managed to construct a test case that demonstrates the regression. |
pydoctor/test/test_astbuilder.py
Outdated
|
||
@systemcls_param | ||
def test_import_module_from_package(systemcls: Type[model.System]) -> None: | ||
"""Importing a module from a package should not look in __init__ module.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that this is just a regression tests ?
"""Importing a module from a package should not look in __init__ module.""" | |
""" | |
Importing a module from a package should not look in __init__ module. | |
In this test the following hierarchy is constructed: | |
a/__init__.py | |
a/b.py | |
c.py | |
When c.py has content as below, `f` is resolved as a member if `b.py`: | |
from a import b | |
f = b.f | |
""" |
Or maybe change the name of f
to a name that describe its purpose in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that this is just a regression tests ?
In the sense that this is the test case that demonstrates the regression that was introduced halfway the PR, yes. But it also tests a fundamental property of the from ... import
handling, so in that sense it's more of a test that was missing instead of one that specifically exists to check for a regression.
I'll expand the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
Sorry... I am not able to do a good review now.
I don't know too much about AST and epydoc in general.
I am not sure why we need the Package class... and why can't we just have a
Module.isPackage() method or Module.is_package attribute that is True when the module name is __init__
If you can't find another reviewer, I think that this is an improvement over the current state and it should be merged.
My suggestion is to update the system tests based on this diff... the idea is that we can use cpython code as a non-Twisted playground.
diff --git a/.github/workflows/system.yaml b/.github/workflows/system.yaml
index c54fc4a..236795b 100644
--- a/.github/workflows/system.yaml
+++ b/.github/workflows/system.yaml
@@ -32,3 +32,7 @@ jobs:
- name: Generate Twisted API docs
run: |
tox -e twisted-apidoc
+
+ - name: Generate CPython API docs
+ run: |
+ tox -e cpython-apidoc
diff --git a/tox.ini b/tox.ini
index 425c822..0b852f9 100644
--- a/tox.ini
+++ b/tox.ini
@@ -51,6 +51,17 @@ commands =
twisted-apidoc: /bin/cat {toxworkdir}/twisted-apidocs.log
+[testenv:cpython-apidocs]
+description = Build CPython API documentation (for the supported modules)
+
+commands =
+ rm -rf {toxworkdir}/cpython
+ git clone --depth 1 --branch master https://github.com/python/cpython.git {toxworkdir}/cpython
+ pydoctor --add-package={toxworkdir}/cpython/Lib/ctypes --project-base-dir={toxworkdir}/cpython
+ pydoctor --add-package={toxworkdir}/cpython/Lib/tkinter --project-base-dir={toxworkdir}/cpython
+ pydoctor --add-package={toxworkdir}/cpython/Lib/test --project-base-dir={toxworkdir}/cpython
+
+
[testenv:mypy]
description = run mypy (static type checker)
pydoctor/model.py
Outdated
mod = self.allobjects.get(modname) | ||
if mod is None: | ||
return None | ||
if isinstance(mod, Package): | ||
return self.getProcessedModule(modname + '.__init__').parent | ||
initModule = self.getProcessedModule(modname + '.__init__') | ||
assert initModule is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this "out of test code" assertion required ?
I was hoping that we now have enough unit tests to see them fail if this is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the assert was added at some point to tell mypy that initModule
was safe to call Module
methods on, but in the method's final form there is nothing called on it, so I'll remove it.
super().setup() | ||
self.all = None | ||
self.all: Optional[Collection[str]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it would help to add as a documentation comment all that you know/think so far that all
should do.
I see that it is used in astbuilder and it looks like it contains the members define in the `all`` member of a module.
Also, by reading the code it was not clear why we need a Package and a Module inside model.py
It looks like Package is some sort of "wrapper" for the init module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about more documentation, but that's a problem we'll have to solve bit by bit, since there is quite an amount of technical debt. That said, now is a better time than average to pay off this particular piece of debt, so I'll add a docstring.
|
||
# Are we importing from a package? | ||
is_package = isinstance(obj, model.Package) | ||
assert is_package or obj is mod or mod is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor question of style... do we need the assertions.
Also, for me is_package
is a great variable name and the comment is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this assertion is useful, as it makes a non-trivial statement about the code. This is illustrated by the fact that originally I forgot about the or mod is None
part, which can happen if there is a name clash between a module and a variable in __init__.py
within the same package.
One reason to use asserts in situations like this is that it acts as a formal and verified statement about the code. If the same statement were included as a comment, there would be more room for misinterpretation and if the statement is not or no longer true, we wouldn't know.
Another reason to use asserts is that they're active across all unit and system tests, so they help cover a much larger state space. Despite having around 90% branch coverage, there are still quite a few bugs in the astbuilder
and model
modules that we're not catching, because either the test cases don't do enough validations or a bug only occurs in particular combinations of states that are tested separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the info. We can keep it.
Another matter of style: Move is_package = isinstance(obj, model.Package)
on line 250 so that it is closer to the place where it is used.
Will make the code easier to read...
I was searching for placer between line 211 and 250 to see in which other place it is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 250 is inside the loop, while is_package
does not depend on the thing we're iterating through. So I don't think it should be inside the loop, both for clarity and for performance.
I thought about moving the re-export code to a separate method, since that takes up the bulk of the loop's body, but doing that would create a method with 3 arguments (mod
, orgname
and asname
), so it I'm not sure it would improve readability.
Maybe @moshez would be willing to have a look?
I was about to disagree with packages being modules, but then looked up the definition of a package and it turns out that packages are in fact special modules. So it would make sense to change pydoctor's model as you describe. But that would be quite invasive, so I'd rather land this PR first.
I agree: even if the old/current design doesn't match the language definition exactly, the behavior is now more correct and better tested than it was before. It fixes the three crashes from #259 and it might help with #184 as well (unverified).
I think it's useful to have more system tests than just Twisted, also because Twisted doesn't use RST. On the other hand, we have to keep the CI cycle fast, so I'm a bit hesitant to include a project as large as CPython. Would it be an option to run it as a post-merge test? Then we can see in practice how often the test will find issues we wouldn't otherwise have found, without slowing down CI in the mean time. |
It took me 27 second to At this point, we don't need to run pydoctor on the full cpython...but just on the parts that failed in the past... to have it as a regression test. What we can/should do, is run cpython-apidocs in parallel with twisted-apidocs. twisted-apidocs are already taking close to 2 minutes... so as long as we run cpythong-apidocs in parallel and they take less than 2 minutes to run, we don't increase the total test time Diff to run system tests in parallel can look like this: diff --git a/.github/workflows/system.yaml b/.github/workflows/system.yaml
index c54fc4a..d6aa2db 100644
--- a/.github/workflows/system.yaml
+++ b/.github/workflows/system.yaml
@@ -10,6 +10,10 @@ jobs:
system_tests:
runs-on: ubuntu-latest
+ strategy:
+ matrix:
+ tox_target: [twisted-apidoc, cpython-apidoc]
+
steps:
- uses: actions/checkout@v2
@@ -31,4 +35,4 @@ jobs:
- name: Generate Twisted API docs
run: |
- tox -e twisted-apidoc
+ tox -e ${{ matrix.tox_target }} |
Are regressions more likely to come from these three than from other parts of the code though? Now that we have mypy coverage of that part of the code, hitting an unexpected
That is true: as long as it doesn't take longer than twisted-apidocs, it's essentially free. I'll add your suggestions, but I'd like to try and see how long the full cpython would take. Or maybe we can pick some random files plus a list of files that previously failed. |
By the way, could we put the cpython source in a cached location? Then we would not need to do a full clone every time. |
I think that you can try and see how much faster is caching the cpython code... the problem is what key to invalidate the cache. I think that you can also run the full cpython documentation... including tests. I just went with those 3 packages as those were the one triggering these failures.... and I didn't wanted to make the test run longer.... The ticket #259 has a list of modules ... One option is to convert the whole cpython/Lib into a package via |
I'm also running into a problem: if there are docstring syntax errors, pydoctor currently returns exit code 2. Should we consider docstring syntax errors as errors, since the output is likely wrong? Or should we consider them warnings, as we are still able to generate the full output even though it may be wrong in places? Unlike warnings, in these cases we're certain that the input is wrong, but like warnings, the problem has a limited impact on the end result. In any case, we shouldn't be using exit code 2: it should either be 0/3 (depending on We're getting syntax errors reported because we're not selecting RST as the format yet, but even if we do that, there is the question of how to treat docstring errors should they exist. In any case, it's a content problem and it shouldn't fail the system test, but the question is whether pydoctor should return 0 or whether we filter the exit code in the tox rule and only fail the test on exit code 1. |
Do we ever need to invalidate the cache? Maybe if it became corrupted, but under normal circumstances we could just |
It seems you got a 2 seconds run time because only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am running the following command based on this branch and it still fails :(
pydoctor --add-package=cpython/Lib/test --project-base-dir=cpython
On my laptop, just running took I had to use this hack to
This can help just with the AST traversal. Using So my suggestion is to still run pydoctor on cpython, but at this stage without RST... just to check all the imports |
In what way does it fail? It runs fine here. |
Are you sure you're running the right version of pydoctor? Because that sounds like the original bug from #259 that was fixed.
Yes, the problem from #261 is still there.
In that case, I think we should add a command line option to disable the output generation altogether, since that is where the bulk of the time is spent. Parsing all of CPython takes on 24 seconds on my PC, while generating output takes about a minute extra. Surprisingly, it doesn't matter all that much which docformat I select: edit: Actually the existing |
Based on a fragment from Adi Roiban in twisted#262 discussion. This test currently fails on the problem reported as twisted#261.
Since a full run takes too long to comfortably fit into a CI cycle, we pass `--html-summary-pages` to only generate a summary report. This does excercise the majority of the parsing code. Since pydoctor currently crashes when parsing the actual docstrings, we temporarily skip docstring parsing by selecting the `plaintext` docformat. See twisted#261 for details.
Based on fragment from Adi Roiban in the twisted#262 discussion.
The new system test passes in CI, so I think that the problem you ran into is specific to your setup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. Thanks!
I think that this is ready to be merged.... it is a huge improvement (running the regression AST traveral on cpython. I am happy to see the new tox env, as they allow us to re-run cpython in an easy to reproduce way. Thanks! |
When called with a package name,
getProcessedModule()
would force the__init__
module to be processed, but then returned thePackage
object instead of theModule
object. This is contrary to what its name suggests and whatvisit_ImportFrom()
expects.I also added type annotations to narrow down the place where the wrong type originated.
The purpose of this PR is to fix the bug described in #259.