Submodules not showing up for (native) extension modules #319

Open
robamler opened this issue Mar 2, 2021 · 6 comments · May be fixed by #318

Comments


robamler commented Mar 2, 2021

When running pdoc on an extension module (i.e., a native "C" extension), the extension module's submodules don't show up in the documentation, even though <TAB>-autocomplete in a Python REPL can find them. This seems to be because pdoc searches for submodules by inspecting the source directory, which isn't available for extension modules.

I've proposed PR #318 to fix this issue. The proposed solution works but I'm not sure if it is safe enough to remove the old "source directory traversal" method. I'd appreciate guidance on completing the PR.
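To illustrate the difference between directory traversal and attribute inspection, here is a minimal sketch of discovering submodules from the loaded module object itself. It is not pdoc's code and not the code in PR #318; the helper name find_attribute_submodules, the stand-in module pyext_like, and the filter on `__all__` are assumptions made for this example.

```python
import inspect
import os
import types


def find_attribute_submodules(module):
    """Return {name: module} for attributes of `module` that are module objects.

    Assumption of this sketch: if the module defines __all__, only names
    listed there count as intended submodules.
    """
    public = getattr(module, "__all__", None)
    found = {}
    for name, value in vars(module).items():
        if not inspect.ismodule(value):
            continue
        if public is not None and name not in public:
            continue
        found[name] = value
    return found


# Stand-in for a compiled extension: a module object with no source directory.
pyext_like = types.ModuleType("pyext_like")
pyext_like.submodule = types.ModuleType("submodule")
pyext_like.submodule.variable = 42
pyext_like.os = os  # unrelated module attribute, not listed in __all__
pyext_like.__all__ = ["submodule"]

print(sorted(find_attribute_submodules(pyext_like)))
```

Without some filter such as `__all__`, every module that happens to be imported into the namespace (like `os` above) would be reported as a submodule, which is the main risk of relying on attribute inspection alone.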

Expected Behavior

Running pdoc on a native extension module should generate documentation for the entire extension module, including its submodules.

Actual Behavior

Steps to Reproduce

The following steps generate a minimalistic native extension module in Rust that exhibits the problem. The language shouldn't matter though.

  1. Install a Rust toolchain (see https://rustup.rs).
  2. Create the following directory structure:
pyext/
├── Cargo.toml
└── src/
    └── lib.rs

with the following file contents:

  • Cargo.toml:
[package]
authors = ["Name <[email protected]>"]
edition = "2018"
name = "pyext"
version = "0.1.0"

[lib]
crate-type = ["cdylib"]

[dependencies]
pyo3 = {version = "0.13.2", features = ["extension-module"]}
  • src/lib.rs:
use pyo3::{prelude::*, wrap_pymodule};

/// Docstring of main module.
#[pymodule(pyext)]
fn init_main_module(_py: Python<'_>, module: &PyModule) -> PyResult<()> {
    module.add_wrapped(wrap_pymodule!(submodule))?;
    Ok(())
}

/// Docstring of submodule
#[pymodule(submodule)]
fn init_submodule(_py: Python<'_>, submodule: &PyModule) -> PyResult<()> {
    submodule.add("variable", 42)?;
    Ok(())
}
  3. Compile the extension module: cargo build
  4. Create a properly named symlink to the object file:
    • on Linux: ln -s target/debug/libpyext.so pyext.so
    • on Mac: ln -s target/debug/libpyext.dylib pyext.so
    • on Windows: rename target\debug\libpyext.dll to pyext.pyd
  5. Start a Python REPL from the directory containing the pyext.so file and verify that the submodule exists and can be found by tab completion:
$ python
Python 3.6.10 (default, May 22 2020, 17:59:48) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyext
>>> pyext.<TAB>  --> autocompletes to "pyext.submodule", proving that the submodule can be found
>>> pyext.submodule.variable
42
  6. Run pdoc --html pyext from the same directory.

Additional info

kernc linked a pull request Mar 3, 2021 that will close this issue
kernc (Member) commented Mar 5, 2021

Thanks for an exemplary bug report!

Just to clarify: Step 5, when we import pyext, could we just as well have done:

>>> import pyext.submodule

# or

>>> from pyext.submodule import variable

Does this run?

robamler (Author) commented Mar 7, 2021

Just tested it:

  • import pyext.submodule doesn't work;
  • from pyext.submodule import variable doesn't work;
  • however, from pyext import submodule works.
$ python
Python 3.8.2 (default, Mar  2 2021, 23:57:34) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyext.submodule
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pyext.submodule'; 'pyext' is not a package
>>> from pyext.submodule import variable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pyext.submodule'; 'pyext' is not a package
>>> from pyext import submodule
>>> submodule.variable
42

I think this is because the extension module pyext is compiled into a single binary file that cannot be loaded "in parts" (unlike regular packages, whose implementation is typically spread across several source files). The Python interpreter doesn't know about the submodules until it actually loads the pyext module, which (I think) it only does when you explicitly say either import pyext or from pyext import xxx.

In other words, I think from A import B actually loads A (but only brings A.B into scope, as B), so from pyext.submodule import variable would try to load pyext.submodule, which doesn't exist in the file system because it only gets generated "in memory" when you load pyext.
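The behaviour described above can be reproduced without a compiled extension. The following sketch uses a plain module object as a stand-in for pyext; the name demo_ext is hypothetical, and registering it in sys.modules by hand only simulates an earlier import.

```python
import importlib
import sys
import types

# Stand-in for a compiled extension: a plain module object (no __path__)
# that exposes another module object as an attribute.
parent = types.ModuleType("demo_ext")
parent.submodule = types.ModuleType("submodule")
parent.submodule.variable = 42
sys.modules["demo_ext"] = parent  # as if `import demo_ext` had already run

# `from demo_ext import submodule` only needs an attribute lookup on demo_ext.
from demo_ext import submodule
print(submodule.variable)

# Importing the dotted name asks the import system for a module called
# "demo_ext.submodule", which requires demo_ext to be a package.
error_message = None
try:
    importlib.import_module("demo_ext.submodule")
except ModuleNotFoundError as exc:
    error_message = str(exc)
print(error_message)
```

The second import fails with the same "is not a package" message as in the traceback above, because the parent module has no `__path__` for the import system to search.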

kernc (Member) commented Mar 7, 2021

That's exactly why I asked: I remembered resolving a similar issue as wontfix just recently. See my thoughts in #252 (comment). The simple fact is:

ModuleNotFoundError: No module named 'pyext.submodule'; 'pyext' is not a package

pyext.submodule is not a module to have stuff imported from, so I'm hesitant to make pdoc list it as such.

Can you investigate whether you can set the .__package__ and .__path__ attributes (or whatever is necessary for Python to treat a module as a package) on the relevant package/module objects, and whether that maybe automatically does something?

robamler (Author) commented Mar 7, 2021

Thank you for the explanation! Unfortunately, setting .__package__ and .__path__ in the extension module doesn't help.
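One plausible reason, sketched here with a pure-Python stand-in rather than the actual extension: marking the parent as a package only changes which error is raised, because the standard finders still have nowhere to look for the child. The module name demo_pathonly and the empty `__path__` are assumptions of this example.

```python
import importlib
import sys
import types

parent = types.ModuleType("demo_pathonly")
parent.__path__ = []  # marks the module as a package, but with nothing to search
parent.__package__ = "demo_pathonly"
parent.submodule = types.ModuleType("demo_pathonly.submodule")
sys.modules["demo_pathonly"] = parent

try:
    importlib.import_module("demo_pathonly.submodule")
    outcome = "imported"
except ModuleNotFoundError as exc:
    outcome = str(exc)
print(outcome)
```

The import still fails, only without the "is not a package" suffix: the finders on sys.meta_path are now consulted, but none of them knows about a module that exists only as an attribute in memory.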

I respect your decision if you don't want to address this. I'd just like to raise two counterarguments for your consideration. First, I think this issue will affect a lot of people (probably all authors of native extension modules who don't find some sort of workaround). Second, I interpret the ModuleNotFoundError differently. In fact, I get the same kind of error message when I try to import, e.g., pdoc:

>>> import pdoc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pdoc'

The reason is that I'm in a Python environment where pdoc isn't installed, so the Python interpreter can't find it and raises a ModuleNotFoundError. So, even though pdoc definitely is a module (it's even a package), it just can't be found at the moment. But as soon as you put it on your sys.path, it can be found. I'd argue that the situation for pyext.submodule is quite similar: it is a module, it just can't be found at the moment. But as soon as you import pyext (which is the package on which I want to run pdoc anyway), pyext.submodule can be found (and is recognized as a module):

>>> import pyext
>>> type(pyext.submodule)
<class 'module'>

I agree that pyext.submodule is not a package (e.g., it doesn't have a .__path__ set), but I think that shouldn't make a difference.

kernc (Member) commented Mar 8, 2021

it just can't be found at the moment

That's correct. That's why Python has sys.path_hooks (to maybe provide a suitable finder/loader for a given package) and sys.meta_path, which is a list of already registered default finders.

Following related upstream issues:

I think PyO3 might wish to provide a finder akin to the one removed in PyO3/pyo3@8d14568 (briefly discussed in PyO3/pyo3#1269 (comment)), and add it to sys.meta_path upon loading the top-level extension module.
This way, both pdoc pyext and >>> import pyext.submodule would work flawlessly, and it would be justified to call pyext and pyext.submodule a package and its submodule (instead of merely a variable pointing to a module object, as with >>> import re as my_re).
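As a rough illustration of the finder idea, here is a pure-Python sketch of a meta-path finder that resolves dotted names through attributes of an already loaded root module. It is not the finder that was removed from PyO3; the class name AttributeSubmoduleFinder and the module name demo_pkg are hypothetical, and the sketch assumes the root module has been given a `__path__` so that the import system consults finders for its children at all.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class AttributeSubmoduleFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader for submodules that exist only as attributes."""

    def __init__(self, root_name):
        self.root_name = root_name

    def _resolve(self, fullname):
        # Walk "root.a.b" as attribute lookups starting from the loaded root.
        obj = sys.modules.get(self.root_name)
        for part in fullname.split(".")[1:]:
            obj = getattr(obj, part, None)
            if not isinstance(obj, types.ModuleType):
                return None
        return obj

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.root_name + "."):
            return None
        if self._resolve(fullname) is None:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        # Hand back the existing in-memory module instead of creating a new one.
        return self._resolve(spec.name)

    def exec_module(self, module):
        # Nothing to execute: the module was initialized with its parent.
        pass


# Stand-in for a compiled extension with an in-memory submodule.
root = types.ModuleType("demo_pkg")
root.__path__ = []  # finders are only consulted for children of packages
child = types.ModuleType("demo_pkg.submodule")
child.variable = 42
root.submodule = child
sys.modules["demo_pkg"] = root

# What the top-level module could do upon loading.
sys.meta_path.append(AttributeSubmoduleFinder("demo_pkg"))

imported = importlib.import_module("demo_pkg.submodule")
print(imported is child, imported.variable)
```

With such a finder registered, the dotted import returns the very module object that is reachable as an attribute, so the two ways of reaching the submodule agree.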

I'd just hate to have pdoc deviate from Python's own interpretation of things.

Then again, we do check in #318 that the object is present in __all__, so the intent is visible, and there's little utility in documenting modules containing further objects as mere variables. The end-user will be confused that they can't:

from your_pyext.submodule.nested import Something

But that's not really our problem ... 🤔

kernc (Member) commented Mar 28, 2021

There's apparently a workaround described in PyO3/pyo3#1517 (comment).
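The linked comment is not reproduced here. One approach that is commonly used in this situation, and which may or may not match it, is to register each submodule in sys.modules under its fully qualified name. The sketch below shows the effect with a pure-Python stand-in; the name demo_reg is hypothetical.

```python
import importlib
import sys
import types

# Stand-in for a compiled extension with an in-memory submodule.
parent = types.ModuleType("demo_reg")
child = types.ModuleType("demo_reg.submodule")
child.variable = 42
parent.submodule = child
sys.modules["demo_reg"] = parent

# Register the submodule under its dotted name, as an extension's
# initialization code could do for each of its submodules.
sys.modules["demo_reg.submodule"] = child

from demo_reg.submodule import variable
imported = importlib.import_module("demo_reg.submodule")
print(variable, imported is child)
```

Because the import system checks sys.modules before consulting any finder, the dotted import succeeds even though the parent is still not a package.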
