Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for jupyter notebook #404

Closed
wants to merge 14 commits into from
Closed
3 changes: 2 additions & 1 deletion .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,8 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install coverage docopt yarg requests
pip install coverage docopt yarg requests nbconvert
pip install .[jupyter]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this .[jupyter] notation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

".[jupyter]" means that we are installing a library located in the current working directory and we are installing the optional depencies specified by the "jupyter" version. In this case, it means that we are installing pipreqs with support for jupyter notebooks.


- name: Calculate coverage
run: coverage run --source=pipreqs -m unittest discover
Expand Down
116 changes: 86 additions & 30 deletions pipreqs/pipreqs.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
<compat> | e.g. Flask~=1.1.2
<gt> | e.g. Flask>=1.1.2
<no-pin> | e.g. Flask
--ignore-notebooks Ignore jupyter notebook files.
"""
from contextlib import contextmanager
import os
Expand All @@ -48,9 +49,19 @@
from yarg import json2package
from yarg.exceptions import HTTPError

try:
PythonExporter = None
ignore_notebooks = False
from nbconvert import PythonExporter
except ImportError:
pass

Comment on lines +52 to +58
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I got this part. wouldnt this always set ignore_notebooks to false disregarding the cli option? and what is the python exporter and why are we just letting the execption pass?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, was the idea for it to be inside the exception handling? or are you making the existence of Python Exporter take precedence before the ignore notebooks flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, the idea behind this whole block is to have PythonExporter declared regardless of nbconvert's import was successful or not. In this way, we detect whether the user has nbconvert installed in its machine. If it is installed, we will run pipreqs with nbconvert, meaning that we will detect jupyter notebooks.

ignore_notebooks is just a global variable, and I decided to declare it next to PythonExporter because they are correlated. However, the declaration of ignore_notebooks don't need to be inside this try except block (I can take it out of the block, if you deem it necessary).

The only possible flaw I see in this implementation is that, if the user decided to install pipreqs without the optional packages (in this case, support for jupyter notebooks) and it has nbconvert, the program will still detect jupyter notebooks, which may lead to some confusion. I implemented pipreqs in this way because I haven't found another method to detect the user's chosen installation option.

from pipreqs import __version__

REGEXP = [re.compile(r"^import (.+)$"), re.compile(r"^from ((?!\.+).*?) import (?:.*)$")]
REGEXP = [
re.compile(r"^import (.+)$"),
re.compile(r"^from ((?!\.+).*?) import (?:.*)$"),
]


@contextmanager
Expand Down Expand Up @@ -84,12 +95,21 @@ def _open(filename=None, mode="r"):
file.close()


def get_all_imports(path, encoding=None, extra_ignore_dirs=None, follow_links=True):
def get_all_imports(path, encoding="utf-8", extra_ignore_dirs=None, follow_links=True):
imports = set()
raw_imports = set()
candidates = []
ignore_errors = False
ignore_dirs = [".hg", ".svn", ".git", ".tox", "__pycache__", "env", "venv"]
ignore_dirs = [
".hg",
".svn",
".git",
".tox",
"__pycache__",
"env",
"venv",
".ipynb_checkpoints",
]

if extra_ignore_dirs:
ignore_dirs_parsed = []
Expand All @@ -102,13 +122,26 @@ def get_all_imports(path, encoding=None, extra_ignore_dirs=None, follow_links=Tr
dirs[:] = [d for d in dirs if d not in ignore_dirs]

candidates.append(os.path.basename(root))
files = [fn for fn in files if os.path.splitext(fn)[1] == ".py"]
if notebooks_are_enabled():
files = [fn for fn in files if file_ext_is_allowed(fn, [".py", ".ipynb"])]
else:
files = [fn for fn in files if file_ext_is_allowed(fn, [".py"])]

candidates = list(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once again, maybe missing something, but why aren't we appending to the list of candidates anymore?

map(
lambda fn: os.path.splitext(fn)[0],
filter(lambda fn: file_ext_is_allowed(fn, [".py"]), files),
)
)

candidates += [os.path.splitext(fn)[0] for fn in files]
for file_name in files:
file_name = os.path.join(root, file_name)
with open(file_name, "r", encoding=encoding) as f:
contents = f.read()
contents = ""
if file_ext_is_allowed(file_name, [".py"]):
with open(file_name, "r", encoding=encoding) as f:
contents = f.read()
elif file_ext_is_allowed(file_name, [".ipynb"]) and notebooks_are_enabled():
contents = ipynb_2_py(file_name, encoding=encoding)
try:
tree = ast.parse(contents)
for node in ast.walk(tree):
Expand Down Expand Up @@ -145,11 +178,39 @@ def get_all_imports(path, encoding=None, extra_ignore_dirs=None, follow_links=Tr
return list(packages - data)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_file_extensions():
return [".py", ".ipynb"] if PythonExporter and not ignore_notebooks else [".py"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to not implement this function because it makes the file extensions used by the program "hidden". The user will only know the file extension being used in the program if the user reads the function directly, making our code a little bit less readable

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don' t really have an opinion on this one, but in the future I would like to add support to .pyw (afaik, works the same as .py so simply adding it to the list of extensions should work) and any other stupid niche format that exists. Do whatever you will taking this into account


def notebooks_are_enabled():
return PythonExporter and not ignore_notebooks


def file_ext_is_allowed(file_name, acceptable):
return os.path.splitext(file_name)[1] in acceptable


def ipynb_2_py(file_name, encoding="utf-8"):
"""

Args:
file_name (str): notebook file path to parse as python script
encoding (str): encoding of file

Returns:
str: parsed string

"""

exporter = PythonExporter()
(body, _) = exporter.from_filename(file_name)

return body.encode(encoding)


def generate_requirements_file(path, imports, symbol):
with _open(path, "w") as out_file:
logging.debug(
"Writing {num} requirements: {imports} to {file}".format(
num=len(imports), file=path, imports=", ".join([x["name"] for x in imports])
num=len(imports),
file=path,
imports=", ".join([x["name"] for x in imports]),
)
)
fmt = "{name}" + symbol + "{version}"
Expand Down Expand Up @@ -199,7 +260,7 @@ def get_imports_info(imports, pypi_server="https://pypi.python.org/pypi/", proxy
return result


def get_locally_installed_packages(encoding=None):
def get_locally_installed_packages(encoding="utf-8"):
packages = []
ignore = ["tests", "_tests", "egg", "EGG", "info"]
for path in sys.path:
Expand Down Expand Up @@ -240,7 +301,7 @@ def get_locally_installed_packages(encoding=None):
return packages


def get_import_local(imports, encoding=None):
def get_import_local(imports, encoding="utf-8"):
local = get_locally_installed_packages()
result = []
for item in imports:
Expand Down Expand Up @@ -428,25 +489,23 @@ def dynamic_versioning(scheme, imports):


def init(args):
global ignore_notebooks
encoding = args.get("--encoding")
extra_ignore_dirs = args.get("--ignore")
follow_links = not args.get("--no-follow-links")
ignore_notebooks = args.get("--ignore-notebooks")
input_path = args["<path>"]

if encoding is None:
encoding = "utf-8"
if input_path is None:
input_path = os.path.abspath(os.curdir)

if extra_ignore_dirs:
extra_ignore_dirs = extra_ignore_dirs.split(",")

path = (
args["--savepath"] if args["--savepath"] else os.path.join(input_path, "requirements.txt")
)
if (
not args["--print"]
and not args["--savepath"]
and not args["--force"]
and os.path.exists(path)
):
path = args["--savepath"] if args["--savepath"] else os.path.join(input_path, "requirements.txt")
if not args["--print"] and not args["--savepath"] and not args["--force"] and os.path.exists(path):
logging.warning("requirements.txt already exists, " "use --force to overwrite it")
return

Expand Down Expand Up @@ -477,17 +536,14 @@ def init(args):
# the list of exported modules, installed locally
# and the package name is not in the list of local module names
# it add to difference
difference = [
x
for x in candidates
if
# aggregate all export lists into one
# flatten the list
# check if candidate is in exports
x.lower() not in [y for x in local for y in x["exports"]] and
# check if candidate is package names
x.lower() not in [x["name"] for x in local]
]
difference = [x for x in candidates if
# aggregate all export lists into one
# flatten the list
# check if candidate is in exports
x.lower() not in [y for x in local for y in x['exports']]
and
# check if candidate is package names
x.lower() not in [x['name'] for x in local]]

imports = local + get_imports_info(difference, proxy=proxy, pypi_server=pypi_server)
# sort imports based on lowercase name of package, similar to `pip freeze`.
Expand Down
2 changes: 2 additions & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
wheel==0.38.1
Yarg==0.1.9
docopt==0.6.2
nbconvert==7.9.2

5 changes: 5 additions & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,10 @@
'docopt', 'yarg'
]

jupyter_requirements = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please eli5 how this works or link a reference haha

Copy link
Contributor Author

@Fernando-crz Fernando-crz Oct 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a reference on how it works: https://setuptools.pypa.io/en/latest/userguide/dependency_management.html#optional-dependencies

just noticed the program is still not working as intended, because if someone decides to install it without it's optional dependencies, it just crashes because there's no nbconvert library installed locally to import. We should fix this by detecting whether the user installed the optional dependencies or not in the program.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add instructions to read me about how to deal with these optional dependencies

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, if possible please reverse the behaviour so that by default the jupyter stuff is added and it is only excluded as a dependency when explicitly told to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, if possible please reverse the behaviour so that by default the jupyter stuff is added and it is only excluded as a dependency when explicitly told to

It seems like there's no support for doing such thing. Check https://stackoverflow.com/questions/76376575/python-setuptools-exclude-dependencies-when-installing

What do you suggest doing in this case? Drop optional support? Maybe keep it as it is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you suggest doing in this case? Drop optional support? Maybe keep it as it is?

@alan-barzilay

Copy link
Collaborator

@alan-barzilay alan-barzilay Nov 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, sorry for the delay. If there is no support for this feature, there is no support for it, what can we do haha
But i liked that pip install --no-deps my_package workaround, if we only import nbconverter/ipython when we use them (aka when we don't pass the cli option to ignore notebooks), someone that does this should be able to tailor their dependencies without too much trouble. Maybe we should add a mention to this into the readme as well

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and, maybe I missed it but where do we use ipython directly? I found the import for nbvconvert, is it that ipython is a dependency of nbconvert? if thats the case, cant we omit it from the file and let nbconvert install its dependencies itself?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please test if we really need the ipython dependency for magics translation

'nbconvert', 'ipython'
]

setup(
name='pipreqs',
version=__version__,
Expand All @@ -34,6 +38,7 @@
include_package_data=True,
package_data={'': ['stdlib', 'mapping']},
install_requires=requirements,
extras_require={"jupyter": jupyter_requirements},
license='Apache License',
zip_safe=False,
keywords='pip requirements imports',
Expand Down
65 changes: 65 additions & 0 deletions tests/_data_notebook/magic_commands.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Magic test"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%automagic true"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ls -la\n",
"logstate"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ls -la"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%automagic false"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ls -la"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
37 changes: 37 additions & 0 deletions tests/_data_notebook/markdown_test.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Markdown test\n",
"import sklearn\n",
"\n",
"```python\n",
"import FastAPI\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Empty file added tests/_data_notebook/models.py
Empty file.
Loading