Making River type-complete #1592

e10e3 · 2024-08-19T18:01:15Z

The pitch

River has some type annotations and recommends that new code does not cause new type-checking errors with MyPy.

Not all the code in River is annotated: running MyPy with --disallow-incomplete-defs --disallow-untyped-defs raises over 2500 errors. That figure does not include the type inconsistencies that MyPy would report in fully typed code but currently skips because the annotations are missing.

To signal to type checkers that they ship type annotations, River’s modules have a py.typed file.

The Python guidelines for type checking libraries say, “A 'py.typed' library should aim to be type complete so that type checking and inspection can work to their full extent.”

Indeed, partial typing can hinder efforts to type check programs that depend on the library.

River is not there yet, but we can work towards it!

To encourage correct and complete type annotations, we need to adapt the configuration in pyproject.toml. The issue “Professional-grade mypy conf” (#1430) gives some pointers.

The workload is too big to be done in a single PR. Thankfully, MyPy is designed for gradual adoption: we can progress step by step until the library is fully covered.

This means regular type checking with MyPy will continue to work as it does today, without an avalanche of errors, and the annotation work can progress in parallel.

There are two main ways of reaching this goal: going module by module or rule by rule. They are not incompatible.

Modules

The module way could be to set strict=true by default in the configuration and add an override with strict=false for all pending modules. When a module is fully annotated, remove it from the overrides.
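
As a rough sketch of what this could look like in pyproject.toml (an illustration, not River's actual configuration): the example uses the two flags mentioned earlier instead of strict, because I am not certain that strict can be switched off in a per-module override, and that should be checked against the MyPy version in use. The two modules in the override are arbitrary placeholders for "pending" modules.

```toml
[tool.mypy]
files = ["river"]
# Default for the whole library: every function must be fully annotated.
disallow_untyped_defs = true
disallow_incomplete_defs = true

# Pending modules (placeholder choice): relax the rules until they are annotated.
# When a module is done, delete its entry from this list.
[[tool.mypy.overrides]]
module = [
    "river.tree.*",
    "river.preprocessing.*",
]
disallow_untyped_defs = false
disallow_incomplete_defs = false
```

With this layout, the list of overrides doubles as the to-do list: it shrinks as modules are completed, and an empty list means the defaults apply everywhere.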

The modules can also be ordered so that the ones imported the most (or estimated to be) are annotated first. I propose:

base
metrics
utils
stats
optim
datasets
tree
preprocessing
stream
linear_model
evaluate
drift
compose
bandit
cluster
anomaly
time_series
feature_extraction
ensemble
proba
multioutput
naive_bayes
checks
rules
model_selection
forest
neighbors
sketch
facto
covariance
compat
multiclass
reco
imblearn
feature_selection
misc
active
conf
neural_net

Rules

Another way is to enable progressively stricter options (https://mypy.readthedocs.io/en/stable/existing_code.html#introduce-stricter-options): at first, only the most lenient MyPy checks are enabled, and stricter ones are added over time. This can also be broken down on a per-module basis.

MyPy has an example for this in its documentation:

# Start off with these
warn_unused_configs = True
warn_redundant_casts = True
warn_unused_ignores = True

# Getting these passing should be easy
strict_equality = True
strict_concatenate = True

# Strongly recommend enabling this one as soon as you can
check_untyped_defs = True

# These shouldn't be too much additional work, but may be tricky to
# get passing if you use a lot of untyped libraries
disallow_subclassing_any = True
disallow_untyped_decorators = True
disallow_any_generics = True

# These next few are various gradations of forcing use of type annotations
disallow_untyped_calls = True
disallow_incomplete_defs = True
disallow_untyped_defs = True

# This one isn't too hard to get passing, but return on investment is lower
no_implicit_reexport = True

# This one can be tricky to get passing if you use a lot of untyped libraries
warn_return_any = True
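
Since River keeps its MyPy settings in pyproject.toml, the first stage of that list, combined with a per-module override, could be sketched as follows. This is only an illustration: picking river.base as the first module to tighten is an assumption, and TOML spells the booleans in lowercase.

```toml
[tool.mypy]
# Stage 1: the "start off with these" options, applied to the whole codebase.
warn_unused_configs = true
warn_redundant_casts = true
warn_unused_ignores = true

# A stricter rule switched on early for a single module (hypothetical choice).
[[tool.mypy.overrides]]
module = "river.base.*"
check_untyped_defs = true
```

Each later stage would then move one more option from the overrides into the global section once the whole codebase passes it.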

What are your thoughts on this?

I would personally lean towards the module way: it makes for more localised edits instead of working over the whole codebase again and again.
But as said above, the two approaches are not incompatible.

smastelini (Maintainer) · 2024-08-19T19:03:37Z

Hi @e10e3, good stuff. The first option sounds good to me and reminds me of the first annotation efforts in 2022. Back then, we assigned team members to annotate the code according to their familiarity with each module. It is a good way to parallelize efforts.

MaxHalford (Maintainer) · 2024-08-20T07:19:12Z

Thanks for looking back into this @e10e3. I agree that we could enable strict typing progressively on a module basis.
