Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable professional-grade mypy #129

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

Conversation

glatterf42
Copy link
Member

@glatterf42 glatterf42 commented Nov 12, 2024

Finally, here it is!
After reading this blog post about "professional-grade" mypy, I began adopting the professional-grade config for ixmp4. This PR is the result of the process.
It type hints everything, though some type hints might still need relaxation or tightening. Even our test suite is type-hinted here. This allows us to trust all variables we're dealing with have the correct type. In particular, refactoring can be done with improved confidence as mypy would catch a wide variety of bugs if the refactoring doesn't fit the existing code. Having these type hints also helps with readability, I think, as it is one kind of documentation in itself. I found this process especially insightful for understanding how the filters work :)
Of course, this PR also bumps mypy to its latest version.

That being said, there are a few points where I think we can still improve the PR, or add improvements through a future PR:

  1. For the create() functions in the DB layer, I often kept the *args and **kwargs since the facade layer often uses positional arguments, while the rest/api layer uses keywords. This is different from tabulate() and list(), which are generally only called via enumerate() and with kwargs, so this would be a place to streamline the signatures.
  2. In data/abstract/meta, I needed to type hint the _type_map, for which I needed to use the built-in type. This led to confusion for mypy since meta already had an attribute called type. I have renamed the attribute to dtype to enable the type hint.
  3. For most kwargs, I have learned that the best way to type hint them is by creating a TypedDict subclass, which explicitly spells out all possible kwargs and their types. I have not done so for some /base/ functions, which mainly act as templates or pass on all *args, **kwargs that they receive. When args or kwargs were unused, I sometimes removed them from the signature and type-hinted them as Any at other times, depending on how I understood the function's purpose. So there are currently two approached visible, which come down to different understandings of type hints, I suppose:
    On the one hand, one could try to spell out everything explicitly how it is currently used. This improves the documentation aspect and enables mypy to catch more bugs before they reach production. On the other hand, this would block newly introduced kwargs until they are added to these kwargs-classes, which increases maintenance burden. Alternatively, one could convey the purpose of the function through the type hints, being more general. The pros and cons are essentially the inverse of the above.
    For code that I didn't write, it's harder to correctly guess the purpose of functions, so I have generally opted more for the first approach. However, I'm open to discussion here.
  4. Where I did write kwargs-classes, they are sometimes not organized as well as they could, I feel. In the DB layer, for example, EnumerateKwargs could often inherit from CreateKwargs at the moment. For the filter classes, spelling out everything like name__like is also lengthy, we could e.g. think of using mixins there.
  5. For some /base/ functions that pass on their arguments, I used Any because I couldn't get the alternative to work: in theory, this sounds like a job for ParamSpec: "They are used to forward the parameter types of one callable to another callable". However, I couldn't figure out exactly how to do that. On that note, if you know a good tutorial on ParamSpec, I'd be happy to read it as this almost seems like magic to me right now.

Lastly, I applied some type: ignore markers where I couldn't quite understand why mypy was complaining or where I intend to refactor the code very soon, anyway. In principle, though, almost none of that should be necessary, strictly-speaking. "Code I want to change soon" mainly refers to the DB model for parameters, equations, and variables, where I expect a more performant model can make do without the whole column thing. And I would maybe also drop table entirely since this is used nowhere in the tutorials. This is less certain, however, which is why I did type hint table, still.

Note

As so nicely displayed by codecov, we don't have 100% test coverage yet (that's another juicy small project goal...). So when reviewing this, please use the branch like main might be used to ensure that things we don't test did not get broken. In particular, this probably means running some CLI commands from this branch.

@glatterf42 glatterf42 added the enhancement New feature or request label Nov 12, 2024
@glatterf42 glatterf42 self-assigned this Nov 12, 2024
Copy link

codecov bot commented Nov 12, 2024

Codecov Report

Attention: Patch coverage is 95.11213% with 85 lines in your changes missing coverage. Please review.

Project coverage is 86.9%. Comparing base (362d503) to head (c18ed42).

Files with missing lines Patch % Lines
ixmp4/cli/platforms.py 0.0% 13 Missing ⚠️
ixmp4/data/generator.py 0.0% 9 Missing ⚠️
ixmp4/data/db/iamc/timeseries/repository.py 78.1% 7 Missing ⚠️
ixmp4/db/filters.py 83.7% 7 Missing ⚠️
ixmp4/conf/manager.py 73.3% 4 Missing ⚠️
ixmp4/conf/settings.py 84.0% 4 Missing ⚠️
ixmp4/data/db/timeseries.py 87.0% 4 Missing ⚠️
ixmp4/server/rest/deps.py 69.2% 4 Missing ⚠️
ixmp4/core/iamc/data.py 78.5% 3 Missing ⚠️
ixmp4/data/api/base.py 95.8% 3 Missing ⚠️
... and 19 more
Additional details and impacted files
@@           Coverage Diff           @@
##            main    #129     +/-   ##
=======================================
+ Coverage   85.9%   86.9%   +1.0%     
=======================================
  Files        227     228      +1     
  Lines       7583    8141    +558     
=======================================
+ Hits        6514    7081    +567     
+ Misses      1069    1060      -9     
Files with missing lines Coverage Δ
ixmp4/conf/__init__.py 100.0% <100.0%> (ø)
ixmp4/conf/credentials.py 100.0% <100.0%> (ø)
ixmp4/conf/toml.py 97.9% <100.0%> (ø)
ixmp4/core/__init__.py 100.0% <100.0%> (ø)
ixmp4/core/base.py 93.3% <100.0%> (ø)
ixmp4/core/decorators.py 100.0% <100.0%> (ø)
ixmp4/core/exceptions.py 95.4% <100.0%> (+<0.1%) ⬆️
ixmp4/core/iamc/__init__.py 100.0% <100.0%> (ø)
ixmp4/core/iamc/variable.py 84.3% <100.0%> (+0.8%) ⬆️
ixmp4/core/meta.py 100.0% <100.0%> (ø)
... and 184 more

@glatterf42 glatterf42 linked an issue Nov 12, 2024 that may be closed by this pull request
@glatterf42
Copy link
Member Author

If we adopt this PR, we should also present a mypy badge in our readme :)

pyproject.toml Outdated
plugins = ['numpy.typing.mypy_plugin', 'pandera.mypy', 'pydantic.mypy', 'sqlalchemy.ext.mypy.plugin']
disallow_untyped_defs = true
disallow_any_unimported = true
# disallow_any_generics = true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note: while working on this PR, I stumbled upon the setting disallow_any_generics = true. It is not active as of now, as that would introduce a few hundred more errors, but it would be an even stricter rule we could employ (to follow the spirit of using the strictest possible settings for mypy). This rule would prevent us from returning dict or list somewhere: all such generics would need to be explicitly specified as list[int] or some such. Happy to hear your thoughts on whether we should do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are already at it and its a comparatively small amount of work I would tend to go ahead! "wenn schon denn schon"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this suggestion, this PR enables now all strict mypy rules except for no_implicit_reexport, which we discussed in person and wanted to keep for another PR as it might change the API.

Copy link
Contributor

@meksor meksor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, what a feat! Thanks a lot, this is very cool.
I couldnt find what you mentioned in point 5. It sounds like that should be possible, but I've seen ParamSpec for the first time in this PR and couldnt tell you exactly what to do..
I also left some comments inline.

ixmp4/data/abstract/iamc/datapoint.py Show resolved Hide resolved
name__notlike: str
name__notilike: str
iamc: (
dict[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not solve this with a more general, recursive type-hint?

For example:

FilterDict =  dict[
                str,
                int
                | str
                | Iterable[int]
                | Iterable[str]
                | "FilterDict"
]

ixmp4/data/api/meta.py Outdated Show resolved Hide resolved
ixmp4/data/abstract/iamc/variable.py Outdated Show resolved Hide resolved
ixmp4/data/abstract/iamc/timeseries.py Outdated Show resolved Hide resolved
ixmp4/data/api/base.py Show resolved Hide resolved
@glatterf42
Copy link
Member Author

Okay, the update is here :)
It should clear the tests and mypy now, which has almost become strict (only missing no_implicit_reexport, which mypy's docs describe as "not too hard to get passing, but return on investment is lower".

I still need to do some cleanup, I want to look into your suggestion of recursive hints (and maybe employ some TypeAliases) in particular, but right now, I'll have to take a step back and check if we can already support Python 3.13 :)
After I do the cleanup, I'll still raise any remaining questions explicitly.

Copy link
Member Author

@glatterf42 glatterf42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This took longer than expected again, but here it is: calling out every single thing I had a question about or an issue with! I'll push one or two more clean-up commits with things I noticed while going through all files, but after that, I'm happy with this PR :)

if res.status_code != 200:
raise ManagerApiError(f"[{str(res.status_code)}] {res.text}")
return res.json()
# TODO Can we really assume this type?
return cast(dict[str, Any], res.json())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work based on all test we have, but I'm not sure if this is actually true since res.json() is just annotated to return Any. I think we can leave this, but please let me knof if this should be tested in some way.

# TODO can't find a similar open issues, should I open one?
# For reference, this occurred after adding disallow_subclassing_any
# NOTE mypy seems to say Exception has type Any, don't see how we could change that
class RemoteExceptionMeta(ExcMeta): # type: ignore[misc]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy complained here about subclassing from Any, which I don't see how we could avoid if Exception really inherits from that. But in that case, I would have expected this to be fairly common and noted in some mypy issue, but I couldn't find any. Should I open one or do you know what's going on here?

return None
elif isinstance(variable, str):
# TODO this check seems kind of redundant...
if isinstance(variable, str):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check seems kind of redundant since variable can only be a string, anyway. But I didn't know what to do about the error: would we instead want to raise some error if the variable name is unknown?

return None
elif isinstance(model, str):
# TODO this check seems kind of redundant...
if isinstance(model, str):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for iamc/variable: what should we do about this error message when removing the if-check?

return None
elif isinstance(scenario, str):
# TODO this check seems kind of redundant...
if isinstance(scenario, str):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another redundant check: what do we do with the error?

@@ -84,6 +87,7 @@ def add(

@guard("view")
def get(self, run_id: int, name: str) -> Variable:
# TODO by avoiding super().select, don't we also miss out on filters and auth?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I noticed while working on this PR, likely warrants its own follow-up issue.

# raise ProgrammingError(
# "Field argument `lookups` must be `list` of `str`."
# )
override_lookups = _ensure_str_list(jschema_lookups)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution isn't ideal, as mentioned in the TODO comment above, but it seems to never be called (at least by our test suite), so the performance impact should be negligible. Please raise better ideas, though :)



def deepgetattr(obj, attr):
# TODO How to type hint this to return Repos with .list() and .count()?
def deepgetattr(obj: Backend, attr: str) -> Any:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type hint is not great, but I don't know how to make it better.

platform: ixmp4.Platform,
# TODO: improve type hints for fixtures
profiled: Any,
benchmark: Any,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't figured out how to properly annotate these fixtures. Do you know? If not, should we figure this out or accept them as they are now?

Comment on lines 101 to 121
# TODO The order is important here at the moment, in particular having int before
# float! This should likely not be the case, but using StrictInt and StrictFloat
# from pydantic only created even more errors.
data: (
list[
list[
bool
| datetime
| int
| float
| str
| dict[str, Any]
# TODO should be able to remove this once PR#122 is merged
| list[float | int | str]
| None
]
]
| None
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type hint is not great as explained in the comment. I think I really don't like pydantic using the order in type hints to coerce types, because that's something that I wouldn't expect. Here, it seems to be important that int come before float. Similar to meta, I tried using StrictInt etc, but this broke some other tests. Please let me know if you have a good idea for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix type hinting for optimization.indexset
2 participants