
Add type annotations and more tests #61

Closed
wants to merge 42 commits

Conversation

lebrice
Collaborator

@lebrice lebrice commented Oct 6, 2023

No description provided.

@lebrice lebrice changed the title from "Add type annotations and unit tests" to "Add type annotations and more unit tests" on Oct 6, 2023
@lebrice lebrice changed the title from "Add type annotations and more unit tests" to "Add type annotations and more tests" on Oct 6, 2023
@lebrice lebrice marked this pull request as ready for review October 11, 2023 18:44
Member

@satyaog satyaog left a comment

Note: I'm feeling less and less comfortable with the master branch being deeply modified (with the new _* default funcs). It doesn't change anything for the end user and prevents us from quickly bringing a fix to users or simply fixing a dependency version. But we're too far into the changes now

Comment on lines 22 to 24
_check_output_fn: Callable[
Concatenate[str | Sequence[str], P], str
] = subprocess.check_output,
Member

This is to allow mocking, right? This doesn't look very clean to me, but I see why finding a solution could be time-consuming, so maybe we could add a # TODO: This is to ease mocking in tests. Find a cleaner way that doesn't involve a _check_output_fn exposed to the user

Member

It seems it's not used in mocks, so I would remove this from the exposed arguments list, unless I'm missing something?

Member

We can mock subprocess.check_output directly.

Collaborator Author

The reason for this is mainly for type-checking purposes. We don't want to allow passing invalid **kwargs down to subprocess.check_output. This indicates to the type checker that this method has the same signature as subprocess.check_output (the function it is wrapping), and only accepts these arguments.

The alternative would be to remove the **kwargs and explicitly duplicate all the arguments of subprocess.check_output (and their default values) in all the places where we use this method.
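For illustration, here is a minimal, self-contained sketch of the ParamSpec/Concatenate pattern being described (Python 3.10+; it mirrors the shape of the code in the diff rather than reproducing it exactly):

from __future__ import annotations

import subprocess
from collections.abc import Sequence
from typing import Callable, Concatenate, ParamSpec

P = ParamSpec("P")

def check_output(
    cmd: str | Sequence[str],
    # Defaulting to the wrapped function keeps runtime behaviour identical;
    # the annotation tells the type checker that *args/**kwargs must be the
    # remaining parameters of subprocess.check_output.
    _check_output_fn: Callable[
        Concatenate[str | Sequence[str], P], str
    ] = subprocess.check_output,
    *args: P.args,
    **kwargs: P.kwargs,
) -> str:
    # Invalid keyword arguments are now flagged by the type checker instead of
    # being silently forwarded to subprocess.check_output.
    return _check_output_fn(cmd, *args, **kwargs)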

Collaborator Author

Removed the ParamSpecs and made the signatures explicit in 1a6ecd5

Member

Is there no other way? Does type checking require adding a function argument? Would something like this be possible?

return subprocess.check_output: Callable[
    Concatenate[str | Sequence[str], P], str
](cmd, *args, **kwargs)

Collaborator Author

@lebrice lebrice Nov 13, 2023

I'm sorry, I don't quite understand the question, and no, that's not possible (the snippet above isn't valid Python syntax). Perhaps we could chat about this in person?

Comment on lines 37 to 58
raise
raise e
Member

raise e changes the stack trace, unlike a bare raise. Probably better to keep only raise
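A small illustration of the difference (hypothetical function names):

def might_fail() -> None:
    raise ValueError("boom")

def bare_reraise() -> None:
    try:
        might_fail()
    except ValueError:
        # Bare `raise` re-raises the active exception; the traceback keeps
        # pointing at the original raise site inside the try block.
        raise

def named_reraise() -> None:
    try:
        might_fail()
    except ValueError as error:
        # `raise error` re-raises explicitly and adds this line as an extra
        # frame in the reported traceback.
        raise error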

Collaborator Author

Fixed in dcb067c

Comment on lines +329 to +348
class Choice(qn.Choice, Generic[_T]):
value: _T

def __init__(
self,
title: FormattedText,
value: _T | None = None,
disabled: str | None = None,
checked: bool | None = False,
shortcut_key: str | bool | None = True,
) -> None:
super().__init__(
title=title,
value=value,
disabled=disabled,
checked=checked,
shortcut_key=shortcut_key,
)
Member

Is this also to help with mocking in tests?

Collaborator Author

No, this just improves the typing of the tests by letting the editor know that qn.Choice is a generic type whose value needs to match its constructor argument.

I'll add a docstring here to explain this
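Roughly, the benefit looks like this (example values are made up):

# With the Generic subclass above, the type checker infers Choice[str] here,
# so `selected.value` is known to be a str rather than Any:
selected = Choice(title=[("class:text", "mila")], value="mila")
env_name: str = selected.value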

Collaborator Author

Added more explanation in 8e92d4c

Member

Ah, it's a bit unfortunate to add a layer only for typing, but if it helps, fine. Do you think in the future we should put all the typing helpers in a separate file?

milatools/cli/remote.py (outdated comment, resolved)
Comment on lines +201 to +212
@overload
def run(
self,
cmd: str,
display: bool | None = None,
hide: bool = False,
warn: bool = False,
asynchronous: bool = False,
out_stream: TextIO | None = None,
**kwargs,
) -> invoke.runners.Result | invoke.runners.Promise:
...
Member

What is the reason behind having these @overload methods?

Collaborator Author

This is to help narrow down the return types of the run method. When you call it with asynchronous=True, you get a Promise, and when you call it with asynchronous=False, you get a Result. When asynchronous is a bool with unknown value, the result may be either a Promise or Result.
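For illustration, a trimmed sketch of how Literal-based overloads can achieve that narrowing (a sketch only; the actual overloads in the diff may differ and take more parameters):

from __future__ import annotations

from typing import Literal, overload

from invoke.runners import Promise, Result

class Remote:
    @overload
    def run(self, cmd: str, *, asynchronous: Literal[True], **kwargs) -> Promise:
        ...

    @overload
    def run(self, cmd: str, *, asynchronous: Literal[False] = ..., **kwargs) -> Result:
        ...

    @overload
    def run(self, cmd: str, *, asynchronous: bool = ..., **kwargs) -> Result | Promise:
        ...

    def run(self, cmd: str, *, asynchronous: bool = False, **kwargs):
        # Single runtime implementation; the overloads above only exist so the
        # type checker can narrow the return type based on `asynchronous`.
        ...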

Comment on lines +641 to +658
@dont_run_for_real
@pytest.mark.parametrize("persist", [True, False])
def test_persist(self, mock_connection: Connection, persist: bool):
alloc = ["--time=00:01:00"]
transforms = [some_transform]
remote = SlurmRemote(
mock_connection, alloc=alloc, transforms=transforms, persist=persist
)
persisted = remote.persist()

# NOTE: Feels dumb to do this. Not sure what I should be doing otherwise.
assert persisted.connection == remote.connection
assert persisted.alloc == remote.alloc
assert persisted.transforms == [
some_transform,
persisted.srun_transform_persist,
]
assert persisted._persist is True
Member

Same, I think we could mock and use a FileRegressionFixture or something like that. And maybe have a test with a real connection to a slurm cluster

Comment on lines +598 to +611
@dont_run_for_real
@pytest.mark.skip(reason="Seems a bit hard to test for what it's worth..")
def test_srun_transform_persist(self, mock_connection: Connection):
alloc = ["--time=00:01:00"]
transforms = [some_transform]
persist: bool = False
remote = SlurmRemote(
mock_connection, alloc=alloc, transforms=transforms, persist=persist
)
output_file = "<some_file>"
assert (
remote.srun_transform_persist("bob")
== f"bob; touch {output_file}; tail -n +1 -f {output_file}"
)
Member

Same, I think we could mock and use a FileRegressionFixture or something like that

Collaborator Author

Yeah, I think I see what you mean, you want to avoid the hard-coded portion of the test and move that to a regression file instead?

Member

Yes that's what I had in mind

Comment on lines +660 to +690
@dont_run_for_real
@disable_internet_access
def test_ensure_allocation_persist(self, mock_connection: Connection):
alloc = ["--time=00:01:00"]
transforms = [some_transform]
remote = SlurmRemote(
mock_connection, alloc=alloc, transforms=transforms, persist=True
)

# TODO: Not sure if this test has any use at this point..
remote.extract = Mock(
spec=remote.extract,
spec_set=True,
return_value=(
Mock(spec=invoke.runners.Runner, spec_set=True),
{"node_name": "bob", "jobid": "1234"},
),
)

results, runner = remote.ensure_allocation()

remote.extract.assert_called_once_with(
"echo @@@ $(hostname) @@@ && sleep 1000d",
patterns={
"node_name": "@@@ ([^ ]+) @@@",
"jobid": "Submitted batch job ([0-9]+)",
},
hide=True,
)
assert results == {"node_name": "bob", "jobid": "1234"}
# raise NotImplementedError("TODO: Important and potentially complicated test")
Member

I think we can only test this with a real connection to a slurm cluster

Collaborator Author

@lebrice lebrice Nov 1, 2023

Agreed, this kind of test doesn't feel smart.

Comment on lines +692 to +725
@dont_run_for_real
@disable_internet_access
def test_ensure_allocation_without_persist(self, mock_connection: Connection):
alloc = ["--time=00:01:00"]
transforms = [some_transform]
remote = SlurmRemote(
mock_connection, alloc=alloc, transforms=transforms, persist=False
)

def write_stuff(
command: str,
asynchronous: bool,
hide: bool,
warn: bool,
pty: bool,
out_stream: QueueIO,
):
assert command == f"bash -c 'salloc {shjoin(alloc)}'"
out_stream.write("salloc: Nodes bob-123 are ready for job")
return unittest.mock.DEFAULT

mock_connection.run.side_effect = write_stuff
results, runner = remote.ensure_allocation()

mock_connection.run.assert_called_once_with(
f"bash -c 'salloc {shjoin(alloc)}'",
hide=False,
asynchronous=True,
out_stream=unittest.mock.ANY,
pty=True,
warn=False,
)
assert results == {"node_name": "bob-123"}
# raise NotImplementedError("TODO: Important and potentially complicated test")
Member

Same, I think we could mock and use a FileRegressionFixture or something like that

Collaborator Author

I don't see how that would work, perhaps we can chat about this in person

Comment on lines 44 to 28
def run(
self,
cmd: Sequence[str],
_run_fn: Callable[Args[P], CompletedProcess[str]] = subprocess.run,
*args: P.args,
**kwargs: P.kwargs,
) -> CompletedProcess[str]:
Member

*args should come before _run_fn, otherwise run([cmd ...], "arg1", "arg2", kwarg3="kwarg3") will result in

cmd = [cmd...]
_run_fn = "arg1"
*args = ("arg2",)
**kwargs = {"kwarg3": "kwarg3"}
Suggested change
def run(
self,
cmd: Sequence[str],
_run_fn: Callable[Args[P], CompletedProcess[str]] = subprocess.run,
*args: P.args,
**kwargs: P.kwargs,
) -> CompletedProcess[str]:
def run(
self,
cmd: Sequence[str],
*args: P.args,
_run_fn: Callable[Args[P], CompletedProcess[str]] = subprocess.run,
**kwargs: P.kwargs,
) -> CompletedProcess[str]:

Member

I'd just remove *args altogether if the cmd is already a list. All other arguments should be passed by keyword.

Collaborator Author

I'll fix the ordering of the arguments, but the explanation for the *args, **kwargs and ParamSpec is the same as in https://github.com/mila-iqia/milatools/pull/61/files#r1362819307

Collaborator Author

@lebrice lebrice Nov 6, 2023

A few notes:

  • The wrapped function needs to appear before the *args and **kwargs for us to be able to mimic its signature with ParamSpec for our *args or **kwargs.
  • You'd get a type-checking error if you tried to do run([cmd ...], "arg1", "arg2", kwarg3="kwarg3"). You're right though: it wouldn't work if someone called this function with positional arguments that are meant for subprocess.run.

Assuming that we agree that we want these kinds of methods to be properly typed, we have two options as far as I know:

  • Keeping this ParamSpec as-is
  • List the arguments of the wrapped function explicitly

The second option would duplicate some of the code and the types of the signature of the wrapped function, but might be more explicit.

Let me know what you think.

Collaborator Author

Ok I opted for the second option in this case, fixed this in 1a6ecd5
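Purely for illustration, option 2 could look roughly like this. This is a guess at the shape, not the actual contents of 1a6ecd5, and only a few of subprocess.run's parameters are shown:

from __future__ import annotations

import subprocess
from collections.abc import Sequence
from subprocess import CompletedProcess

class Local:  # placeholder name for the class the method lives on
    def run(
        self,
        cmd: Sequence[str],
        *,
        capture_output: bool = False,
        check: bool = False,
        timeout: float | None = None,
    ) -> CompletedProcess[str]:
        # A subset of the wrapped function's keyword arguments is duplicated
        # explicitly instead of being forwarded through a ParamSpec'd **kwargs.
        return subprocess.run(
            cmd, capture_output=capture_output, check=check, timeout=timeout, text=True
        )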

.pre-commit-config.yaml (outdated comment, resolved)

@lebrice
Collaborator Author

lebrice commented Nov 1, 2023

> Note: I'm feeling less and less comfortable with the master branch being deeply modified (with the new _* default funcs). It doesn't change anything for the end user and prevents us from quickly bringing a fix to users or simply fixing a dependency version. But we're too far into the changes now

Hey @satyaog, @breuleux thanks for the reviews, sorry I didn't reply earlier.

I just want to emphasize something: This PR doesn't change anything about the behaviour of the code. The type hints that are added describe the types and signatures of the methods as they are currently.

  • There's a very slight exception with these _* default functions, which are there for typing purposes and don't have any effect at runtime. I'm okay with removing them and making the signatures explicit, but that would be a much bigger change to the implementation of these methods than just adding type annotations to the *args and **kwargs plus these unused keyword parameters for the wrapped functions.

As for the unit tests, I guess having meh integration tests would still be much better than having no tests at all. I think making changes to the master branch to introduce tests actually aligns with your point: we don't want breaking changes to land on master. The problem is that until we add some tests, we have no idea what's breaking or not. We need to take the first jump at some point.

Member

@satyaog satyaog left a comment

I like most of it, although I'm still perplexed by some of the extra explicit arguments listed only to mirror the typing of the underlying functions, the typing helper classes, and the overloads. When there's no friction I really like typing, but when it looks like a tedious detour I'm less of a fan.

I'm approving, but the only thing I'd like to discuss before merging is test_remote.py, to see if we could simplify things by only looking at the commands that are going to be executed for most tests, which can be done offline.


Comment on lines +181 to +186
entry: str = qn.autocomplete(
"",
choices=list(modchoices.keys()),
style=qn.Style([("answer", "fg:default bg:default")]),
).unsafe_ask()
entry = entry.strip()
Member

maybe?

Suggested change
entry: str = qn.autocomplete(
"",
choices=list(modchoices.keys()),
style=qn.Style([("answer", "fg:default bg:default")]),
).unsafe_ask()
entry = entry.strip()
entry: str = qn.autocomplete(
"",
choices=list(modchoices.keys()),
style=qn.Style([("answer", "fg:default bg:default")]),
).unsafe_ask().strip()

Comment on lines 236 to +239
if env == "<OTHER>":
env = askpath("Enter the path to the environment to use.", remote)
return askpath("Enter the path to the environment to use.", remote)

elif env == "<CREATE>":
pyver = qn.select(
if env == "<CREATE>":
Member

I personally find a single return point per function cleaner, but that's just a personal preference.



Comment on lines +613 to +639
@dont_run_for_real
@pytest.mark.parametrize("persist", [True, False, None])
def test_with_transforms(self, mock_connection: Connection, persist: bool | None):
alloc = ["--time=00:01:00"]
transforms = [some_transform]
original_persist: bool = False
remote = SlurmRemote(
mock_connection,
alloc=alloc,
transforms=transforms,
persist=original_persist,
)
new_transforms = [some_other_transform]
transformed = remote.with_transforms(*new_transforms, persist=persist)
# NOTE: Feels dumb to do this. Not sure what I should be doing otherwise.
assert transformed.connection == remote.connection
assert transformed.alloc == remote.alloc
assert transformed.transforms == [
some_transform,
some_other_transform,
(
transformed.srun_transform_persist
if persist
else transformed.srun_transform
),
]
assert transformed._persist == (remote._persist if persist is None else persist)
Member

Yes, we can do that. I don't think we actually need to run the command, especially if we don't check the output. We could just save the command string with a FileRegressionFixture. Then we could potentially wrap a couple of tests into a single one with parameters.
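A rough sketch of that idea, assuming pytest-regressions provides the file_regression fixture and reusing the mock_connection fixture from the tests above (names and import paths are illustrative):

import pytest
from pytest_regressions.file_regression import FileRegressionFixture

from milatools.cli.remote import SlurmRemote

@pytest.mark.parametrize("command", ["bob", "echo hello"])
def test_srun_transform_persist_command(
    mock_connection, file_regression: FileRegressionFixture, command: str
):
    remote = SlurmRemote(
        mock_connection, alloc=["--time=00:01:00"], transforms=[], persist=False
    )
    # Only the generated command string is snapshotted; nothing is executed,
    # so no real SLURM cluster or internet access is needed.
    file_regression.check(remote.srun_transform_persist(command), extension=".txt")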

lebrice and others added 19 commits November 13, 2023 11:51
lebrice and others added 18 commits November 13, 2023 11:53
This was referenced Nov 13, 2023
@lebrice
Collaborator Author

lebrice commented Nov 13, 2023

Closing in favour of #76 and #75 (as well as a lot of overlapping changes with the now-merged #74).

@lebrice lebrice closed this Nov 13, 2023