Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

meta-extract: Catch non-existing paths earlier #354

Open
adswa opened this issue Mar 1, 2023 · 0 comments · Fixed by #355
Open

meta-extract: Catch non-existing paths earlier #354

adswa opened this issue Mar 1, 2023 · 0 comments · Fixed by #355

Comments

@adswa
Copy link
Member

adswa commented Mar 1, 2023

When I attempt a metadata extraction of a non-existing file with a relative path, the first thing that returns an error is a get_file_info complaining about a missing "dataset status for dataset":

(handbook) adina@muninn in /tmp/ds1 on git:master!
❱ datalad meta-extract -d . metalad_example_file someo.txt
[ERROR  ] no dataset status for dataset: /tmp/ds1 file: /tmp/ds1/someo.txt 

The logic below seems to be careful to not resolve paths:

# Create a relative Path-instance, if path to a file is given. We have
# to be careful not to resolve the path, because that could resolve the
# git-annex link.
path_object = None
if path is not None:
path_object = Path(path)
if path_object.is_absolute():
relative_path = None
for dataset_path in (source_dataset.pathobj,
source_dataset.pathobj.resolve()):
try:
relative_path = path_object.relative_to(dataset_path)
break
except ValueError:
pass
if relative_path is None:
raise ValueError(
f"The provided path {path} is not contained in the "
f"dataset given by {source_dataset.pathobj}"
)
path_object = relative_path

I think we should check early on if a path points to an existing file, and if not, return a proper error or impossible result early and informatively.

adswa added a commit to adswa/datalad-metalad that referenced this issue Mar 2, 2023
…lidation

This is a demonstration how one existing command could adopt datalad-next's
parameter constraint validation. It changes the baseclass to next's
ValidatedInterface, and defines a validator with relevant parameter constraints:

Specifically, the constraints are:
- The provided datasets exists, or a dataset can be derived from the curdir
- The path points to an existing file (ref datalad#354)
- The extractorname is a string
- The extractorargs is a mapping of key-value pairs

This makes a dedicated check whether a file exists obsolete, and it could replace
the checks that check_dataset() does (provided an additional constraint option
in EnsureDataset() that allows to check for valid dataset IDs - I've created an
issue about this in datalad/datalad-next#272).

This change would introduce a dependency to datalad-next, and as parts of
this PR were only tested with yet unreleased branches of datalad-next,
it will not work right now unless you're on the right development version
of datalad-next.
adswa added a commit to adswa/datalad-metalad that referenced this issue Mar 2, 2023
…lidation

This is a demonstration how one existing command could adopt datalad-next's
parameter constraint validation. It changes the baseclass to next's
ValidatedInterface, and defines a validator with relevant parameter constraints:

Specifically, the constraints are:
- The provided datasets exists, or a dataset can be derived from the curdir
- The path points to an existing file (ref datalad#354)
- The extractorname is a string
- The extractorargs is a mapping of key-value pairs

This makes a dedicated check whether a file exists obsolete, and it could replace
the checks that check_dataset() does (provided an additional constraint option
in EnsureDataset() that allows to check for valid dataset IDs - I've created an
issue about this in datalad/datalad-next#272).

This change would introduce a dependency to datalad-next, and as parts of
this PR were only tested with yet unreleased branches of datalad-next,
it will not work right now unless you're on the right development version
of datalad-next.
adswa added a commit to adswa/datalad-metalad that referenced this issue Mar 2, 2023
…lidation

This is a demonstration how one existing command could adopt datalad-next's
parameter constraint validation. It changes the baseclass to next's
ValidatedInterface, and defines a validator with relevant parameter constraints:

Specifically, the constraints are:
- The provided datasets exists, or a dataset can be derived from the curdir
- The path points to an existing file (ref datalad#354)
- The extractorname is a string
- The extractorargs is a mapping of key-value pairs

This makes a dedicated check whether a file exists obsolete, and it could replace
the checks that check_dataset() does (provided an additional constraint option
in EnsureDataset() that allows to check for valid dataset IDs - I've created an
issue about this in datalad/datalad-next#272).

This change would introduce a dependency to datalad-next, and as parts of
this PR were only tested with yet unreleased branches of datalad-next,
it will not work right now unless you're on the right development version
of datalad-next.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant