
"[ERROR] the JSON object must be str, bytes or bytearray, not PosixPath" when using a path to pipeline.json file in meta-conduct #410

Open
fraimondo opened this issue May 22, 2024 · 0 comments

@fraimondo

I'm trying to figure out how to use datalad catalog + metalad + extractors.

I was trying the example from the website, but I had an error when running this:

datalad -l 9 meta-conduct ./pipelines/extract_dataset_pipeline.json --pipeline-help 

The output is as follows:

[DEBUG  ] Command line args 1st pass for DataLad 0.19.3. Parsed: Namespace() Unparsed: ['meta-conduct', './pipelines/extract_dataset_pipeline.json', '--pipeline-help'] 
[DEBUG  ] Processing entrypoints 
[DEBUG  ] Loading entrypoint catalog from datalad.extensions 
[DEBUG  ] Loaded entrypoint catalog from datalad.extensions 
[DEBUG  ] Loading entrypoint deprecated from datalad.extensions 
[DEBUG  ] Loaded entrypoint deprecated from datalad.extensions 
[DEBUG  ] Loading entrypoint metalad from datalad.extensions 
[DEBUG  ] Loaded entrypoint metalad from datalad.extensions 
[DEBUG  ] Loading entrypoint neuroimaging from datalad.extensions 
[DEBUG  ] Loaded entrypoint neuroimaging from datalad.extensions 
[DEBUG  ] Loading entrypoint next from datalad.extensions 
[DEBUG  ] Enable posting DataLad config overrides CLI/ENV as GIT_CONFIG items in process ENV 
[DEBUG  ] Apply datalad-next patch to annexrepo.py:AnnexRepo.enable_remote 
[DEBUG  ] Building doc for <class 'datalad.local.configuration.Configuration'> 
[DEBUG  ] Building doc for <class 'datalad_next.patches.configuration.Configuration'> 
[DEBUG  ] Building doc for <class 'datalad.local.configuration.Configuration'> 
[DEBUG  ] Apply datalad-next patch to create_sibling_ghlike.py:_GitHubLike._set_request_headers 
[DEBUG  ] Apply datalad-next patch to interface.(utils|base).py:_execute_command_ 
[DEBUG  ] Building doc for <class 'datalad.core.local.status.Status'> 
[DEBUG  ] Building doc for <class 'datalad.core.local.diff.Diff'> 
[DEBUG  ] Building doc for <class 'datalad.core.distributed.push.Push'> 
[DEBUG  ] Apply patch to datalad.core.distributed.push._transfer_data 
[DEBUG  ] Patching datalad.core.distributed.push.Push docstring and parameters 
[DEBUG  ] Building doc for <class 'datalad.core.distributed.push.Push'> 
[DEBUG  ] Patching datalad.support.AnnexRepo.get_export_records (new method) 
[DEBUG  ] Apply patch to datalad.core.distributed.push._push 
[DEBUG  ] Apply patch to datalad.distribution.siblings._enable_remote 
[DEBUG  ] Building doc for <class 'datalad.distribution.update.Update'> 
[DEBUG  ] Building doc for <class 'datalad.distribution.siblings.Siblings'> 
[DEBUG  ] Retrofit `SpecialRemote` with a `close()` handler 
[DEBUG  ] Replace special remote _main() with datalad-next's progress logging enabled variant 
[DEBUG  ] Building doc for <class 'datalad.local.subdatasets.Subdatasets'> 
[DEBUG  ] Building doc for <class 'datalad.distributed.create_sibling_gitlab.CreateSiblingGitlab'> 
[DEBUG  ] Apply patch to datalad.distributed.create_sibling_gitlab._proc_dataset 
[DEBUG  ] Stop advertising discontinued "hierarchy" layout for `create_siblign_gitlab()` 
[DEBUG  ] Building doc for <class 'datalad.distributed.create_sibling_gitlab.CreateSiblingGitlab'> 
[DEBUG  ] Building doc for <class 'datalad.core.local.save.Save'> 
[DEBUG  ] Building doc for <class 'datalad.core.distributed.clone.Clone'> 
[DEBUG  ] Building doc for <class 'datalad.distribution.get.Get'> 
[DEBUG  ] Building doc for <class 'datalad.distribution.install.Install'> 
[DEBUG  ] Building doc for <class 'datalad.local.unlock.Unlock'> 
[DEBUG  ] Building doc for <class 'datalad.core.local.run.Run'> 
[DEBUG  ] Apply patch to datalad.core.local.run.format_command 
[DEBUG  ] Apply patch to datalad.distribution.update._choose_update_target 
[DEBUG  ] Apply patch to datalad.support.sshconnector._exec_ssh 
[DEBUG  ] Apply patch to datalad.support.sshconnector.get 
[DEBUG  ] Apply patch to datalad.support.sshconnector.put 
[DEBUG  ] Apply patch to datalad.distributed.ora_remote.url2transport_path 
[DEBUG  ] Apply patch to datalad.distributed.ora_remote.url2transport_path 
[DEBUG  ] Apply patch to datalad.distributed.ora_remote.SSHRemoteIO 
[DEBUG  ] Apply patch to datalad.distributed.ora_remote.close 
[DEBUG  ] Apply patch to datalad.customremotes.ria_utils.create_store 
[DEBUG  ] Apply patch to datalad.customremotes.ria_utils.create_ds_in_store 
[DEBUG  ] Apply patch to datalad.customremotes.ria_utils._ensure_version 
[DEBUG  ] Apply patch to datalad.distributed.ora_remote.ORARemote 
[DEBUG  ] Building doc for <class 'datalad_next.patches.replace_create_sibling_ria.CreateSiblingRia'> 
[DEBUG  ] Apply patch to datalad.distributed.create_sibling_ria.CreateSiblingRia 
[DEBUG  ] Building doc for <class 'datalad.distributed.create_sibling_ria.CreateSiblingRia'> 
[DEBUG  ] Loaded entrypoint next from datalad.extensions 
[DEBUG  ] Done processing entrypoints 
[DEBUG  ] Building doc for <class 'datalad_metalad.conduct.Conduct'> 
[DEBUG  ] Parsing known args among ['/Users/fraimondo/miniforge3/envs/junifer/bin/datalad', '-l', '9', 'meta-conduct', './pipelines/extract_dataset_pipeline.json', '--pipeline-help'] 
[DEBUG  ] Determined class of decorated function: <class 'datalad_metalad.conduct.Conduct'> 
[DEBUG  ] Command parameter validation skipped. <class 'datalad_metalad.con

I traced the error to this function:

    def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
        if isinstance(path_or_object, str):
            if path_or_object == "-":
                metadata_file = sys.stdin
            else:
                try:
                    json_object = json.loads(
                        files("datalad_metalad.pipeline").joinpath(
                            f"pipelines/{path_or_object}_pipeline.json"))
                    return json_object
                except FileNotFoundError:
                    metadata_file = open(path_or_object, "tr")
            return json.load(metadata_file)
        return path_or_object

What happens is that a PosixPath object is passed to json.loads. I'm using Python 3.11, which seems to be supported.
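The failure can be reproduced outside of datalad with just the standard library (the filename here is arbitrary; the file does not even need to exist, since json.loads rejects the Path object before any I/O happens):

```python
import json
from pathlib import Path

# json.loads only accepts str, bytes, or bytearray; a Path raises TypeError
try:
    json.loads(Path("pipelines/extract_dataset_pipeline.json"))
except TypeError as exc:
    print(exc)  # the JSON object must be str, bytes or bytearray, not PosixPath
```

On POSIX systems the concrete class is PosixPath, which matches the error in the issue title.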

Also, unless there is some backwards-compatibility reason, I was taught (and I stand by this premise) never to use exceptions for flow control; use if statements instead.
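To illustrate the two styles on this exact pattern (a generic sketch, not metalad code; the file path is made up):

```python
from pathlib import Path

candidate = Path("pipelines/does_not_exist.json")

# Exception-driven flow control (EAFP): the "miss" case is an except branch
try:
    text = candidate.read_text()
except FileNotFoundError:
    text = None

# Explicit if-based flow control (LBYL): the "miss" case is an else branch
if candidate.is_file():
    text = candidate.read_text()
else:
    text = None
```

EAFP is common in Python, but the if-based form makes the packaged-vs-filesystem branching explicit and cannot accidentally swallow an unrelated FileNotFoundError raised deeper in the try block.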

This is how I fixed it locally:

def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
    if isinstance(path_or_object, str):
        if path_or_object == "-":
            metadata_file = sys.stdin
        else:
            to_read = files("datalad_metalad.pipeline").joinpath(
                f"pipelines/{path_or_object}_pipeline.json"
            )
            if to_read.exists():
                json_object = json.loads(to_read)
                return json_object
            else:
                metadata_file = open(path_or_object, "tr")
        return json.load(metadata_file)
    return path_or_object

While this works for my case, it is just a workaround and still not suitable for a PR: the packaged-pipeline branch will still fail, because json.loads expects a string and not a PosixPath object. So I then ran this to test:

datalad -l 9 meta-conduct extract_metadata --pipeline-help

And I got, as expected:

[ERROR  ] Expecting value: line 1 column 1 (char 0) 
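That message is the json.JSONDecodeError raised when the decoder is handed a string that does not start with any JSON value, for example a filesystem path instead of the file's contents (standalone sketch, arbitrary path string):

```python
import json

# Decoding a path string (instead of the file's contents) fails immediately
try:
    json.loads("pipelines/extract_metadata_pipeline.json")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)
```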

So here is my final fix (which I hope helps):

def read_json_object(path_or_object: Union[str, JSONType]) -> JSONType:
    if isinstance(path_or_object, str):
        if path_or_object == "-":
            metadata_file = sys.stdin
        else:
            to_read = files("datalad_metalad.pipeline").joinpath(
                f"pipelines/{path_or_object}_pipeline.json"
            )
            if to_read.exists():
                metadata_file = open(to_read, "tr")
            else:
                metadata_file = open(path_or_object, "tr")
        return json.load(metadata_file)
    return path_or_object
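For what it's worth, here is a sketch of how the packaged branch could avoid opening a file handle altogether by reading the resource's text (a hypothetical rewrite, not a tested PR; the `package` parameter and the ModuleNotFoundError guard are my additions for illustration):

```python
import json
import sys
from importlib.resources import files


def read_json_object(path_or_object, package="datalad_metalad.pipeline"):
    """Resolve '-', a packaged pipeline name, or a file path to a JSON object."""
    if not isinstance(path_or_object, str):
        return path_or_object  # already-parsed JSON passes through
    if path_or_object == "-":
        return json.load(sys.stdin)
    try:
        resource = files(package).joinpath(
            f"pipelines/{path_or_object}_pipeline.json")
        packaged = resource.is_file()
    except ModuleNotFoundError:  # package not installed: fall back to the path
        packaged = False
    if packaged:
        # read_text() works for filesystem and zip-packaged resources alike
        return json.loads(resource.read_text())
    with open(path_or_object, "rt") as f:
        return json.load(f)
```

Traversable.is_file() and Traversable.read_text() are part of the importlib.resources API since Python 3.9, so this would also keep working if the package were ever shipped zipped, where open() on a plain path would not.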

Though I only patched it so that it works in these two cases; I have no clue about the overall idea behind the read_json_object function and its use cases.
