Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to remake file if the command writes to stdout (external remote error) #82

Closed
mslw opened this issue Jan 17, 2025 · 4 comments · Fixed by #89
Closed

Unable to remake file if the command writes to stdout (external remote error) #82

mslw opened this issue Jan 17, 2025 · 4 comments · Fixed by #89

Comments

@mslw
Copy link
Contributor

mslw commented Jan 17, 2025

I tried to use datalad-remake with a Python script which uses Nipype workflows imported from fMRIPrep. fMRIPrep tends to log extensively to standard output. This caused the remake special remote to crash ("external special remote protocol error, unexpectedly received (-- content of log message here --)") when getting the file.

It's better to demonstrate this with a simple example - a script with "print debugging".

Dataset setup:

❱ datalad create ds
❱ cd ds
❱ git config --local user.signingkey 9D33F696BA3651D374596DA2E0E6DF9DDE446047
❱ git config --local commit.gpgsign true
❱ mkdir -p .datalad/make/methods

❱ git add .datalad/make/methods/ write.py
❱ git commit -m "Add method and code"

File: .datalad/make/methods/method

parameters = ['output']
command = ["python", "write.py", "{output}"]

File: write.py

from argparse import ArgumentParser
from pathlib import Path

parser = ArgumentParser()
parser.add_argument("file", type=Path)
args = parser.parse_args()

print(f"Beware, for I shall write to {args.file}")
args.file.write_text("Text file contents")

Initial make (success):

❱ datalad make -p output="f1.txt" -o f1.txt method
gpg: keybox '/tmp/tmp4bsw76uf/pubring.kbx' created
gpg: /tmp/tmp4bsw76uf/trustdb.gpg: trustdb created
gpg: key E0E6DF9DDE446047: public key "mslw-test" imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg: Signature made Fri 17 Jan 2025 05:51:08 PM CET
gpg:                using RSA key 9D33F696BA3651D374596DA2E0E6DF9DDE446047
gpg: Good signature from "mslw-test" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 9D33 F696 BA36 51D3 7459  6DA2 E0E6 DF9D DE44 6047
Beware, for I shall write to f1.txt
Deleted branch tmph_i4mj_t (was 48e34c9).
make(ok): /home/mszczepanik/tester/ds/f1.txt [added url: 'datalad-remake:///?label=method&root_version=48e34c96e17ef5b9b600ef4a39a7e1776338941f&specification=e4059dd966bc85c97add320df9ed07c4&this=f1.txt' to PurePosixPath('f1.txt') in /home/mszczepanik/tester/ds]

Attempt to remake (external special remote protocol error):

❱ datalad drop f1.txt
drop(ok): f1.txt (file)
(venv-remake) mszczepanik@juseless in ~/tester/ds on git:main
❱ datalad get f1.txt
get(error): f1.txt (file) [external special remote protocol error, unexpectedly received "Beware, for I shall write to f1.txt" (unable to parse command)
unable to use special remote due to protocol error]

As a side note, if in that script I write to standard error (sys.stderr.write) instead of standard output (print), there is no protocol error and remake succeeds; however on initial make the standard error is not displayed in the terminal.

@christian-monch
Copy link
Contributor

@mslw Thanks for pointing that out. The reason for the bug is that stdout of the datalad-remake special remote process is used for communication with the git-annex process. Any output to stdout from the special remote process or its children that does not fit the protocol (which is almost any output) will trigger this error. Two solutions come to mind:

1.) Instruct the user not to write anything to stdout
2.) Let the special remote redirect stdout to a temporary file and make it available in collect via a special name
3.) Redirect stdout to /dev/null or NUL
4.) ...

To me option 2. makes the most sense. WDYT?

@mslw
Copy link
Contributor Author

mslw commented Jan 22, 2025

Oh, I see, the git-annex external speciale remote protocol docs says "Communication is via stdin and stdout. Therefore, the external special remote must avoid doing any prompting, or outputting anything like eg, progress to stdout. (Such stuff can be sent to stderr instead.)".

Although in this case it is not the standard remote which writes to stdout, it is the user-provided program that the special remote triggers. Not sure if that distinction matters in practice.

Now to your options:

(1) seems extremely unattractive to me. In my mind, fMRIPrep is the ideal use case for remake, and fMRIPrep tends to log to both stdin and stderr, depending on the message (I suppose it's far from being the only program which behaves like that). By extension, code which imports workflows from fMRIPrep inherits that behaviour. So if stdout is off limits, it would be impossible to use fMRIPrep as-is. Perhaps I would be able to re-configure the loggers in my fMRIPrep-based code, or wrap the Python code in a Bash script which redirects the outputs (and use that with remake) but things become more involved this way.

(2) seems ok. One thing we would loose is the insight into the progress indicated by the log messages gradually appearing. I'm also not sure where the file should be written - my instinct is some throwaway location, maybe configurable?

(3) it feels wrong just to throw it away. That being said, in a remake scenario I probably expect the final result to be correct and do not care about the stdout logging... as long as things go well.

To be honest, I am not sure how and at which level the data streams can be redirected. I have seen people do things like 2>&1 >&3 so I believe there is some flexibility - and that's where my experience ends. In terms of (4), I can imagine maybe flushing things to stdout (bypassing the stream git-annex reads, if possible), either gradually (if possible) or at the end. But doing it gradually would probably mess with get's progreess reporting, wouldn't it? And doing it in the end could create a lot of useless mess.

@christian-monch
Copy link
Contributor

christian-monch commented Jan 22, 2025

In terms of 4.) one possibility would be to redirect all output to stderr and prefix it with the source, e.g.

[STDERR]: ...
[STDERR]: ...
[STDOUT]: ...
...

That would make all outputs available via stderr

christian-monch added a commit that referenced this issue Jan 23, 2025
This PR fixes issue #82

By default it suppresses all output on `stdout` that is created when the
commands in a template are executed. The output can be written to a file
by specifying the `-s/--stdout` argument in `datalad make`.
@christian-monch
Copy link
Contributor

christian-monch commented Jan 23, 2025

@mslw PR #89 should fix the error. It will swallow stdout-output by default. The datalad make-parameter -s/--stdout allows to write stdout-output to a file. There is currently no appending to existing files.

Let me know what you think. Please feel free to reopen this issue, if the current stdout-handling is not sufficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants