Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feat] Let python handle newlines in subprocess calls with universal_newlines #4517

Open
echoix opened this issue Oct 14, 2024 · 0 comments
Open
Labels
enhancement New feature or request

Comments

@echoix
Copy link
Member

echoix commented Oct 14, 2024

Is your feature request related to a problem? Please describe.
In #4366, there were three failures that remained to be fixed to work properly on Windows. The hardest one concerned the newline handling with results coming from a call to a subprocess, and some discussion is needed to choose what is best.

Describe the solution you'd like

When using a TextIO, or a subprocess call, use the option text=True, that is the new name for universal_newlines, and get closer to the python way of handling a process's output (including the encoding handling)

If encoding or errors are specified, or text is true, file objects for stdin, stdout and stderr are opened in text mode using the specified encoding and errors or the io.TextIOWrapper default. The universal_newlines argument is equivalent to text and is provided for backwards compatibility. By default, file objects are opened in binary mode.
https://docs.python.org/3/library/subprocess.html#subprocess.run

An example of a change that was passing all these pytest tests (in python/grass/script/core.py) is given below. I would like a cleaner solution, preferably upper in the chain.

def read_command(*args, **kwargs):
    """Passes all arguments to pipe_command, then waits for the process to
    complete, returning its stdout (i.e. similar to shell `backticks`).
    The behavior on error can be changed using *errors* parameter
    which is passed to the :func:`handle_errors()` function.
    :param list args: list of unnamed arguments (see start_command() for details)
    :param list kwargs: list of named arguments (see start_command() for details)
    :return: stdout
    """
    encoding = "default"
    if "encoding" in kwargs:
        encoding = kwargs["encoding"]
+   if "universal_newlines" not in kwargs:
+      kwargs["universal_newlines"] = True

    if _capture_stderr and "stderr" not in kwargs.keys():
        kwargs["stderr"] = PIPE
    process = pipe_command(*args, **kwargs)
    stdout, stderr = process.communicate()
    if encoding is not None:
        stdout = _make_unicode(stdout, encoding)
        stderr = _make_unicode(stderr, encoding)
    returncode = process.poll()
    if returncode and _capture_stderr and stderr:
        # Print only when we are capturing it and there was some output.
        # (User can request ignoring the subprocess stderr and then we get only None.)
        sys.stderr.write(stderr)
    return handle_errors(returncode, stdout, args, kwargs)

and

def _create_location_xy(database, location):
    """Create unprojected location
    Raise ScriptError on error.
    :param database: GRASS database where to create new location
    :param location: location name
    """
    cur_dir = os.getcwd()
    try:
        os.chdir(database)
        os.mkdir(location)
        os.mkdir(os.path.join(location, "PERMANENT"))
        # create DEFAULT_WIND and WIND files
        regioninfo = [
            "proj:       0",
            "zone:       0",
            "north:      1",
            "south:      0",
            "east:       1",
            "west:       0",
            "cols:       1",
            "rows:       1",
            "e-w resol:  1",
            "n-s resol:  1",
            "top:        1",
            "bottom:     0",
            "cols3:      1",
            "rows3:      1",
            "depths:     1",
            "e-w resol3: 1",
            "n-s resol3: 1",
            "t-b resol:  1",
        ]

        defwind = open(os.path.join(location, "PERMANENT", "DEFAULT_WIND"), "w")
-       for param in regioninfo:
-           defwind.write(param + "%s" % os.linesep)
+       defwind.writelines(f"{param}\n" for param in regioninfo)
        defwind.close()

        shutil.copy(
            os.path.join(location, "PERMANENT", "DEFAULT_WIND"),
            os.path.join(location, "PERMANENT", "WIND"),
        )
        os.chdir(cur_dir)
    except OSError as e:
        raise ScriptError(repr(e))

Describe alternatives you've considered

https://stackoverflow.com/questions/38181494/what-is-the-difference-between-using-universal-newlines-true-with-bufsize-1-an

I feel that the universal_newlines should have been deprecated when they renamed to a more descriptive name text, but the impact was too big on the ecosystem. In the current way that we handle calling modules with subprocesses, we seem to split the keyword arguments whether they belong to the module or to the subprocess call. I think they should be separated as a different keyword argument to pass along instead. But without that, I think using the "text" keyword argument would have too many collisions, as it is a big restriction to disallow that name as a valid argument name to any grass module. The name "universal_newlines" is less likely to be used by a grass module.

I also considered if the whole handling of encoding/decoding should really be done on our part vs letting Python do it. We have some failing gunittest tests on Windows failing because of encoding problems with a census map name that has weird characters. Launching a subprocess and getting a result takes a lot of function calls, and possibly some cases would be best handled with run( ... , check=True) instead (possibly not realistic everywhere as I understand that you don't get to see the output while it runs).

There's also a more advantageous alternative to explore: subprocesses APIs from the asyncio module, https://docs.python.org/3/library/asyncio-subprocess.html. It isn't drop in, as using asyncio might need changes at the calling site, but still seems to be what we are expecting when we are using ps.wait() or ps.communicate() in grass.script.core read_command or run_command.

Also see some notable differences in https://docs.python.org/3/library/asyncio-subprocess.html#asyncio.subprocess.Process

Additional context
Add any other context or screenshots about the feature request here.

@echoix echoix added the enhancement New feature or request label Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant