Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Test lowest versions of all required and optional dependencies #3639

Draft
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

weiji14
Copy link
Member

@weiji14 weiji14 commented Nov 21, 2024

Description of proposed changes

Test lowest versions of all direct dependencies of PyGMT listed in the pyproject.toml file (both required and optional dependencies).

This implements the idea at #3615 (comment) on .github/workflows/ci_tests_legacy.yaml (instead of ci_tests.yaml) by using the uv package manager's --resolution lowest-direct strategy (https://docs.astral.sh/uv/concepts/resolution/#resolution-strategy).

Related to optional dependency policy at #3456, and complements #3615 by testing lowest versions of transitive dependencies more thoroughly. The following version pins have been set:

Package Pin Reason
netCDF4 >=1.7 #3639 (comment)
packaging >=22.0 See commit msg in d1fa38d
contextily >=1.2 added xyzservices as requirement, #3639 (comment)
geopandas >=0.14 trying to move to geopandas 1.x soon, see #3456 (comment)
IPython >=8 nice round number, also see SPEC 0
pyarrow >=13 better datetime64 unit preservation #3639 (comment)
rioxarray >=0.14 v0.14.0 released Mar 2023 is pinned to rasterio>=1.2 and Python>=3.9

Testing locally

If it helps, this is how I'm testing locally using nektos/act:

  • Comment out this line in ci_tests_legacy.yaml first:
     # if: github.repository == 'GenericMappingTools/pygmt'
    
  • Change os: [ubuntu-20.04, macos-13, windows-2019] to just os: [ubuntu-20.04] (or whatever OS you want to test on)
  • Run act schedule --workflows '.github/workflows/ci_tests_legacy.yaml'

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.
  • Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash command is:

  • /format: automatically format and lint the code

Using uv's --resolution lowest-direct to install the lowest compatible versions for all direct Python dependencies (required and optional) listed in the pyproject.toml file.
@weiji14 weiji14 added the maintenance Boring but important stuff for the core devs label Nov 21, 2024
@weiji14 weiji14 self-assigned this Nov 21, 2024
Need to get more recent version with a binary wheel, otherwise will need to compile from source.
Pin major versions for optional dependencies that are not on version 0.x. Using a reasonable major version released in the past year or two.
Pin minor versions of optional dependencies that are still on v0.x.
Comment on lines 93 to 95
run: uv run --with pytest==8 --resolution lowest-direct --all-extras --dev make test_no_images PYTEST_EXTRA="-r P"
env:
GMT_LIBRARY_PATH: $CONDA_PREFIX/lib
Copy link
Member Author

@weiji14 weiji14 Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting this error at https://github.com/GenericMappingTools/pygmt/actions/runs/11945150103/job/33297317606?pr=3639#step:7:35:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/runner/work/pygmt/pygmt/pygmt/__init__.py", line 24, in <module>
    from pygmt import datasets
  File "/home/runner/work/pygmt/pygmt/pygmt/datasets/__init__.py", line 7, in <module>
    from pygmt.datasets.earth_age import load_earth_age
  File "/home/runner/work/pygmt/pygmt/pygmt/datasets/earth_age.py", line 12, in <module>
    from pygmt.datasets.load_remote_dataset import _load_remote_dataset
  File "/home/runner/work/pygmt/pygmt/pygmt/datasets/load_remote_dataset.py", line 9, in <module>
    from pygmt.clib import Session
  File "/home/runner/work/pygmt/pygmt/pygmt/clib/__init__.py", line 9, in <module>
    from pygmt.clib.session import Session, __gmt_version__
  File "/home/runner/work/pygmt/pygmt/pygmt/clib/session.py", line 111, in <module>
    _libgmt = load_libgmt()
  File "/home/runner/work/pygmt/pygmt/pygmt/clib/loading.py", line 62, in load_libgmt
    raise GMTCLibNotFoundError("\n".join(error_msg))
pygmt.exceptions.GMTCLibNotFoundError: Error loading GMT shared library at '/home/runner/micromamba/envs/pygmt/lib/libgmt.so'.
/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/runner/micromamba/envs/pygmt/lib/./libgdal.so.34)
Error loading GMT shared library at 'libgmt.so'.
/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/runner/micromamba/envs/pygmt/lib/python3.10/lib-dynload/../.././libgdal.so.34)
make: *** [Makefile:35: _runtest] Error 1

Similar to issues mentioned at GenericMappingTools/try-gmt#50 (comment) and GenericMappingTools/try-gmt#37 (comment). Seems to only be on the Ubuntu/Linux build.

@@ -33,18 +33,18 @@ dependencies = [
"numpy>=1.24",
"pandas>=2.0",
"xarray>=2023.04",
"netCDF4",
"netCDF4>=1.7",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed making NetCDF an optional dependency before in #429. Maybe we could revisit that discussion?

Copy link
Member

@seisman seisman Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. NetCDF4 was needed by xarray when reading and writing netCDF files. As mentioned in #239 (comment), previously, our load_xxx functions (e.g., load_earth_relief) called xarray.load_dataarray to load local netCDF grids into an xarray.DataArray object, making netCDF a highly "required" dependency.

After we refactor our load_xxx function in #3120, we no longer call xarray.load_dataarray, so netCDF should be no longer highly needed by PyGMT.

I've opened PR #3643 to see how the PyGMT relies on the netCDF package.

Edit: There are only 14 failures in the Python 3.11 + Ubuntu CI job after removing netCDF entirely (see https://github.com/GenericMappingTools/pygmt/actions/runs/11968416461/job/33367210857?pr=3643) and most of them are caused by load_static_earth_relief.

# Install the package that we want to test
- name: Install the package
run: make install
run: |
uv run --with pip==23 --resolution lowest-direct --all-extras --dev make install
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using --resolution lowest-direct for now, because --resolution lowest grabs many other transitive dependencies that may be hard to install (e.g. missing wheels for newer Python versions, so requires compilation from source). We could switch to --resolution lowest once the Python ecosystem does lower bound pins a bit more thoroughly (which may take years).

pyproject.toml Outdated
Comment on lines 43 to 47
"contextily>=1",
"geopandas>=0.14",
"IPython>=8", # 'ipython' is not the correct module name.
"pyarrow>=14",
"rioxarray>=0.14",
Copy link
Member Author

@weiji14 weiji14 Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting lower bounds here, otherwise uv will try to install super old versions that may not work anymore, some which actually go back to Python 2! For e.g., it tried to install rioxarray=0.0.1 at https://github.com/GenericMappingTools/pygmt/actions/runs/11945100244/job/33297179010#step:7:45. Happy to reconsider some of these version pins if needed.

Edit see comments at #3639 (comment) and #3639 (comment) below.

@seisman
Copy link
Member

seisman commented Nov 21, 2024

I'm unsure if "uv" is a good solution here, since it installs packages from PyPI only, but GMT is only available on conda-forge. It means we will mix packages from conda-forge and PyPI, which is usually not recommended.

Fix `TypeError: canonicalize_version() got an unexpected keyword argument 'strip_trailing_zero'` when running `pip install --no-deps -e .`, xref pypa/setuptools#4501
@seisman
Copy link
Member

seisman commented Nov 21, 2024

BTW, the "Install package" step has errors but the step still passes.

@weiji14
Copy link
Member Author

weiji14 commented Nov 21, 2024

I'm unsure if "uv" is a good solution here, since it installs packages from PyPI only, but GMT is only available on conda-forge. It means we will mix packages from conda-forge and PyPI, which is usually not recommended.

We have a similar situation in ci_tests_dev.yaml where we install some packages from conda, compile GMT manually, and install Python dependencies from PyPI. But yeah, definitely not recommended (which is why I'm running this on ci_tests_legacy.yaml instead of ci_tests.yaml). This PR is more of a proof of concept right now, and will at least allow us to check what lower bound pins might be needed.

BTW, the "Install package" step has errors but the step still passes.

Yes, I noticed that. Struggling to see if there's an upstream issue on this at https://github.com/astral-sh/uv/issues. Need to find a way to make it return a non-zero exit code.

Create a proper virtualenv, and activate that virtualenv before installing pygmt and running pytest.
So that xyzservices is also installed by default and `from xyzservices import TileProvider` can work. See geopandas/contextily#183
So that `pyarrow.array(..., dtype="string_view")` will work. See apache/arrow#39652 when this was added.
Copy link
Member Author

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated a few more minimum pins on optional dependencies contextily>1.2 and pyarrow>16 based on the failing tests.

"IPython", # 'ipython' is not the correct module name.
"pyarrow",
"rioxarray",
"contextily>=1.2",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to pin to contextily>=1.2, otherwise this code block won't work as expected (there is a chance that an ImportError will be raised if contextily=1.1 is installed, but xyzservices is not installed, breaking the _HAS_CONTEXTILY logic):

try:
import contextily
from xyzservices import TileProvider
_HAS_CONTEXTILY = True
except ImportError:
TileProvider = None
_HAS_CONTEXTILY = False

Xref geopandas/contextily#183, where xyzservices was included as a requirement of contextily (in v1.2.0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know that we require contexily>=1.2

pyproject.toml Outdated
"contextily>=1.2",
"geopandas>=0.14",
"IPython>=8", # 'ipython' is not the correct module name.
"pyarrow>=16",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinning to pyarrow>16 here to fix this error:

________ test_to_numpy_pyarrow_array_pyarrow_dtypes_string[string_view] ________
>   ???
E   KeyError: 'string_view'
pyarrow/types.pxi:5025: KeyError
During handling of the above exception, another exception occurred:
dtype = 'string_view'
    @pytest.mark.skipif(not _HAS_PYARROW, reason="pyarrow is not installed")
    @pytest.mark.parametrize(
        "dtype",
        [
            None,
            "string",
            "utf8",  # alias for string
            "large_string",
            "large_utf8",  # alias for large_string
            "string_view",
        ],
    )
    def test_to_numpy_pyarrow_array_pyarrow_dtypes_string(dtype):
        """
        Test the _to_numpy function with PyArrow arrays of PyArrow string types.
        """
>       array = pa.array(["abc", "defg", "12345"], type=dtype)
../pygmt/tests/test_clib_to_numpy.py:333: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pyarrow/array.pxi:230: in pyarrow.lib.array
    ???
pyarrow/types.pxi:5040: in pyarrow.lib.ensure_type
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
>   ???
E   ValueError: No type alias for string_view
pyarrow/types.pxi:5027: ValueError

Code was added in #3608:

"string_view",
],
)
def test_to_numpy_pyarrow_array_pyarrow_dtypes_string(dtype):
"""
Test the _to_numpy function with PyArrow arrays of PyArrow string types.
"""
array = pa.array(["abc", "defg", "12345"], type=dtype)
result = _to_numpy(array)
_check_result(result, np.str_)
npt.assert_array_equal(result, array)

The pyarrow.StringViewArray class was added in apache/arrow#39652 that was released in pyarrow v16.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyArrow v16.0 was released in April, 2024. Instead of pinning pyarrow>=16.0, we can just skip string_view if pyarrow<v16.0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just tested older versions from pyarrow v12 to v15, and think we can pin to pyarrow>=13 (v13 was released on 24 Aug 2023) if ignoring string_view. With v12, there is another error on datetime.

_________________________________________________________________ test_to_numpy_pyarrow_array_pyarrow_dtypes_date[date64[ms]] __________________________________________________________________

dtype = 'date64[ms]', expected_dtype = 'datetime64[ms]'

    @pytest.mark.skipif(not _HAS_PYARROW, reason="pyarrow is not installed")
    @pytest.mark.parametrize(
        ("dtype", "expected_dtype"),
        [
            pytest.param("date32[day]", "datetime64[D]", id="date32[day]"),
            pytest.param("date64[ms]", "datetime64[ms]", id="date64[ms]"),
        ],
    )
    def test_to_numpy_pyarrow_array_pyarrow_dtypes_date(dtype, expected_dtype):
        """
        Test the _to_numpy function with PyArrow arrays of PyArrow date types.

        date32[day] and date64[ms] are stored as 32-bit and 64-bit integers, respectively,
        representing the number of days and milliseconds since the UNIX epoch (1970-01-01).

        Here we explicitly check the dtype and date unit of the result.
        """
        data = [
            date(2024, 1, 1),
            datetime(2024, 1, 2),
            datetime(2024, 1, 3),
        ]
        array = pa.array(data, type=dtype)
        result = _to_numpy(array)
        _check_result(result, np.datetime64)
>       assert result.dtype == expected_dtype  # Explicitly check the date unit.
E       AssertionError: assert dtype('<M8[D]') == 'datetime64[ms]'
E        +  where dtype('<M8[D]') = array(['2024-01-01', '2024-01-02', '2024-01-03'], dtype='datetime64[D]').dtype

../pygmt/tests/test_clib_to_numpy.py:364: AssertionError

The main change appears to be in apache/arrow#33321, when pyarrow supported preserving the datetime64 temporal resolution.

Maybe we can keep the string_view test, but add a skip_if(pyarrow.__version__ < 16) marker or something like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted pin to pyarrow>=13 in commit 4733bda

@weiji14 weiji14 mentioned this pull request Nov 22, 2024
7 tasks
@seisman seisman mentioned this pull request Nov 22, 2024
7 tasks
Lower pyarrow pin from 16 to 13, and added a skipif pytest marker to the `test_to_numpy_pyarrow_array_pyarrow_dtypes_string[string_view]` unit test to not run when pyarrow<16 is installed. Setting a pin on pyarrow>=13 so that datetime64 unit preservation is handled, xref apache/arrow#33321.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants