Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to load Blue Marble dataset #2235

Merged
merged 36 commits into from
Sep 30, 2024
Merged

Conversation

willschlitzer
Copy link
Contributor

@willschlitzer willschlitzer commented Dec 7, 2022

This PR (hopefully) adds a function to load the Blue Marble image dataset.

Addresses #1442

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

  • /format: automatically format and lint the code
  • /test-gmt-dev: run full tests on the latest GMT development version

@willschlitzer willschlitzer added the feature Brand new feature label Dec 7, 2022
@willschlitzer willschlitzer added this to the 0.8.0 milestone Dec 7, 2022
@willschlitzer willschlitzer self-assigned this Dec 7, 2022
@seisman seisman modified the milestones: 0.8.0, 0.9.0 Dec 11, 2022
@seisman seisman modified the milestones: 0.9.0, 0.10.0 Mar 16, 2023
@seisman seisman modified the milestones: 0.10.0, 0.11.0 Jun 28, 2023
@seisman seisman modified the milestones: 0.11.0, 0.12.0 Dec 11, 2023
@weiji14
Copy link
Member

weiji14 commented Feb 17, 2024

Gonna push some commits here to get the load_blue_marble function in for PyGMT v0.12.0.

Need modify arguments into the Resolution class.
A short doctest to test loading the Blue Marble dataset, and rename grid to image since a 3-band RGB image is returned.
pygmt/io.py Outdated
Comment on lines 42 to 52
if kwargs.get("engine") == "rasterio":
import rioxarray

open_dataarray_func = rioxarray.open_rasterio
kwargs.pop("engine")
else:
open_dataarray_func = xr.open_dataarray

with open_dataarray_func(filename_or_obj, **kwargs) as dataarray:
result = dataarray.load()
_ = result.gmt # load GMTDataArray accessor information
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a segfault when loading the RGB image, see https://github.com/GenericMappingTools/pygmt/actions/runs/7939085789/job/21678737828?pr=2235#step:8:415:

Fatal Python error: Segmentation fault

Current thread 0x00007f0a18438740 (most recent call first):
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/xarray/backends/file_manager.py", line 234 in close
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/xarray/core/common.py", line 1205 in close
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/xarray/core/common.py", line 1422 in __exit__
  File "/home/runner/work/pygmt/pygmt/pygmt/io.py", line 50 in load_dataarray
  File "/home/runner/work/pygmt/pygmt/pygmt/datasets/load_remote_dataset.py", line 439 in _load_remote_dataset
  File "/home/runner/work/pygmt/pygmt/pygmt/helpers/decorators.py", line 776 in new_module
  File "/home/runner/work/pygmt/pygmt/pygmt/datasets/earth_daynight.py", line 57 in load_blue_marble
  File "/home/runner/work/pygmt/pygmt/pygmt/tests/test_grdimage_image.py", line 17 in fixture_xr_image
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/fixtures.py", line 902 in call_fixture_func
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/fixtures.py", line 1123 in pytest_fixture_setup
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/fixtures.py", line 1069 in execute
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/fixtures.py", line 693 in _compute_fixture_value
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/fixtures.py", line 607 in _get_active_fixturedef
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/fixtures.py", line 585 in getfixturevalue
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/fixtures.py", line 566 in _fillfixtures
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/python.py", line 1795 in setup
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/runner.py", line 494 in setup
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/runner.py", line 157 in pytest_runtest_setup
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/runner.py", line 127 in runtestprotocol
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/main.py", line 350 in pytest_runtestloop
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/main.py", line 325 in _main
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/main.py", line 271 in wrap_session
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/main.py", line 318 in pytest_cmdline_main
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/config/__init__.py", line 169 in main
  File "/home/runner/micromamba/envs/pygmt/lib/python3.12/site-packages/_pytest/config/__init__.py", line 192 in console_main
  File "/home/runner/micromamba/envs/pygmt/bin/pytest", line 10 in <module>

Extension modules: markupsafe._speedups, yaml._yaml, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, cftime._cftime, _brotli, multidict._multidict, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, matplotlib._image, rasterio._version, rasterio._err, rasterio._filepath, rasterio._env, rasterio._transform, rasterio._base, rasterio.crs, rasterio._features, rasterio._warp, rasterio._io, psutil._psutil_linux, psutil._psutil_posix, pyproj._compat, pyproj._datadir, pyproj._network, pyproj._geod, pyproj.list, pyproj._crs, pyproj.database, pyproj._transformer, pyproj._sync, shapely.lib, shapely._geos, shapely._geometry_helpers, netCDF4._netCDF4, PIL._imagingmath, fiona._err, fiona._env, fiona.crs, fiona._geometry, fiona.schema, fiona.ogrext (total: 103)
Segmentation fault (core dumped)
make: *** [Makefile:38: _runtest] Error 139
../pygmt/tests/test_grdimage_image.py::test_grdimage_image_dataarray 

If I comment out the _ = result.gmt line locally, the test_grdimage_image_dataarray check will pass, but there will be a segfault later at test_grdimage_image_dataarray_unsupported_dtype. Not sure why.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I comment out the _ = result.gmt line locally, the test_grdimage_image_dataarray check will pass, but there will be a segfault later at test_grdimage_image_dataarray_unsupported_dtype. Not sure why.

Hmm, so I removed the GMTAccessor loading in fc060ea when the Blue Marble GeoTIFF is loaded using rioxarray.open_rasterio, and it seems like the segfault disappears on CI, though I still get a segfault if I run just test_grdimage_image_dataarray locally...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: using GMT_OUT|GMT_IS_REFERENCE for images seems to help to avoid the segfault. See commit 6eabbf4, c01d6ab (#3338) and 003383d (#3115) where this has been applied.

Prevent segfault when loading accessor info from an xarray.Datarray loaded using rioxarray.open_rasterio.
@weiji14 weiji14 self-assigned this Feb 18, 2024
Also set units=None, and renamed 'grid' to 'image' in the docstrings.
Also added some failing unit tests, since grdcut doesn't allow cropping GeoTIFFs yet.
@seisman seisman removed this from the 0.12.0 milestone Feb 26, 2024
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready for review! I can approve this since I didn't open the PR, but would appreciate someone else to take a look too.

@@ -39,7 +37,7 @@ def test_grdimage_image_dataarray(xr_image):
Plot a 3-band RGB image using xarray.DataArray input.
"""
fig = Figure()
fig.grdimage(grid=xr_image)
fig.grdimage(grid=xr_image, region="d")
Copy link
Member

@weiji14 weiji14 Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to set region="d" here, because fig.grdimage(grid="@earth_day_01d") plots using projection="Q15c" by default, whereas fig.grdimage(grid=xr_image) uses projection="X15c/15c".

Baseline (@earth_day_01d) Result (xr_image without region="d")
baseline result

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we write the 3-band xarray.DataArray into a temporary GeoTIFF file. Is it because we don't set the CRS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, good point! I've set the projection in commit 471c6e8, and grdimage now plots both of them the same way.

@weiji14 weiji14 added this to the 0.14.0 milestone Sep 29, 2024
@weiji14 weiji14 marked this pull request as ready for review September 29, 2024 21:18
@weiji14 weiji14 added the needs review This PR has higher priority and needs review. label Sep 29, 2024
>>> # load the default image (pixel-registered 1 arc-degree image)
>>> image = load_blue_marble()
"""
image = _load_remote_dataset(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to support earth_night in this function?

Copy link
Member

@weiji14 weiji14 Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll open a separate PR for load_black_marble after this one is done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. I meant whether you plan to add the load_black_marble function in this file pygmt/datasets/earth_daynight.py or have load_blue_marble/load_black_marble in two separate files.

Copy link
Member

@weiji14 weiji14 Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put them in the same file, and in pygmt/datasets/__init__.py, have a line like:

from pygmt.datasets.earth_daynight import load_blue_marble, load_black_marble

Does that sound ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to having them in separate files. The two files are almost identical, except the differences in the dataset name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, renamed in 5b6bb42.

pygmt/datasets/load_remote_dataset.py Outdated Show resolved Hide resolved
@@ -39,7 +37,7 @@ def test_grdimage_image_dataarray(xr_image):
Plot a 3-band RGB image using xarray.DataArray input.
"""
fig = Figure()
fig.grdimage(grid=xr_image)
fig.grdimage(grid=xr_image, region="d")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we write the 3-band xarray.DataArray into a temporary GeoTIFF file. Is it because we don't set the CRS?

@@ -3,7 +3,8 @@
"""

import pytest
from pygmt import Figure, which
from pygmt import Figure
from pygmt.datasets import load_blue_marble

rioxarray = pytest.importorskip("rioxarray")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this line be removed?

Suggested change
rioxarray = pytest.importorskip("rioxarray")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, because we are still using tempfile_from_image to convert xarray.DataArray to a GeoTIFF before passing into grdimage. But see #3468.

weiji14 and others added 2 commits September 30, 2024 12:54
Set the projection system so that the xarray.DataArray image is plotted as a Geographic image rather than a Cartesian image by GMT.
Test that the grid returned by default for the 1 arc-minute resolution has
a "pixel" registration.
"""
data = load_blue_marble(resolution="01m", region=[-10, -9, 3, 5])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to cache this earth_day_01m tile in pygmt/helpers/caching.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at my ~/.gmt/server/earth/earth_day/earth_day_01m_p.tif file and it is 179.6MB 😅 Are they not meant to be tiled?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. They're not tiled due to technical difficulties.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, earth_day files are not tiled at https://oceania.generic-mapping-tools.org/server/earth/earth_day/. And actually, there are only pixel-registered grids so can probably just remove this test.

The earth_day images are always pixel-registered anyway.
The earth_day images are not tiled, but come as a single GeoTIFF file for all resolutions.
Comment on lines +94 to +96
# If rioxarray is installed, set the coordinate reference system
if hasattr(image, "rio"):
image = image.rio.write_crs(input_crs="OGC:CRS84")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In GMT_IMAGE.to_dataarray(), perhaps we should parse the header->ProjRefPROJ4 and set the correct CRS to the 3-band xarray.DataArray. If done, then we probably don't need to set the CRS here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking of parsing the projection information from the header when you mentioned this ProjRefPROJ4 field at #3128 (comment). But ideally we'll need to handle PROJ4/WKT/EPSG:

# Referencing system string in PROJ.4 format
("ProjRefPROJ4", ctp.c_char_p),
# Referencing system string in WKT format
("ProjRefWKT", ctp.c_char_p),
# Referencing system EPSG code
("ProjRefEPSG", ctp.c_int),

Something to consider for a separate PR, because we might need to use pyproj for this.

pygmt/datasets/load_remote_dataset.py Outdated Show resolved Hide resolved
pygmt/tests/test_datasets_earth_day.py Outdated Show resolved Hide resolved
Copy link
Member

@seisman seisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me except one minor comment.

pygmt/datasets/load_remote_dataset.py Outdated Show resolved Hide resolved
@weiji14 weiji14 added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Sep 30, 2024
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving for fun ✔️. Thanks @willschlitzer for starting this almost two years ago (hope you're doing well), and @seisman for reviewing! Let's get this in for PyGMT v0.14.0!

@weiji14 weiji14 merged commit 66f258a into main Sep 30, 2024
21 of 23 checks passed
@weiji14 weiji14 deleted the load-remote-dataset/blue-marble branch September 30, 2024 02:25
@weiji14 weiji14 removed the final review call This PR requires final review and approval from a second reviewer label Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Brand new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants