
Cannot pickle '_thread.lock' object exception after DataArray transpose and copy operations from netCDF file. #8442

Closed
sharkinsspatial opened this issue Nov 11, 2023 · 22 comments · Fixed by #8571

Comments

@sharkinsspatial

What is your issue?

I hit this issue while using rioxarray with a series of operations similar to those noted in this issue corteva/rioxarray#614. After looking through the rioxarray codebase a bit I was able to reproduce the issue with pure xarray operations.

If the Dataset is opened with the default lock=True settings, transposing a DataArray's coordinates and then copying the DataArray results in a cannot pickle '_thread.lock' object exception.

If the Dataset is opened with lock=False, no error is thrown.

This sample notebook reproduces the error.

This might be user error on my part, but it would be great to have some clarification on why lock=False is necessary here, as my understanding was that this should only be necessary when using parallel write operations.

@sharkinsspatial added the "needs triage" label on Nov 11, 2023

welcome bot commented Nov 11, 2023

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@max-sixty added the "needs mcve" label (https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) and removed the "needs triage" label on Dec 5, 2023
@sharkinsspatial
Author

sharkinsspatial commented Dec 6, 2023

Here is a locally reproducible MCVE.

import xarray as xr
import numpy as np

file_path = "test.nc"

ds = xr.Dataset(
    {
        'latitude': np.arange(10),
        'longitude': np.arange(10),
        'precip': (['latitude', 'longitude'], np.arange(100).reshape(10, 10))
    }
)

ds.to_netcdf(file_path, engine="h5netcdf")

ds = xr.open_dataset(file_path, engine="h5netcdf", decode_coords=True, decode_times=True)
da = ds["precip"]
da = da.transpose("longitude", "latitude", missing_dims="ignore")
da = da.copy()

Note that if xr.open_dataset is called with lock=False, the _io.BufferedReader error is not thrown. 👍
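For reference, a minimal sketch of the lock=False variant of the snippet above (lock is forwarded to the backend):

ds = xr.open_dataset(file_path, engine="h5netcdf", lock=False)
da = ds["precip"].transpose("longitude", "latitude", missing_dims="ignore")
da = da.copy()  # no exception with lock=False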

@max-sixty
Collaborator

max-sixty commented Dec 6, 2023

Hmm, I don't get an error there. Can you post your dependencies? (Instructions in the bug report template)

Edit: though it seems to rely on the file being there from #8443...

@sharkinsspatial
Author

sharkinsspatial commented Dec 6, 2023

Apologies, I had not written the netCDF file out in the MCVE 🤦‍♂️; the example is updated now. I was able to reproduce the error in the environment below.

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.0 (default, Apr 14 2021, 14:07:04)
[Clang 12.0.0 (clang-1200.0.32.29)]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2023.11.0
pandas: 2.1.3
numpy: 1.26.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.12.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

@dcherian
Contributor

dcherian commented Dec 6, 2023

Mine succeeds too, with libhdf5: 1.14.2; otherwise my versions of xarray, h5netcdf, and h5py match yours.

PS: the code I ran includes the to_netcdf call.

@sharkinsspatial
Author

🤔 I upgraded to libhdf5: 1.14.3 and was still able to reproduce. To isolate any potential h5netcdf problems, I also tried the following with the default netcdf4 engine and hit the same exception.

import xarray as xr
import numpy as np

file_path = "test.nc"

ds = xr.Dataset(
    {
        'latitude': np.arange(10),
        'longitude': np.arange(10),
        'precip': (['latitude', 'longitude'], np.arange(100).reshape(10, 10))
    }
)

ds.to_netcdf(file_path)

ds = xr.open_dataset(file_path, decode_coords=True, decode_times=True)
da = ds["precip"]
da = da.transpose("longitude", "latitude", missing_dims="ignore")
da = da.copy()

I'm going to ask a few colleagues to try to replicate this to see if it's something peculiar to my environment.

@sharkinsspatial
Author

sharkinsspatial commented Dec 7, 2023

A colleague was also able to reproduce the exception with the ☝️ netcdf4 engine code in the following environment.

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.18 (main, Nov 2 2023, 16:51:22)
[Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 23.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0

xarray: 2023.10.1
pandas: 2.1.3
numpy: 1.26.2
scipy: 1.9.1
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: 3.8.0
Nio: None
zarr: 2.12.0
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.10.0
cupy: None
pint: None
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 69.0.2
pip: 23.3.1
conda: None
pytest: 6.2.5
mypy: 0.910
IPython: 7.34.0
sphinx: 6.2.1

I'm unsure what the differences in environments could be 🤔 .

@max-sixty
Collaborator

That's a puzzle... Can we reproduce it in a binder / in a test?

@zmoon
Contributor

zmoon commented Dec 13, 2023

No idea if it has the same underlying cause (I'm not transposing, but I am copying), but I do have a situation that used to work but now [1] gives this same cannot pickle '_thread.lock' object error [2]. I'll have to see if I can make it into a minimal example. I tried downgrading some things in my environment, to no avail.

Edit: here's a little example [3] experimenting with joblib.dump to see when the error is raised.

import xarray as xr
from joblib import dump

ds = xr.tutorial.load_dataset("air_temperature").isel(time=slice(4))
ds.to_netcdf("ds.nc", engine="netcdf4")
dump(ds, "ds.joblib")  # 0. Succeeds
ds.close()

# 1. Try to pickle the whole Dataset
ds = xr.open_dataset("ds.nc")
dump(ds, "ds.joblib")  # TypeError: cannot pickle '_thread.lock' object

# 2. Try to pickle a DataArray
ds = xr.open_dataset("ds.nc")
dump(ds.air, "ds.air.joblib")  # TypeError: cannot pickle '_thread.lock' object

# 3. Somehow adding a new variable makes it okay to pickle `ds.air` (and `ds` if `.copy()` applied)
ds = xr.open_dataset("ds.nc")
ds["b"] = xr.zeros_like(ds.air)
dump(ds.air, "ds.air.joblib")  # Succeeds
dump(ds, "ds.joblib")  # But this still fails
dump(ds.copy(), "ds.joblib")  # Succeeds
Versions
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.133.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 1.5.3
numpy: 1.26.2
scipy: 1.11.4
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.3
cartopy: 0.22.0
seaborn: 0.11.0
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.3
mypy: 1.7.1
IPython: 8.18.1
sphinx: 5.3.0

Also tried in an env with HDF5 1.14.3, it didn't help.

Footnotes

  [1] First noticed a month or two ago, I think.

  [2] Based on what happened later in this thread, maybe my old env where it was working had Dask available, for its SerializableLock, unlike this new env where I was getting the error.

  [3] Not super related to my real case, except that my case involves joblib.

@zmoon
Contributor

zmoon commented Dec 13, 2023

I was able to reproduce the error from the OP's example above in a fresh env. Similar to one of my experiments, the error is averted for me if you add a new variable to the Dataset (e.g. ds["asdf"] = xr.zeros_like(ds.precip)) before the transpose line (sketched below).
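Sketched against the OP's MCVE (the variable name asdf is just a placeholder, as noted above):

ds = xr.open_dataset(file_path, decode_coords=True, decode_times=True)
ds["asdf"] = xr.zeros_like(ds.precip)  # adding a variable here was observed to avert the error
da = ds["precip"].transpose("longitude", "latitude", missing_dims="ignore")
da = da.copy()  # no longer raises in this case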

@kmuehlbauer
Contributor

kmuehlbauer commented Dec 13, 2023

@zmoon Thanks for this MCVE! I can't reproduce the error, though. Also the MCVE in #8442 (comment) works nicely (details below).

Does it still fail if environments are created from scratch or on other systems? It looks like Linux itself is not affected, only macOS and WSL?

Versions
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-22-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 2.1.3
numpy: 1.26.2
scipy: 1.11.4
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2023.11.0
distributed: 2023.11.0
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.18.1
sphinx: None



@kmuehlbauer
Contributor

OK, here we go: I've taken dask out of the loop in a fresh env and can now reproduce both MCVEs.

Versions
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.0 | packaged by conda-forge | (main, Oct  3 2023, 08:43:22) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150500.55.19-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.2
scipy: None
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.18.1
sphinx: None

@zmoon
Contributor

zmoon commented Dec 13, 2023

@kmuehlbauer I experienced the error on Windows as well as WSL.

I tried a fresh env on Linux and still got the error 🤷

Versions
mamba create -n test-lock python=3.11 xarray pooch netcdf4 h5netcdf joblib
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.27.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.2
scipy: None
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

Edit: From the above, the OP also didn't have Dask. After adding dask-core to my env, no more error.

@kmuehlbauer
Contributor

There has been some refactoring lately involving dask and other ChunkManagers. Not sure if this has anything to do with it, but maybe @TomNicholas has more insight here.

@TomNicholas
Member

TomNicholas commented Dec 13, 2023

I don't really see why this should have anything to do with it... I guess it's not impossible that somehow some dask lock argument is now getting lost, but I suggest that if we can now reproduce the error someone should do a git-bisect to find out which commit caused the regression.

EDIT: But you're saying you can reproduce this without dask anyway, @kmuehlbauer?

@kmuehlbauer
Contributor

Yes, thanks @TomNicholas for looking into this. Will try to bisect this.

@kmuehlbauer
Contributor

@zmoon @sharkinsspatial Did this ever work for you? I'm having a hard time finding a working commit; I've checked several versions back to 0.17.0 without success. It would also be good to know the other involved dependencies (hdf5, netcdf-c, netcdf4-python, h5py, pandas) to recreate a working environment.

@zmoon
Contributor

zmoon commented Dec 13, 2023

@kmuehlbauer I don't have that environment anymore, but I suspect I had dask installed in it and that's why it was working.

@kmuehlbauer
Contributor

TL;DR:

The current default of xr.open_dataset (netcdf4/h5netcdf) uses lazy loading, which falls back to threading.Lock as the default locking mechanism if dask is not available. That object cannot be pickled, and after some computations (here .transpose) it cannot be (deep-)copied either. The only ways around this are to explicitly pass lock=False when opening files, or to call .load() or .compute() before pickling or copying.
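A minimal sketch of those workarounds, reusing the test.nc file from the MCVE above:

import pickle
import xarray as xr

# Workaround 1: disable the per-file lock entirely
ds = xr.open_dataset("test.nc", lock=False)
da = ds["precip"].transpose("longitude", "latitude").copy()  # ok

# Workaround 2: load the data into memory before copying/pickling
ds = xr.open_dataset("test.nc")
da = ds["precip"].transpose("longitude", "latitude").load()
da2 = da.copy()    # ok once loaded
pickle.dumps(da)   # ok once loaded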

Inspection:

Using the MCVE given here #8442 (comment) I checked the types of the underlying array and how this behaves with and without transposing (a rough snippet for this check follows the list):

  • cache=True in open_dataset (default)

    • no transpose
      • before copy: <class 'xarray.core.indexing.MemoryCachedArray'>
      • after copy: <class 'xarray.core.indexing.MemoryCachedArray'>
      • trying to pickle raises TypeError: cannot pickle '_thread.lock' object in pickle
    • with transpose
      • before transpose: <class 'xarray.core.indexing.MemoryCachedArray'>
      • after transpose: <class 'xarray.core.indexing.LazilyVectorizedIndexedArray'>
      • trying to copy raises: TypeError: cannot pickle '_thread.lock' object in deepcopy
  • cache=False in open_dataset

    • no transpose
      • before copy: <class 'xarray.core.indexing.CopyOnWriteArray'>
      • after copy: <class 'xarray.core.indexing.CopyOnWriteArray'>
      • trying to pickle raises TypeError: cannot pickle '_thread.lock' object in pickle
    • with transpose
      • before transpose: <class 'xarray.core.indexing.CopyOnWriteArray'>
      • after transpose: <class 'xarray.core.indexing.LazilyVectorizedIndexedArray'>
      • trying to copy raises: TypeError: cannot pickle '_thread.lock' object in deepcopy
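
The type checks above can be reproduced with something along these lines (rough sketch; Variable._data is xarray-internal, not public API):

ds = xr.open_dataset("test.nc", engine="h5netcdf", cache=True)
da = ds["precip"]
print(type(da.variable._data))    # MemoryCachedArray with cache=True
da_t = da.transpose("longitude", "latitude")
print(type(da_t.variable._data))  # LazilyVectorizedIndexedArray
da_t.copy(deep=True)              # raises TypeError: cannot pickle '_thread.lock' object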

When reading with the netcdf4 and h5netcdf backends, the data is wrapped in xarray's lazy classes; see https://docs.xarray.dev/en/stable/user-guide/io.html#netcdf:

Data is always loaded lazily from netCDF files. You can manipulate, slice and subset Dataset and DataArray objects, and no array values are loaded into memory until you try to perform some sort of actual computation.

and further:

Xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, it is often a good idea to load a Dataset (or DataArray) entirely into memory by invoking the Dataset.load() method.

There is also a mention for Pickle:

https://docs.xarray.dev/en/stable/user-guide/io.html#pickle

When pickling an object opened from a NetCDF file, the pickle file will contain a reference to the file on disk. If you want to store the actual array values, load it into memory first with Dataset.load() or Dataset.compute().

What to do?

The pickle issue might not be the big problem, as the user is advised to load/compute beforehand. But the copy issue should be resolved somehow. Unfortunately I do not have an immediate solution. @pydata/xarray, any ideas?
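For illustration, the core limitation is plain Python and independent of xarray: deepcopy falls back to pickling for objects it does not know how to copy, and a threading.Lock cannot be pickled.

import copy
import threading

lock = threading.Lock()
copy.deepcopy(lock)  # TypeError: cannot pickle '_thread.lock' object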

@max-sixty
Collaborator

(brief message to say thanks a lot @kmuehlbauer for the excellent summary)

@max-sixty removed the "needs mcve" label (https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) on Dec 15, 2023
@shoyer
Member

shoyer commented Dec 16, 2023

I believe the issue is these two default locks for HDF5 and NetCDF-C:

HDF5_LOCK = SerializableLock()

Probably the easiest way to handle this is to fork the code for SerializableLock from dask. It isn't very complicated:
https://github.com/dask/dask/blob/6f2100847e2042d459534294531e8884bef13a99/dask/utils.py#L1160
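
For context, the idea of SerializableLock is to keep the unpicklable threading.Lock in a class-level registry keyed by a token and to pickle only the token; a simplified sketch, not the actual dask code (which keeps the registry in a WeakValueDictionary):

import uuid
from threading import Lock

class SerializableLock:
    # Simplified sketch of dask.utils.SerializableLock.
    _locks = {}  # dask uses a WeakValueDictionary here

    def __init__(self, token=None):
        self.token = token or str(uuid.uuid4())
        # Re-use the same underlying lock for a given token within this process.
        self.lock = SerializableLock._locks.setdefault(self.token, Lock())

    def acquire(self, *args, **kwargs):
        return self.lock.acquire(*args, **kwargs)

    def release(self):
        self.lock.release()

    def __enter__(self):
        self.lock.acquire()

    def __exit__(self, *args):
        self.lock.release()

    # Only the token is pickled; the lock itself is recreated on unpickle.
    def __getstate__(self):
        return self.token

    def __setstate__(self, token):
        self.__init__(token)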

@kmuehlbauer
Contributor

Thanks @shoyer!
