Custom pickling in autode breaks loading of old npz files in mlptrain #372

danielhollas · 2025-02-04T21:08:27Z

Hi 👋

over at mlptrain we're trying to update our autode version. Things were going smoothly until we noticed that since autode v1.3.4 we could no longer load certain .npz files, which are used in mlptrain to store the training data. Specifically, we're getting this confusing stacktrace:

>>> data = mlt.ConfigurationSet()
>>> data.load('mg_aqua_mace_al.npz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/veronikajuraskova/SOFTWARE/mlp-train/mlptrain/configurations/configuration_set.py", line 493, in load
    self._load_npz(filename)
  File "/Users/veronikajuraskova/SOFTWARE/mlp-train/mlptrain/configurations/configuration_set.py", line 685, in _load_npz
    if data['F_true'].ndim > 0:
  File "/Users/veronikajuraskova/miniconda3/envs/mlptrain-gap/lib/python3.9/site-packages/numpy/lib/npyio.py", line 256, in __getitem__
    return format.read_array(bytes,
  File "/Users/veronikajuraskova/miniconda3/envs/mlptrain-gap/lib/python3.9/site-packages/numpy/lib/format.py", line 800, in read_array
    array = pickle.load(fp, **pickle_kwargs)
  File "/Users/veronikajuraskova/miniconda3/envs/mlptrain-gap/lib/python3.9/site-packages/autode/values.py", line 667, in __setstate__
    self.__dict__.update(state[-1])
TypeError: cannot convert dictionary update sequence element #0 to a sequence

This is only happening for some older npz files. After some tedious debugging with @juraskov, we've identified the following conditions for the problematic npz files:

only ConfigurationSet classes that contained Configurations with different number of atoms (e.g. differently solvated clusters)
The npz files had to be produced with older mlptrain version, specifically before Load xyz energies, forces, box mlp-train#117, which added dtype.object to some ndarrays.
numpy version <1.24

The __setstate__ function which raises the exception extends the pickling protocol for autode.values.ValueArray (which subclasses np.ndarray) and has been introduced as part of #215 to solve issue described in #221 (see also standalone PR #222).

autodE/autode/values.py

Line 666 in 6a6eebe

def __setstate__(self, state, *args, **kwargs):

It seems like the following patch solves the loading issue, insofar as access to the underlying ndarrays is concerned (the extra attributes are lost, but the underlying data are more important to recover).

diff --git a/autode/values.py b/autode/values.py
index 81d2cde..e05a1a9 100644
--- a/autode/values.py
+++ b/autode/values.py
@@ -664,8 +664,11 @@ class ValueArray(ABC, np.ndarray):
         )
 
     def __setstate__(self, state, *args, **kwargs):
-        self.__dict__.update(state[-1])
-        super().__setstate__(state[:-1], *args, **kwargs)
+        try:
+            self.__dict__.update(state[-1])
+            super().__setstate__(state[:-1], *args, **kwargs)
+        except TypeError:
+            super().__setstate__(state, *args, **kwargs)
 
     def to(self, units) -> Any:
         """

It's not a pretty solution, but feels reasonably safe, since we would be basically falling back to the original __setstate__ implementation. Happy to hear better ideas! Also happy to submit a PR.
Unfortunately, on the mlptrain side the issue cannot be ignored since it affects a fair amount of existing data.

The text was updated successfully, but these errors were encountered:

shoubhikraj · 2025-02-05T11:25:01Z

I think this would work. Also wondering if it might be possible to check if the last item of state is a dict and whether that would be more safer than a try...except.

Also, does losing the extra attributes cause any issues?

danielhollas · 2025-02-05T15:09:25Z

Also wondering if it might be possible to check if the last item of state is a dict and whether that would be more safer than a try...except.

I think try...except is safer/easiest since the last item doesn't need to be a dict, I think any Mapping type would work? I also like the explicitness of the try..except, since this situation should be exceptional. But I guess both would work.

Also, does losing the extra attributes cause any issues?

yeah, trying to access the extra attributes (even just calling vars on the object) might trigger an exception. So it's not ideal, but the important part is that one can access the underlying ndarray data.

I think it would definitely be worth emitting some kind of warning for this case. Not sure what's the best way to do that in autode.

danielhollas changed the title ~~Custom pickling in autode breaks loading of npz files in mlptrain~~ Custom pickling in autode breaks loading of old npz files in mlptrain Feb 4, 2025

danielhollas mentioned this issue Feb 7, 2025

Support old mlptrain npz files #373

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom pickling in autode breaks loading of old npz files in mlptrain #372

Custom pickling in autode breaks loading of old npz files in mlptrain #372

danielhollas commented Feb 4, 2025

shoubhikraj commented Feb 5, 2025 •

edited

Loading

danielhollas commented Feb 5, 2025

Custom pickling in autode breaks loading of old npz files in mlptrain #372

Custom pickling in autode breaks loading of old npz files in mlptrain #372

Comments

danielhollas commented Feb 4, 2025

shoubhikraj commented Feb 5, 2025 • edited Loading

danielhollas commented Feb 5, 2025

shoubhikraj commented Feb 5, 2025 •

edited

Loading