Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom pickling in autode breaks loading of old npz files in mlptrain #372

Open
danielhollas opened this issue Feb 4, 2025 · 2 comments

Comments

@danielhollas
Copy link
Contributor

Hi 👋

over at mlptrain we're trying to update our autode version. Things were going smoothly until we noticed that since autode v1.3.4 we could no longer load certain .npz files, which are used in mlptrain to store the training data. Specifically, we're getting this confusing stacktrace:

>>> data = mlt.ConfigurationSet()
>>> data.load('mg_aqua_mace_al.npz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/veronikajuraskova/SOFTWARE/mlp-train/mlptrain/configurations/configuration_set.py", line 493, in load
    self._load_npz(filename)
  File "/Users/veronikajuraskova/SOFTWARE/mlp-train/mlptrain/configurations/configuration_set.py", line 685, in _load_npz
    if data['F_true'].ndim > 0:
  File "/Users/veronikajuraskova/miniconda3/envs/mlptrain-gap/lib/python3.9/site-packages/numpy/lib/npyio.py", line 256, in __getitem__
    return format.read_array(bytes,
  File "/Users/veronikajuraskova/miniconda3/envs/mlptrain-gap/lib/python3.9/site-packages/numpy/lib/format.py", line 800, in read_array
    array = pickle.load(fp, **pickle_kwargs)
  File "/Users/veronikajuraskova/miniconda3/envs/mlptrain-gap/lib/python3.9/site-packages/autode/values.py", line 667, in __setstate__
    self.__dict__.update(state[-1])
TypeError: cannot convert dictionary update sequence element #0 to a sequence

This is only happening for some older npz files. After some tedious debugging with @juraskov, we've identified the following conditions for the problematic npz files:

  • only ConfigurationSet classes that contained Configurations with different number of atoms (e.g. differently solvated clusters)
  • The npz files had to be produced with older mlptrain version, specifically before Load xyz energies, forces, box mlp-train#117, which added dtype.object to some ndarrays.
  • numpy version <1.24

The __setstate__ function which raises the exception extends the pickling protocol for autode.values.ValueArray (which subclasses np.ndarray) and has been introduced as part of #215 to solve issue described in #221 (see also standalone PR #222).

def __setstate__(self, state, *args, **kwargs):

It seems like the following patch solves the loading issue, insofar as access to the underlying ndarrays is concerned (the extra attributes are lost, but the underlying data are more important to recover).

diff --git a/autode/values.py b/autode/values.py
index 81d2cde..e05a1a9 100644
--- a/autode/values.py
+++ b/autode/values.py
@@ -664,8 +664,11 @@ class ValueArray(ABC, np.ndarray):
         )
 
     def __setstate__(self, state, *args, **kwargs):
-        self.__dict__.update(state[-1])
-        super().__setstate__(state[:-1], *args, **kwargs)
+        try:
+            self.__dict__.update(state[-1])
+            super().__setstate__(state[:-1], *args, **kwargs)
+        except TypeError:
+            super().__setstate__(state, *args, **kwargs)
 
     def to(self, units) -> Any:
         """

It's not a pretty solution, but feels reasonably safe, since we would be basically falling back to the original __setstate__ implementation. Happy to hear better ideas! Also happy to submit a PR.
Unfortunately, on the mlptrain side the issue cannot be ignored since it affects a fair amount of existing data.

@danielhollas danielhollas changed the title Custom pickling in autode breaks loading of npz files in mlptrain Custom pickling in autode breaks loading of old npz files in mlptrain Feb 4, 2025
@shoubhikraj
Copy link
Collaborator

shoubhikraj commented Feb 5, 2025

I think this would work. Also wondering if it might be possible to check if the last item of state is a dict and whether that would be more safer than a try...except.

Also, does losing the extra attributes cause any issues?

@danielhollas
Copy link
Contributor Author

Also wondering if it might be possible to check if the last item of state is a dict and whether that would be more safer than a try...except.

I think try...except is safer/easiest since the last item doesn't need to be a dict, I think any Mapping type would work? I also like the explicitness of the try..except, since this situation should be exceptional. But I guess both would work.

Also, does losing the extra attributes cause any issues?

yeah, trying to access the extra attributes (even just calling vars on the object) might trigger an exception. So it's not ideal, but the important part is that one can access the underlying ndarray data.

I think it would definitely be worth emitting some kind of warning for this case. Not sure what's the best way to do that in autode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants