Can't convert string categoricals #963

Closed
jeskowagner opened this issue Mar 29, 2023 · 6 comments

Comments

@jeskowagner
Contributor

jeskowagner commented Mar 29, 2023

Hi Isaac,

This issue is related to #726 and others dealing with implicit conversion during writing.

AnnData.write() (to h5ad) fails if a categorical column in obs or var has categories of the pandas string extension dtype (note: not str, which gives object-dtype categories).

Identifying the offending pd.Series and converting it, e.g. by doing adata.obs["col"].astype(str).astype("category") fixes the problem.

Reproducible example:

import pandas as pd
import numpy as np
import anndata as ad
obs = pd.DataFrame(pd.Series([1,2,3],name="test").astype("string").astype("category"))
X = np.random.random((3,3)) 
adata = ad.AnnData(X=X, obs=obs)
adata.write("test.h5ad") # this fails, traceback below
adata.obs["test"] = adata.obs["test"].astype(str).astype("category")
adata.write("test.h5ad") # works, because `categories` have been converted to `object`
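
When several columns might be affected, the manual cast above can be applied in a loop. The following is a minimal sketch that uses only pandas; the helper name `object_categories` is hypothetical and not part of anndata. It swaps the `astype(str)` round-trip for `rename_categories`, on the assumption that keeping the codes untouched is preferable, because casting a categorical with missing values to `str` can turn those missing values into literal strings.

```python
import pandas as pd


def object_categories(df: pd.DataFrame) -> list:
    """Recast `string`-dtype categories to object dtype, in place.

    Hypothetical helper; returns the names of the columns that were changed.
    """
    changed = []
    for name in df.columns:
        dtype = df[name].dtype
        if not isinstance(dtype, pd.CategoricalDtype):
            continue  # only categorical columns are of interest
        if isinstance(dtype.categories.dtype, pd.StringDtype):
            # Same labels and same codes; only the categories' dtype changes.
            new_categories = dtype.categories.astype(object)
            df[name] = df[name].cat.rename_categories(new_categories)
            changed.append(name)
    return changed


# Same construction as in the reproducible example, without AnnData.
obs = pd.DataFrame(
    pd.Series([1, 2, 3], name="test").astype("string").astype("category")
)
print(object_categories(obs))
print(obs["test"].cat.categories.dtype)
```

In the setting of this issue, the helper would presumably be called on `adata.obs` and `adata.var` before `adata.write(...)`; that usage has not been tested against anndata here.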

Versions:
anndata : 0.8.0
h5py : 3.7.0

Traceback
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/anndata/_io/utils.py:214, in report_write_key_on_error.<locals>.func_wrapper(elem, key, val, *args, **kwargs)
    213 try:
--> 214     return func(elem, key, val, *args, **kwargs)
    215 except Exception as e:

File ~/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py:175, in write_elem(f, k, elem, modifiers, *args, **kwargs)
    174 else:
--> 175     _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)

File ~/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py:64, in IORegistry.get_writer(self, dest_type, typ, modifiers)
     63 if (dest_type, typ, modifiers) not in self.write:
---> 64     raise TypeError(
     65         f"No method has been defined for writing {typ} elements to {dest_type}"
     66     )
     68 return self.write[(dest_type, typ, modifiers)]

TypeError: No method has been defined for writing <class 'pandas.core.arrays.string_.StringArray'> elements to <class 'h5py._hl.group.Group'>

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In [170], line 1
----> 1 adata.write("test.h5ad")

File ~/.local/lib/python3.10/site-packages/anndata/_core/anndata.py:1918, in AnnData.write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
   1915 if filename is None:
   1916     filename = self.filename
-> 1918 _write_h5ad(
   1919     Path(filename),
   1920     self,
   1921     compression=compression,
   1922     compression_opts=compression_opts,
   1923     force_dense=force_dense,
   1924     as_dense=as_dense,
   1925 )
   1927 if self.isbacked:
   1928     self.file.filename = filename

File ~/.local/lib/python3.10/site-packages/anndata/_io/h5ad.py:98, in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
     96 elif adata.raw is not None:
     97     write_elem(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
---> 98 write_elem(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
     99 write_elem(f, "var", adata.var, dataset_kwargs=dataset_kwargs)
    100 write_elem(f, "obsm", dict(adata.obsm), dataset_kwargs=dataset_kwargs)

File ~/.local/lib/python3.10/site-packages/anndata/_io/utils.py:214, in report_write_key_on_error.<locals>.func_wrapper(elem, key, val, *args, **kwargs)
    211 @wraps(func)
    212 def func_wrapper(elem, key, val, *args, **kwargs):
    213     try:
--> 214         return func(elem, key, val, *args, **kwargs)
    215     except Exception as e:
    216         if "Above error raised while writing key" in format(e):

File ~/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py:175, in write_elem(f, k, elem, modifiers, *args, **kwargs)
    171     _REGISTRY.get_writer(dest_type, (t, elem.dtype.kind), modifiers)(
    172         f, k, elem, *args, **kwargs
    173     )
    174 else:
--> 175     _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)

File ~/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py:24, in write_spec.<locals>.decorator.<locals>.wrapper(g, k, *args, **kwargs)
     22 @wraps(func)
     23 def wrapper(g, k, *args, **kwargs):
---> 24     result = func(g, k, *args, **kwargs)
     25     g[k].attrs.setdefault("encoding-type", spec.encoding_type)
     26     g[k].attrs.setdefault("encoding-version", spec.encoding_version)

File ~/.local/lib/python3.10/site-packages/anndata/_io/specs/methods.py:514, in write_dataframe(f, key, df, dataset_kwargs)
    511 write_elem(group, index_name, df.index._values, dataset_kwargs=dataset_kwargs)
    512 for colname, series in df.items():
    513     # TODO: this should write the "true" representation of the series (i.e. the underlying array or ndarray depending)
--> 514     write_elem(group, colname, series._values, dataset_kwargs=dataset_kwargs)

File ~/.local/lib/python3.10/site-packages/anndata/_io/utils.py:214, in report_write_key_on_error.<locals>.func_wrapper(elem, key, val, *args, **kwargs)
    211 @wraps(func)
    212 def func_wrapper(elem, key, val, *args, **kwargs):
    213     try:
--> 214         return func(elem, key, val, *args, **kwargs)
    215     except Exception as e:
    216         if "Above error raised while writing key" in format(e):

File ~/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py:175, in write_elem(f, k, elem, modifiers, *args, **kwargs)
    171     _REGISTRY.get_writer(dest_type, (t, elem.dtype.kind), modifiers)(
    172         f, k, elem, *args, **kwargs
    173     )
    174 else:
--> 175     _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)

File ~/.local/lib/python3.10/site-packages/anndata/_io/specs/registry.py:24, in write_spec.<locals>.decorator.<locals>.wrapper(g, k, *args, **kwargs)
     22 @wraps(func)
     23 def wrapper(g, k, *args, **kwargs):
---> 24     result = func(g, k, *args, **kwargs)
     25     g[k].attrs.setdefault("encoding-type", spec.encoding_type)
     26     g[k].attrs.setdefault("encoding-version", spec.encoding_version)

File ~/.local/lib/python3.10/site-packages/anndata/_io/specs/methods.py:617, in write_categorical(f, k, v, dataset_kwargs)
    614 g.attrs["ordered"] = bool(v.ordered)
    616 write_elem(g, "codes", v.codes, dataset_kwargs=dataset_kwargs)
--> 617 write_elem(g, "categories", v.categories._values, dataset_kwargs=dataset_kwargs)

File ~/.local/lib/python3.10/site-packages/anndata/_io/utils.py:220, in report_write_key_on_error.<locals>.func_wrapper(elem, key, val, *args, **kwargs)
    218 else:
    219     parent = _get_parent(elem)
--> 220     raise type(e)(
    221         f"{e}\n\n"
    222         f"Above error raised while writing key {key!r} of {type(elem)} "
    223         f"to {parent}"
    224     ) from e

TypeError: No method has been defined for writing <class 'pandas.core.arrays.string_.StringArray'> elements to <class 'h5py._hl.group.Group'>

Above error raised while writing key 'categories' of <class 'h5py._hl.group.Group'> to /
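
The final message names only the key 'categories', not the column it belongs to, so finding the offending column takes a separate scan. A minimal sketch of such a scan, using only pandas (the function name `categories_dtypes` is hypothetical):

```python
import pandas as pd


def categories_dtypes(df: pd.DataFrame) -> dict:
    """Map each categorical column name to the dtype of its categories."""
    return {
        name: col.dtype.categories.dtype
        for name, col in df.items()
        if isinstance(col.dtype, pd.CategoricalDtype)
    }


obs = pd.DataFrame(
    pd.Series([1, 2, 3], name="test").astype("string").astype("category")
)
for name, dtype in categories_dtypes(obs).items():
    # Columns whose categories use the `string` extension dtype are the suspects.
    print(name, dtype, isinstance(dtype, pd.StringDtype))
```

Any column flagged here is a candidate for the cast described at the top of this issue.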
@jeskowagner
Contributor Author

Just spotted #679, happy to close this issue here if 0.9.0 will solve this.

@github-actions

github-actions bot commented Jun 9, 2023

This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

@flying-sheep
Member

Yeah, should be a duplicate of #679, but I added a comment there to clarify

@flying-sheep closed this as not planned on Jun 12, 2023
@ngvananh2508

I had the same issue.

I constructed an AnnData object from this pandas df:

[image]
I used the index of this df and its transposed data to set the obs_names, var_names, and var['name'] of the AnnData object.
When I wrote an h5ad file, it raised an error related to the var data.

I tried the cast that jeskowagner suggested, but it did not work.

[image]

Can anyone please help me? Thank you very much.

@jeskowagner
Contributor Author

@ngvananh2508 your error message mentions var and _index, so I would try:

adata.var.index = adata.var.index.astype(str).astype("category")

Hope that works!
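
The screenshots from the question are not preserved, so the exact state of that var frame is unknown. As a rough illustration only, this is what the cast does to a stand-in frame whose index is assumed to have the string extension dtype (the frame, its column, and its labels are all made up):

```python
import pandas as pd

# Hypothetical stand-in for adata.var with a `string`-dtype index.
var = pd.DataFrame(
    {"name": ["a", "b", "c"]},
    index=pd.Index(["g1", "g2", "g3"], dtype="string"),
)

before = var.index.dtype
# The cast from the comment above: to str first, then to category.
var.index = var.index.astype(str).astype("category")
after = var.index.dtype

print(before, "->", after)
print(list(var.index))
```

The labels themselves are unchanged; only the index type differs afterwards.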

@ngvananh2508

Thank you so much! It worked already.
