Replies: 3 comments 4 replies
-
Hi @bennjmo,

It seems you have loaded the scaled data from the H5ad file into Scarf. First, you need to investigate if and where the raw data is stored in your AnnData/H5ad file:

```python
import h5py

handle = h5py.File("<filename>.h5ad", mode="r")
print(handle.keys())
```

You should see the usual groups like 'X', 'obs', 'obsm', 'uns', and 'var' in the output, but you are mainly looking for:

```python
print(handle["layers"].keys())
```

Once you have located the raw count data (a sparse count group will have these three entries: 'data', 'indices', 'indptr'), point the reader at it:

```python
reader = scarf.H5adReader(
    h5ad_fn="<filename>.h5ad",
    matrix_key='layers/counts',  # Override this with the group holding raw counts
    cell_attrs_key='obs',
    cell_ids_key='_index',
    feature_attrs_key='var',
    feature_ids_key='_index',
    feature_name_key='gene_short_name',
    obsm_attrs_key='obsm',
    category_names_key='__categories'
)
```

Hope this helps! Feel free to post here if you have more questions :)

/Parashar
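As background, h5ad files typically store sparse matrices in CSR form as exactly those three arrays ('data', 'indices', 'indptr'). A minimal pure-Python sketch (illustrative only, not Scarf or h5py API) of how a dense row is reconstructed from that triplet:

```python
# How a CSR group's 'data'/'indices'/'indptr' encode this dense matrix:
#   [[0, 5, 0],
#    [3, 0, 2]]
data = [5, 3, 2]      # non-zero values, row by row
indices = [1, 0, 2]   # column index of each value
indptr = [0, 1, 3]    # row i spans data[indptr[i]:indptr[i + 1]]

def row(i, n_cols=3):
    """Reconstruct dense row i from the CSR triplet."""
    out = [0] * n_cols
    for k in range(indptr[i], indptr[i + 1]):
        out[indices[k]] = data[k]
    return out

print(row(0))  # [0, 5, 0]
print(row(1))  # [3, 0, 2]
```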
-
Hi Ben,

Yes, it does seem like the raw counts weren't saved into the file. You may still be able to use Scarf with the normalized count values you have in the H5ad file: read in the H5ad file pointing the reader at the normalized matrix, then continue with the rest of the workflow as usual. But you will have to turn off normalization after loading the datastore, because the data is already normalized. If the gene names or IDs do not load correctly, you will need to look into the H5ad file and check the names within the 'var' group.

/Parashar

PS: Looking at the few rows of data that you have printed out, I'm afraid the data might be log transformed. This can be problematic for the highly variable gene selection step in Scarf. The problem can be avoided if you already have an HVG list that you want to use; in that case, all we need to do is set the feature key to that list.
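On the log-transform point: if the stored values were log1p-transformed (as scanpy's `sc.pp.log1p` does) and nothing else was applied afterwards, `numpy.expm1` inverts the transform. This is only a hedged sketch with made-up values; it does not recover counts if the data was also scaled or normalized per cell:

```python
import numpy as np

# Hypothetical raw counts that were log1p-transformed upstream.
raw = np.array([0.0, 1.0, 4.0, 10.0])
logged = np.log1p(raw)           # what may be stored in the H5ad matrix

# If (and only if) log1p was the last transform applied, expm1 inverts it;
# rounding cleans up floating-point error.
recovered = np.rint(np.expm1(logged))
print(recovered)  # [ 0.  1.  4. 10.]
```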
-
I am still getting errors when trying to write the data out. Yet it still works if I use the other reader call. I don't think I understand what difference that makes.

I think I'm just going to scrap this attempt and see if I can get my hands on the raw data so I can proceed with the typical Scarf workflow. But I really do appreciate your assistance!
-
I have an existing AnnData object of a large dataset (~190k cells) that I would like to process with Scarf (it was previously processed on a cluster, and I would like to see if Scarf would work for us going forward). This is a subsetted version I am working with just to speed things up, but you can see the obs, var, and uns are already quite busy.

What is the best way to take an object like this into Scarf? I am having trouble reverting back to just raw data so that I can go through the Scarf workflow. For example, one attempt gave negative numbers on the RNA_ncounts plot.
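For what it's worth, negative "counts" usually mean the stored matrix was z-scaled per gene (e.g. with scanpy's `sc.pp.scale`). A small numpy sketch with invented values showing why scaled data always contains negatives, and why reverting needs the per-gene mean and standard deviation to have been saved:

```python
import numpy as np

# Hypothetical per-gene counts that were z-scaled upstream.
counts = np.array([0.0, 2.0, 4.0, 10.0])
mu, sd = counts.mean(), counts.std()
scaled = (counts - mu) / sd          # what a scaled matrix would hold

# Any value below the gene's mean becomes negative after scaling.
print(scaled.min() < 0)  # True

# Reverting is only possible if mu and sd were stored alongside the data.
restored = scaled * sd + mu
print(np.allclose(restored, counts))  # True
```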
Thanks!