Resources are busy message due to dataset issues #532

JPReceveur · 2023-05-17T18:19:54Z

Documenting this behavior for the future in case others ask about it.

Found a dataset where I could replicate a 'Resources are busy' message that might be related to what Bea observed in #514. In the linked dataset, if you curate a tSNE view by celltype it displays fine, but if you try to make a violin plot, it results in a 'Resources are busy' message. Dug into the dataset a bit, and it looks like it's being caused by NA values in the dataset interacting differently with the plotting tools being used.

https://umgear.org/index.html?multigene_plots=0&share_id=39260e4e&layout_id=d4505add&gene_symbol_exact_match=1&gene_symbol=sox2

adkinsrs · 2023-05-17T18:22:44Z

So I think the "Resources are busy" error is a catch-all for when Python throws a standard exception that isn't caught. I need to go in and differentiate when the error is truly due to resources being busy vs. a Pandas error (or something related)... or just fix that error :-D
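A rough sketch of what that differentiation could look like, for reference. `ResourcesBusyError`, `run_plot_request`, and the response shape are hypothetical names for illustration, not the actual gEAR API code:

```python
class ResourcesBusyError(Exception):
    """Hypothetical marker for a genuine capacity problem."""


def run_plot_request(make_plot):
    """Call make_plot() and map each kind of failure to its own message."""
    try:
        return {"success": 1, "plot": make_plot()}
    except ResourcesBusyError:
        # Only a real capacity problem gets the "busy" wording.
        return {"success": -1,
                "message": "Resources are busy. Please try again later."}
    except (KeyError, ValueError, TypeError) as err:
        # Data problems (e.g. a null group label) are reported as such.
        return {"success": -1,
                "message": f"Could not build plot from this dataset: "
                           f"{type(err).__name__}: {err}"}
    except Exception as err:
        # Anything else still fails, but without blaming server load.
        return {"success": -1,
                "message": f"Unexpected error: {type(err).__name__}"}
```

The point is only that the broad `except Exception` comes last, so a Pandas-style error never reaches the user as a "busy" message.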

adkinsrs · 2023-05-24T19:40:18Z

@JPReceveur is it fair to drop the cells with the "missing" celltype values, or should we just keep the cells, but fill the missing value with a literal "NA" or "Unknown"?

Ideally, I think the quick solution is to handle this within the plotting API sections, but the long-term solution would be to sanitize these values upon dataset upload.

adkinsrs · 2023-05-25T13:06:32Z

@JPReceveur and I had a mini-chat and decided it was best to fill the missing values for this with a literal "NA". Note this would be my solution for strings. If the missing value is in a numerical datatype, I am going to drop the row, since picking an arbitrary fill value (0, -1, the mean, etc.) would be situationally dependent.
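A minimal sketch of that rule in pandas, assuming the observation metadata is a DataFrame. The function name `clean_missing` and the example column names are made up for illustration:

```python
import numpy as np
import pandas as pd


def clean_missing(df, columns):
    """Fill missing labels with "NA"; drop rows missing a numeric value."""
    df = df.copy()
    numeric_cols = []
    for col in columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            numeric_cols.append(col)
            continue
        if not df[col].isna().any():
            continue
        if isinstance(df[col].dtype, pd.CategoricalDtype):
            # A categorical rejects fill values that are not categories yet.
            if "NA" not in df[col].cat.categories:
                df[col] = df[col].cat.add_categories("NA")
        df[col] = df[col].fillna("NA")
    if numeric_cols:
        # No safe default for numbers, so the whole row goes.
        df = df.dropna(subset=numeric_cols)
    return df


obs = pd.DataFrame({"celltype": ["HC", None, "SC"],
                    "age": [1.0, 2.0, np.nan]})
cleaned = clean_missing(obs, ["celltype", "age"])
```

Here the second row keeps its cell with celltype "NA", while the third row is dropped because its numeric value is missing.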

I believe the updated dataset uploader should check and confirm missing values, right @jorvis?

adkinsrs · 2023-05-25T13:58:44Z

Looks like for the tSNE/UMAP plots, scanpy runs anndata._sanitize, which converts strings into categoricals (https://github.com/scverse/anndata/blob/14cadf18ce1baaa76f8598de6526a479e76ed14b/anndata/_core/anndata.py#L1175). Anndata also supports nullable categorical values (scverse/scirpy#190 (comment)), though in reality Pandas silently leaves them out of the list of categories, so they are effectively ignored.

So I think the best solution would be to run this strings_to_categoricals function before running the plotly stuff, since scanpy is doing this under the hood for their plots. I will give this a shot and see if the violin issue is resolved.
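For illustration, the Pandas behavior described above can be reproduced without anndata. This is a plain-pandas sketch with made-up celltype values, not the gEAR plotting code:

```python
import pandas as pd

celltype = pd.Series(["HC", None, "SC", "HC"], name="celltype")
as_cat = celltype.astype("category")

# The missing value stays in the data but never becomes a category.
categories = list(as_cat.cat.categories)
n_missing = int(as_cat.isna().sum())

# Counting by category skips the missing cell by default (dropna=True).
counts = as_cat.value_counts().to_dict()
```

With this input, `categories` holds only the two real labels and `n_missing` is 1, which is why code that iterates over the categories never sees the null group.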

adkinsrs · 2023-05-25T14:39:27Z

So I discovered that our plotly code already filters out null groups for the "x" axis param. However, the issue is in the color mapping code itself, which breaks on the null value.

In the dataset curator, I noticed that the "h5ad" API call is removing nulls from the categories, but they are being added back into the list of colors (probably because it is not checking the returned obs_levels for the correct values). When the dataset curator preview plot is run, it seems that the plotly API adds color mappings for all colors, whether or not an existing color mapping was passed, which affects future runs since this mapping is used to populate the Vue page and is saved in the config.
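One possible way to keep the color mapping in line with the returned levels, as a sketch only. `prune_color_map`, `is_null`, and the default gray are hypothetical, and the real fix may live elsewhere in the plotly API or the curator:

```python
import math


def is_null(value):
    """True for None and float NaN, the two null forms expected here."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prune_color_map(color_map, obs_levels, default_color="#808080"):
    """Keep colors only for non-null levels the dataset actually returned."""
    levels = [lvl for lvl in obs_levels if not is_null(lvl)]
    # Stale or null keys in color_map are dropped; new levels get a default.
    return {lvl: color_map.get(lvl, default_color) for lvl in levels}


saved = {"HC": "#ff0000", None: "#0000ff", "old": "#111111"}
pruned = prune_color_map(saved, ["HC", "SC", None, float("nan")])
```

Running something like this before saving the config would stop a null entry from being written back and reused on later runs.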

adkinsrs · 2023-06-22T17:33:37Z

tested on gear-devel

…o-dataset-issues Resolving #532

JPReceveur added the bug and Low Priority labels May 17, 2023

JPReceveur assigned jorvis and adkinsrs May 17, 2023

adkinsrs added the in_progress label May 24, 2023

adkinsrs added a commit that referenced this issue May 25, 2023

Resolving #532

d9f0c63

adkinsrs closed this as completed Jun 22, 2023

adkinsrs linked a pull request Jun 22, 2023 that will close this issue

Resolving #532 #547

Merged

adkinsrs mentioned this issue Jun 22, 2023

Gene Expression Search - cannot open "Datasets" modal if no default display is selected #548

Open

jorvis added a commit that referenced this issue Jun 22, 2023

Merge pull request #547 from IGS/532-resources-are-busy-message-due-t…

776db8f

…o-dataset-issues Resolving #532

adkinsrs added a commit that referenced this issue Oct 6, 2023

Resolving #532

db37d2c
