Resources are busy message due to dataset issues #532

Closed
JPReceveur opened this issue May 17, 2023 · 6 comments · Fixed by #547

Comments

@JPReceveur
Collaborator

Documenting this behavior for the future in case others ask about it.

Found a dataset where I could replicate a 'Resources are busy' message that might be related to what Bea observed in #514. In the linked dataset, if you curate a tSNE view by celltype it displays fine, but if you try to make a violin plot, it results in a 'Resources are busy' message. I dug into the dataset a bit, and it looks like it's being caused by NA values in the dataset interacting differently with the plotting tools being used.

https://umgear.org/index.html?multigene_plots=0&share_id=39260e4e&layout_id=d4505add&gene_symbol_exact_match=1&gene_symbol=sox2

[Screenshots attached: 2023-05-17 at 2:10:40 PM and 2023-05-17 at 2:08:12 PM]
@JPReceveur JPReceveur added bug Something isn't working Low Priority Not the biggest concern at the moment. Also includes "nice to haves" labels May 17, 2023
@adkinsrs
Member

So I think the "Resources are busy" error is a catch-all for when Python throws a standard exception that isn't caught. I need to go in and differentiate when the error is truly due to resources being busy vs. a Pandas error (or something related)... or just fix that error :-D
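To make the catch-all idea concrete, here is a minimal sketch of separating a genuine "busy" condition from a data error. Everything in it is hypothetical and for illustration only: `ResourcesBusyError`, `run_plot_request`, and the response fields are invented names, not the actual gEAR API code.

```python
class ResourcesBusyError(Exception):
    """Hypothetical exception: raised only when the server truly lacks capacity."""


def run_plot_request(make_plot):
    """Call make_plot() and map each failure type to its own user-facing message."""
    try:
        return {"success": 1, "plot": make_plot()}
    except ResourcesBusyError:
        # The only case that should show the "busy" message.
        return {"success": -1, "message": "Resources are busy. Please try again later."}
    except (KeyError, ValueError, TypeError) as err:
        # Data problems (e.g. NA values in an obs column) get a distinct message.
        return {"success": -1, "message": f"Could not plot this dataset: {err!r}"}


def plot_with_bad_data():
    # Stand-in for a plotting call that trips over missing values.
    raise ValueError("NaN found in 'celltype'")


def plot_when_overloaded():
    # Stand-in for a plotting call that fails because the server is saturated.
    raise ResourcesBusyError()


data_error = run_plot_request(plot_with_bad_data)
busy_error = run_plot_request(plot_when_overloaded)
print(data_error["message"])
print(busy_error["message"])
```

With this split, a dataset problem like the one in this issue would no longer be reported as a capacity problem.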

@adkinsrs adkinsrs added the in_progress Work is in-progress for this ticket label May 24, 2023
@adkinsrs
Member

adkinsrs commented May 24, 2023

@JPReceveur is it fair to drop the cells with the "missing" celltype values, or should we just keep the cells, but fill the missing value with a literal "NA" or "Unknown"?

Ideally, I think the quick solution is to handle this within the plotting API sections, but the long-term solution would be to sanitize these values upon dataset upload.

@adkinsrs
Member

@JPReceveur and I had a mini-chat and decided it was best to fill the missing values here with a literal "NA". Note that this is my solution for strings only. If the missing value is in a numerical datatype, I am going to drop the row, since setting an arbitrary number (0, -1, the mean, etc.) would be situationally dependent.

I believe the updated dataset uploader should check for and confirm missing values, right @jorvis?
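The fill-or-drop rule described above could look roughly like this in pandas. This is a sketch under assumptions: `handle_missing_obs` and the example columns are hypothetical, and the real fix may be structured differently.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def handle_missing_obs(obs: pd.DataFrame, fill_value: str = "NA") -> pd.DataFrame:
    """Drop rows with missing numeric values; fill missing string/categorical values."""
    numeric_cols = [c for c in obs.columns if is_numeric_dtype(obs[c].dtype)]
    cleaned = obs.copy()
    if numeric_cols:
        # No safe default exists for numbers, so the whole row is dropped.
        cleaned = cleaned.dropna(subset=numeric_cols)
    for col in cleaned.columns:
        if col in numeric_cols or not cleaned[col].isna().any():
            continue
        series = cleaned[col]
        if isinstance(series.dtype, pd.CategoricalDtype):
            # A categorical can only be filled with a value that is a known category.
            if fill_value not in series.cat.categories:
                series = series.cat.add_categories([fill_value])
        cleaned[col] = series.fillna(fill_value)
    return cleaned


# Hypothetical observation table with one missing value per column.
obs = pd.DataFrame({
    "celltype": pd.Categorical(["neuron", None, "glia", "neuron"]),
    "condition": ["ctrl", "treated", None, "ctrl"],
    "age": [3.0, 5.0, 7.0, np.nan],
})
cleaned = handle_missing_obs(obs)
print(cleaned)
```

The last row is dropped because its numeric `age` is missing, while the missing `celltype` and `condition` values become the literal string "NA".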

@adkinsrs
Member

adkinsrs commented May 25, 2023

Looks like for the tSNE/UMAP plots, scanpy runs anndata._sanitize, which converts strings into categoricals (https://github.com/scverse/anndata/blob/14cadf18ce1baaa76f8598de6526a479e76ed14b/anndata/_core/anndata.py#L1175). Anndata also supports nullable categorical values (scverse/scirpy#190 (comment)), though in reality Pandas silently leaves them out of the list of categories, so they are effectively ignored.

So I think the best solution would be to run this strings_to_categoricals function before running the plotly code, since scanpy does this under the hood for its plots. I will give this a shot and see if the violin issue is resolved.
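The categorical behavior mentioned above can be seen with plain pandas. This is only an illustration of how pandas treats missing values in a categorical, not the anndata or gEAR code; the `celltype` values are made up.

```python
import numpy as np
import pandas as pd

# A string column with one missing value, similar to the problem dataset.
celltype = pd.Series(["neuron", np.nan, "glia", "neuron"], name="celltype")

# Converting to categorical keeps the cell but gives it no category.
as_categorical = celltype.astype("category")

categories = list(as_categorical.cat.categories)       # NaN is not a category
codes = as_categorical.cat.codes.tolist()              # missing entries get code -1
group_sizes = as_categorical.value_counts().to_dict()  # NaN is not counted by default

print(categories)
print(codes)
print(group_sizes)
```

Anything that iterates over `cat.categories` therefore never sees the missing group, which would explain why the scanpy-based tSNE/UMAP plots render without complaint.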

@adkinsrs
Member

adkinsrs commented May 25, 2023

So I discovered that our plotly code already filters out null groups for the "x" axis param. However, the color mapping code is where the issue lies: the null value is breaking that step.

In the dataset curator, I noticed that the "h5ad" API call removes nulls from the categories, but they are being added back into the list of colors (probably because it is not checking the returned obs_levels for the correct values). When the dataset curator preview plot is run, it seems that the plotly API adds color mappings for all colors, whether or not an existing color mapping was passed. This affects future runs, since this mapping is used to populate the Vue page and is saved in the config.
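A rough sketch of a color-mapping step that avoids both problems described above: it skips null levels and preserves any mapping that was passed in. The names `is_missing` and `build_color_map` are hypothetical, and this is not the code from the linked commits.

```python
import math


def is_missing(level):
    """True for None and float NaN, the usual markers for a null group."""
    return level is None or (isinstance(level, float) and math.isnan(level))


def build_color_map(levels, palette, existing=None):
    """Map each non-null level to a color, reusing colors already chosen."""
    existing = existing or {}
    color_map = {}
    palette_index = 0
    for level in levels:
        if is_missing(level):
            # Null groups are never plotted, so they get no color entry.
            continue
        if level in existing:
            color_map[level] = existing[level]
        else:
            color_map[level] = palette[palette_index % len(palette)]
            palette_index += 1
    return color_map


levels = ["neuron", float("nan"), "glia", None]
palette = ["#1f77b4", "#ff7f0e"]
saved = {"neuron": "#000000"}

color_map = build_color_map(levels, palette, existing=saved)
print(color_map)
```

Because null levels never enter the returned mapping, they cannot be written back into a saved display config.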

adkinsrs added a commit that referenced this issue May 25, 2023
@adkinsrs
Member

tested on gear-devel

@adkinsrs adkinsrs linked a pull request Jun 22, 2023 that will close this issue
jorvis added a commit that referenced this issue Jun 22, 2023
adkinsrs added a commit that referenced this issue Oct 6, 2023