Computing kNN graphs on user-provided dimensional reductions? #55
-
Hi! This is fantastic work, congratulations! I am, however, disappointed that the kNN graph computation is more or less hard-coded to PCA, which can be misleading when the data do not lie in a series of linear subspaces. A great deal of work has been done lately on dimensionality reduction, so restricting the kNN graphs to PCA is an important limitation. Is there any way to compute the kNN graphs on a user-provided dimensionality-reduced basis (e.g., ICA, CCA, DM)? In Scanpy and Seurat, users can specify the basis they would like to use. Alternatively, how can users add externally computed kNN graphs and dimensional reductions to Scarf objects? I have only just started with Scarf, so any pointers on which parts of the code should be adapted to make this an option for the end user would be fantastic. Thanks!
Replies: 8 comments
-
Hi! Thanks! The reason that KNN graph computation is hard-coded to PCA (and LSI for scATAC-Seq data) is simply the availability of an out-of-core implementation of PCA in the Python ecosystem (sklearn in this case). One of the mottos of Scarf is to be memory efficient, and that's why we do not support methods that would violate that motto. But as you suggest, importing an externally computed KNN graph and dimension reductions into Scarf can be a viable alternative. Importing reduced dimensions for the purpose of graph calculation is a bit tricky at this point. It is, however, quite possible. Suggestions are welcome. I'm working on a feature to import a KNN graph directly from an H5ad (anndata) file; will that work for you at this stage? If yes, then I can prioritize it. Please let me know, as any early suggestions might be helpful.
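For readers unfamiliar with out-of-core PCA, here is a minimal sketch of the general idea using sklearn's `IncrementalPCA`. It is not Scarf's internal code: the toy matrix, the chunking with `np.array_split`, and the helper name `fit_pca_out_of_core` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA


def fit_pca_out_of_core(chunks, n_components):
    """Fit PCA one chunk at a time so the full matrix never has to sit in memory.

    Hypothetical helper for illustration; not part of Scarf.
    """
    ipca = IncrementalPCA(n_components=n_components)
    for chunk in chunks:
        # Each chunk must contain at least n_components rows.
        ipca.partial_fit(chunk)
    return ipca


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 50))  # toy stand-in for a cells x features matrix
chunks = np.array_split(data, 10)   # stand-in for chunks streamed from disk

ipca = fit_pca_out_of_core(chunks, n_components=10)
# The transform step is also done chunk by chunk.
embedding = np.vstack([ipca.transform(chunk) for chunk in chunks])
```

Because both fitting and transforming only ever see one chunk, peak memory depends on the chunk size rather than on the number of cells.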
-
Thanks for the swift reply :)
I understand. That's reasonable.
Maybe playing around with transform_ann could do it? I'm not used to working with out-of-memory computing, but it seems like the kNN graph step comes right after obtaining the PCA/LSI embeddings. Is that right? If so, maybe an option could be added so that, when the reduction method is neither PCA nor LSI, the user must provide an np.ndarray on which the kNN graph is computed?
For my specific case, I'd like to add both dimensionality reductions and kNN graphs, but being able to build a new kNN graph out-of-memory on top of an externally provided dimensional reduction would be fantastic enough. Thank you for being so helpful.
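To make the request concrete, here is a rough sketch of building a kNN graph from a user-provided embedding while holding only one block of distances in memory at a time. It uses exact brute-force search in plain NumPy rather than the approximate search a real pipeline would likely use, and the function name `knn_from_embedding` and its parameters are hypothetical, not Scarf API.

```python
import numpy as np


def knn_from_embedding(embedding, k, chunk_size=256):
    """Return (indices, distances) of the k nearest neighbours of every row.

    Queries are processed in chunks, so only a chunk_size x n_cells block of
    distances is materialised at any time. Requires k < n_cells.
    """
    embedding = np.asarray(embedding, dtype=np.float64)
    n = embedding.shape[0]
    sq_norms = np.einsum("ij,ij->i", embedding, embedding)
    indices = np.empty((n, k), dtype=np.int64)
    distances = np.empty((n, k), dtype=np.float64)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        block = embedding[start:stop]
        # Squared Euclidean distance: |a|^2 - 2 a.b + |b|^2
        d2 = sq_norms[start:stop, None] - 2.0 * (block @ embedding.T) + sq_norms[None, :]
        np.maximum(d2, 0.0, out=d2)  # clip tiny negatives from round-off
        # A cell is not its own neighbour.
        d2[np.arange(stop - start), np.arange(start, stop)] = np.inf
        # Pick the k smallest per row, then sort just those k.
        nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
        nearest_d2 = np.take_along_axis(d2, nearest, axis=1)
        order = np.argsort(nearest_d2, axis=1)
        indices[start:stop] = np.take_along_axis(nearest, order, axis=1)
        distances[start:stop] = np.sqrt(np.take_along_axis(nearest_d2, order, axis=1))
    return indices, distances


# Toy usage: a random array stands in for an externally computed reduction.
rng = np.random.default_rng(0)
user_embedding = rng.normal(size=(500, 20))
knn_idx, knn_dist = knn_from_embedding(user_embedding, k=10, chunk_size=128)
```

The embedding itself is still read in full for each block here; with a disk-backed array the same loop structure would apply, only the slicing would come from storage.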
-
Yes, the changes will need to be made in the AnnStream class and also in three methods of GraphDataStore: _choose_reduction_method, _set_graph_params and make_graph. Overriding the default reduction methods with a user-provided matrix should be entirely possible. Let me work out a good strategy here. Do you have any more suggestions at this point?
-
Hi Davi, I have something in the works now that may solve the issue. The solution here is to upload a transformer, for example, PCA loadings of the form Obviously, the caveat here is that this only works for linear dimension reductions. What do you think?
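As a rough illustration of why a linear transformer fits an out-of-core design, the sketch below applies a loadings matrix chunk by chunk. It is not the Scarf API: the helper name `project_in_chunks` is hypothetical, and the assumed shape of the loadings (features x components) and the optional mean-centering are assumptions made for the example.

```python
import numpy as np


def project_in_chunks(chunks, loadings, feature_means=None):
    """Yield the reduced representation of each chunk (cells x features).

    loadings is assumed to have shape (n_features, n_components).
    """
    for chunk in chunks:
        if feature_means is not None:
            chunk = chunk - feature_means  # same centering used when fitting
        yield chunk @ loadings


rng = np.random.default_rng(0)
data = rng.normal(size=(600, 40))  # toy stand-in for a cells x features matrix
feature_means = data.mean(axis=0)

# "Externally computed" loadings: top 5 right singular vectors of the centered data.
_, _, vt = np.linalg.svd(data - feature_means, full_matrices=False)
loadings = vt[:5].T  # shape (40, 5)

chunks = np.array_split(data, 6)  # stand-in for chunks streamed from disk
reduced = np.vstack(list(project_in_chunks(chunks, loadings, feature_means)))
```

A non-linear reduction generally has no such fixed matrix to apply to unseen chunks, which is the caveat mentioned above.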
-
Hi, Version 0.7.6 now contains the ability to provide external transformers. Here is an example:
-
Hi! I'm really sorry for my late reply.
Would it be possible, though, at the user's discretion? Or would it be incompatible with the object architecture entirely?
I think this is a wonderful idea that could couple with methods such as the new liger integrated method, MOFA, NMF, and perhaps ICA. I really appreciate your swiftness in making this available - I think this would be really great with MOFA and NMF, and I'll try it out. However, as you've remembered,
The thing is, I've developed a new family of non-linear topological dimensionality reduction approaches, which are implemented in TopOMetry. I'm keen to use Scarf instead of Scanpy or Pegasus in my team's workflow, but our main advantage at this point is the really high-resolution maps enabled by these non-linear mappings in our biological systems of interest. Although PCA preserves global distances well, a kNN graph built on it is not guaranteed to preserve topology. Do you think it would be possible to add a user option for providing a pre-computed dimensionality reduction?
-
Hi @davisidarta, Glad to have your comments. It will be really nice to see whether you find the new external transformer feature useful. I'm taking a deeper look into TopOMetry to investigate the possibilities for the two packages to play nicely with each other. There are a lot of interesting concepts in TopOMetry. Let me summarize how I understand the default steps of dimension reduction in TopOMetry:
So here are my thoughts if you would like to use Scarf as a backend for TopOMetry.
The benefit that I see in this approach is that it can make TopOMetry highly scalable to larger datasets. Keen to hear your comments and thoughts on such an approach. Best,
-
Hi @davisidarta, are you still interested in this?