t-SNE optimization using scikit-learn-intelex #3061
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@           Coverage Diff            @@
##             main    #3061     +/-  ##
========================================
- Coverage   75.87%   75.19%   -0.68%
========================================
  Files         110      110
  Lines       12533    12536       +3
========================================
- Hits         9509     9427      -82
- Misses       3024     3109      +85
For the above code, the time spent in t-SNE went down from 2252 seconds to 210 seconds due to this PR.
What’s the comparison to MulticoreTSNE?

Defaults
It would probably make sense to use a match statement:

if use_fast_tsne is not None:
    warnings.warn("...", FutureWarning)
match (use_fast_tsne, flavor):
    case (None, 'auto'): ...      # try importing 'intelex', fall back to 'sklearn'
    case (None, _): ...           # use the specified flavor
    case (True, 'auto'): ...      # use 'multicore'
    case (True, 'sklearn'): ...   # throw an error
    case (True, _): ...           # use the specified flavor
    case (False, 'auto' | 'sklearn'): ...  # use 'sklearn'
    case (False, _): ...          # throw an error
    case _:
        raise AssertionError()

In the future, we can change the defaults. @ilan-gold what do you think?
This seems reasonable to me, @flying-sheep. @ashish615, does using the patched version change results over the unpatched one, i.e., for a given random seed, are unpatched and patched the same? If the two are the same for a given seed/state, then I think what @flying-sheep is proposing could be done separately (even if we make the dependency optional, IMO). However, if the new version does change results, we will need the handling that @flying-sheep describes.
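The seed question can be checked mechanically. A minimal sketch, assuming scikit-learn and a toy matrix standing in for the real PCA coordinates; rerunning the same script after sklearnex patching would reveal whether the patched backend changes the output for a fixed seed:

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy data standing in for an AnnData PCA matrix.
X = np.random.default_rng(0).normal(size=(60, 8))

# Two runs with the same seed on the same backend should match exactly.
a = TSNE(n_components=2, perplexity=5.0, method='exact', random_state=42).fit_transform(X)
b = TSNE(n_components=2, perplexity=5.0, method='exact', random_state=42).fit_transform(X)
print(np.allclose(a, b))
```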
This comparison was made using the use_fast_tsne flag, which uses MulticoreTSNE.
@ilan-gold , I didn't check that. I will check that and let you guys know. |
These t-SNE optimizations are mentioned in the following paper. Adding it here for reference. |
@narendrachaudhary51 You can add the reference here https://github.com/scverse/scanpy/blob/main/docs/references.bib (thank you @flying-sheep for improving this!) |
Hi @flying-sheep @ilan-gold, in the above code, use USE_FIRST_N_CELLS to set the number of records, and use sc.tl.tsne(adata, n_pcs=tsne_n_pcs, use_fast_tsne=False) to run the optimized version with the latest commit. You can get the KL divergence numbers by logging kl_divergence_.
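For reference, the attribute being logged is scikit-learn's TSNE.kl_divergence_, the final Kullback-Leibler divergence after optimization. A minimal sketch on toy data (the real benchmark would slice adata to USE_FIRST_N_CELLS rows first):

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy stand-in for the PCA coordinates of the first N cells.
X = np.random.default_rng(0).normal(size=(100, 10))

tsne = TSNE(n_components=2, perplexity=10.0, method='exact', random_state=0)
emb = tsne.fit_transform(X)

# Comparable runs (patched vs. unpatched) can be judged by this value.
print(f"KL divergence: {tsne.kl_divergence_:.4f}")
```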
As the KL divergence values in the above table show, the output of the Intel-optimized t-SNE is different but equivalent in quality.
@sanchit-misra @ashish615 one thing that we thought of - why does this need to be in scanpy itself?
Notwithstanding the above question, since your method produces different results from the default, we will need to adopt the conventions outlined in #3061 (comment). This will ensure continued reproducibility with the defaults.
Hi @ilan-gold, regarding your thought in this comment: we can enable or disable the Intel optimization from outside the code. However, users might not be aware of how to use this feature. If we instead add it to scanpy directly, all scanpy users will know the option is available. If we agree on the option discussed in this comment, I can proceed with updating the t-SNE file.
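As a concrete illustration of the "from outside the code" option: scikit-learn-intelex exposes sklearnex.patch_sklearn(), which swaps optimized estimators into scikit-learn globally. A hedged sketch (the helper name is made up; it degrades gracefully to stock scikit-learn when the package is absent):

```python
def enable_intelex():
    """Try to patch scikit-learn with Intel-optimized estimators.

    Returns True if sklearnex was found and patching was applied,
    False if the package is not installed (stock scikit-learn is used).
    """
    try:
        from sklearnex import patch_sklearn  # provided by scikit-learn-intelex
    except ImportError:
        return False
    # patching should happen before scikit-learn estimators are imported
    patch_sklearn()
    return True


# Called before importing scanpy, so that the patched TSNE is picked up.
print(enable_intelex())
```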
This pull request accelerates t-SNE using the scikit-learn-intelex library, yielding roughly a 10x runtime improvement for the package's t-SNE implementation on the example given below. The experiment was run on an AWS r7i.24xlarge instance.