Replies: 2 comments 1 reply
-
This may be of interest, should also work in wasm https://github.com/Maxxen/duckdb-vss |
Beta Was this translation helpful? Give feedback.
1 reply
-
I found this to be a helpful warmup reading for what might be involved with WebAssembly or C++ extensions: https://simonw.substack.com/p/building-and-testing-c-extensions |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
When I chatted with the duckdb.org team (@Mytherin :) mid-2023, they said vector datatypes are somewhat supported: https://duckdb.org/docs/sql/data_types/array.html approximate nearest neighbor search may or may not be on the roadmap depending on commercial contracts.
This enables novel use cases that leverage embeddings (of words, text, images, video) and enable interactive search/retrieval/visualization over millions of entities/concepts/people.
It shouldn't be hard to port hierarchical navigable small worlds or other approximate nearest neighbor algorithms into rust for use via WebAssembly; here's one example repo: https://github.com/shravansunder/hnswlib-wasm
As one example: I made this demo of 35 million people's income over 20 years - https://jaanli.github.io/american-community-survey/income
Now I want to filter it down to the "most similar" individuals or individuals belonging to clusters across the hundreds of variables in the census.
These clusters can be pre-computed, but are often not obvious and require iterative interaction with the dataset. In-browser rust implementations with WebAssembly (or WebGPU: https://codelabs.developers.google.com/your-first-webgpu-app) feel like they would unlock a large number of use cases like this.
If anyone is interested in collaborating, please ping me! This would enable very cool large language model use cases, or semantic search over thousands of Observable Framework dashboards (a common use case I have working in health care across hospitals/countries).
A stopgap solution would be hacking together something like motherduck.com to offload some of the approximate nearest neighbor search & retrieval to an AWS lambda function (e.g. on the free tier). Here's an example of how to do this: https://github.com/yoshoku/hnswlib-node/wiki/How-to-run-hnswlib-node-on-AWS-Lambda
cc @Maxxen who I've also discussed this with (approximate nearest neighbors are relevant to large-scale geospatial computations like calculating nearest neighbors)
Beta Was this translation helpful? Give feedback.
All reactions