What would you like to see?
Feature: Rather than supporting only the one model that happened to be selected at development time, and rather than maintaining limit lists for every embedding model, it would be great to have an option to override the hardcoded embeddingMaxChunkLength.
Explanation:
In the Text Splitting & Chunking Preferences, the max chunk size appears to be embedding-provider dependent rather than embedding-model dependent, which leads to a maximum character length that does not fit every model.
E.g., a fixed chunk size is set for the Azure OpenAI Embedding Provider. The hardcoded value seems to come from:
https://github.com/Mintplex-Labs/anything-llm/blob/da3d0283ffee9c592e5b81d2be6a848722df298f/server/utils/EmbeddingEngines/azureOpenAi/index.js#L22C10-L22C34
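
A minimal sketch of what such an override could look like, assuming a new EMBEDDING_MAX_CHUNK_LENGTH environment variable (the variable name, default value, and class shape below are all hypothetical, not taken from the repo):

```js
// Hypothetical override: read a user-supplied max chunk length from the
// environment instead of relying on the provider-level constant.
const DEFAULT_MAX_CHUNK_LENGTH = 1_000; // illustrative default, not necessarily the repo's value

function resolveMaxChunkLength() {
  const override = Number(process.env.EMBEDDING_MAX_CHUNK_LENGTH); // assumed variable name
  return Number.isFinite(override) && override > 0
    ? override
    : DEFAULT_MAX_CHUNK_LENGTH;
}

// The embedder constructor could then consume the resolved value
// instead of a hardcoded literal:
class AzureOpenAiEmbedder {
  constructor() {
    this.embeddingMaxChunkLength = resolveMaxChunkLength();
  }
}
```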
The model used as the baseline seems to be text-embedding-ada-002, but there are already newer models such as text-embedding-3-large.
Also, it seems that the AnythingLLM embedder counts characters rather than tokens, which reduces the amount of data that ends up in a vector even further.
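
To illustrate the characters-vs-tokens gap, here is a rough sketch using the js-tiktoken package (an assumption; AnythingLLM may measure length differently). English text averages roughly four characters per token, so a 1,000-character cap would typically use only ~250 of the 8,191 input tokens that text-embedding-ada-002 accepts:

```js
import { encodingForModel } from "js-tiktoken";

const enc = encodingForModel("text-embedding-ada-002");
const text = "Some document chunk that is about to be embedded...";

const charCount = text.length;              // what a character-based limit measures
const tokenCount = enc.encode(text).length; // what the model's limit is actually defined in

console.log({ charCount, tokenCount });
```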