BM25 algorithm slightly off on texts with lots of non-ascii characters #11

Open
zkry opened this issue Nov 23, 2024 · 0 comments
Labels
bug Something isn't working

Comments

zkry (Owner) commented Nov 23, 2024

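The BM25 algorithm currently uses the file size in bytes as the document length in its calculations. For files with a lot of non-ASCII text this makes the results slightly off: in a multi-byte encoding such as UTF-8, each non-ASCII character takes more than one byte, so the byte size overstates how much text the file contains. I should find a quick way to determine the file's true size (its length in characters rather than bytes) and set that as the document's size property.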
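To illustrate the discrepancy, here is a minimal Python sketch. It is not this project's code: the function names are hypothetical, the files are assumed to be valid UTF-8, and character count is used as a stand-in for whatever the document's size property should hold. It compares byte length with character length, shows one cheap way to count characters from raw bytes without decoding (skipping UTF-8 continuation bytes), and shows how the length feeds into the standard BM25 length-normalization term.

```python
def byte_length(text: str) -> int:
    """Size of the text when stored as UTF-8, i.e. what a file-size lookup reports."""
    return len(text.encode("utf-8"))


def count_utf8_chars(data: bytes) -> int:
    """Count code points in valid UTF-8 bytes without decoding.

    Continuation bytes have the bit pattern 10xxxxxx, so every byte that
    does NOT match that pattern starts a new character.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def bm25_length_norm(doc_len: float, avg_doc_len: float, b: float = 0.75) -> float:
    """Length-normalization factor from the BM25 denominator: 1 - b + b * |D| / avgdl."""
    return 1.0 - b + b * (doc_len / avg_doc_len)


ascii_text = "hello world"      # 11 characters, 11 bytes
japanese_text = "こんにちは世界"  # 7 characters, 3 bytes each in UTF-8

ascii_bytes = byte_length(ascii_text)
japanese_bytes = byte_length(japanese_text)
japanese_chars = count_utf8_chars(japanese_text.encode("utf-8"))

# With byte sizes, the Japanese document looks three times longer than it is,
# so it is penalized more heavily by length normalization.
avg_len = 10.0  # hypothetical average document length for the example
norm_by_bytes = bm25_length_norm(japanese_bytes, avg_len)
norm_by_chars = bm25_length_norm(japanese_chars, avg_len)
```

A larger normalization factor lowers a document's score for the same term frequency, so under these assumptions the byte-based length would rank non-ASCII documents lower than character-based length would.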

zkry added the bug label Nov 23, 2024