It detects other language but it should be English #3

Open
patrickReiis opened this issue Oct 11, 2024 · 2 comments

Comments

@patrickReiis
Hi, I have no experience in this kind of stuff, so I'll just ask about it to get more information.

The following text:
It may die when I die, and that’s okay. It’s my earnings.
Gets detected as Afrikaans:

[ 
  [ "afr", 0.9883103966712952 ],
  [ "eng", 0.011473776772618294 ] 
]

Do you know why this happens? Shorter texts like "Good morning my friends" get detected as English.

So I presume it's not the number of words but the way they are written? Thanks.

@fabiospampinato
Owner

I don't know exactly what function the model learned. In general, the longer the piece of text that you give it (assuming it's not just a single word repeated or something like that), the more accurate the detection should be.

The model only sees n-grams of the text, so I guess there are some similarities between English and Afrikaans in there. Or maybe I'm incorrectly pre-processing the sentences I'm trying the model on or something 🤔
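To make the n-gram point concrete, here is a minimal sketch of what "seeing only n-grams" can look like. The helper `charNgrams` is hypothetical and is not the library's actual pre-processing; it assumes character n-grams over lowercased text with whitespace collapsed, which may differ from what the model really uses.

```typescript
// Hypothetical helper, not part of the library: count the character n-grams
// of a text after lowercasing it and collapsing runs of whitespace.
function charNgrams(text: string, n: number): Map<string, number> {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const counts = new Map<string, number>();
  for (let i = 0; i + n <= normalized.length; i++) {
    const gram = normalized.slice(i, i + n);
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  return counts;
}

// Trigrams of the sentence from this issue, most frequent first.
const sentence = "It may die when I die, and that’s okay. It’s my earnings.";
const trigrams = Array.from(charNgrams(sentence, 3).entries())
  .sort((a, b) => b[1] - a[1]);
console.log(trigrams.slice(0, 5));
```

A model fed only these fragments has no notion of whole words or grammar. Since "die", "my" and "is" are also common Afrikaans words, a short sentence built around them could plausibly lean toward `afr`, though that is a guess rather than something verified against the model.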

I've been meaning to publish a better v2 of this, but I lack the time at the moment.

@patrickReiis
Author

Got it, thanks for replying!

We use your library in our project for language detection. Once we have the detected language, we use another service to translate the text, so your invention is very good!
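For a detect-then-translate pipeline like this, one possible workaround is to restrict the result to the languages the application actually expects. The sketch below is an assumption-laden illustration: `pickAmong` and the allow-list are hypothetical names, and the only thing taken from this issue is the result shape, an array of `[code, probability]` pairs.

```typescript
// Result shape as shown in this issue: [language code, probability].
type Detection = [string, number];

// Hypothetical helper: return the most probable detection whose language
// code is in the allow-list, or undefined if none of them is allowed.
function pickAmong(results: Detection[], allowed: string[]): Detection | undefined {
  let best: Detection | undefined;
  for (const result of results) {
    if (allowed.indexOf(result[0]) === -1) continue; // skip unexpected languages
    if (best === undefined || result[1] > best[1]) best = result;
  }
  return best;
}

// The output reported above, filtered to a hypothetical set of expected languages.
const reported: Detection[] = [
  ["afr", 0.9883103966712952],
  ["eng", 0.011473776772618294],
];
console.log(pickAmong(reported, ["eng", "por", "spa"]));
```

This only helps when the misdetected language is one the application never expects; it does not make the underlying detection more accurate.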
