Est_republicain corpus not found #110
This seems to be a problem with https://cnrtl.fr/. I just emailed them a bug report.
OK, thanks. I just found another download link: https://repository.ortolang.fr/api/content/export?&path=/est_republicain/4/&filename=est_republicain&scope=YW5vbnltb3Vz3
Someone from cnrtl.fr answered my question. The new official website for this corpus is https://www.ortolang.fr/market/corpora/est_republicain
The file format may have changed. The idea is to extract only the text, and what you get is nearly what we need. You will need to replace all SGML entities.
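The text-extraction step can be sketched as follows. This is a minimal sketch, not the project's own preprocessing script: it assumes the markup in the downloaded files looks like SGML/XML tags (e.g. `<p>...</p>`), which may not match the actual format, and the function name `sgml_to_text` is hypothetical. Tags are stripped first and entities decoded second, so an encoded `&lt;` in the text is not mistaken for a tag. Python's `html.unescape` resolves both named references such as `&eacute;` and numeric ones such as `&#233;`.

```python
import html
import re

# Assumption: corpus markup consists of SGML/XML-style tags like <p> or <div type="...">.
TAG_RE = re.compile(r"<[^>]+>")


def sgml_to_text(raw: str) -> str:
    """Hypothetical helper: strip markup tags, then decode character entities."""
    # Replace tags with a space so words on either side of a tag do not merge.
    no_tags = TAG_RE.sub(" ", raw)
    # Decode named (&eacute;) and numeric (&#233;) references after tag removal,
    # so that an encoded "&lt;" becomes a literal "<" instead of being stripped.
    text = html.unescape(no_tags)
    # Normalize whitespace within each line and drop lines left empty.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<p>L&apos;&eacute;t&eacute; &agrave; Nancy</p>\n<p>Caf&#233; &amp; th&eacute;</p>"
    print(sgml_to_text(sample))
```

On the sample above this prints `L'été à Nancy` and `Café & thé` on separate lines. Further cleanup, such as sentence splitting or tokenization before training with KenLM, depends on the actual file layout and is not shown here.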
Hello
I'm trying to download the est_republicain corpus to train a French language model with KenLM. When I click the link, I get this error: "nginx error! The page you are looking for is not found".
Any idea where I can get this corpus?
Thanks