Can't train model for Polish language #3
Comments
It works for me on Python 3, could you try that?
I've modified
Good to hear! Just to check: are you running Python 2 or 3?
I didn't change anything in my config or in the project files. My default Python version is 2.7; I didn't notice that earlier :)
Cool, I've added a note to the code for future reference.
Processing my way has finished, but the results are strange. EDIT: I changed the Python version with the command
Hm, that's weird: it does seem to work in Polish for me. What's the result of:
Result is
EDIT: The same issue on Ubuntu.
Hi,
I tried to run the code for the Polish language, but after downloading the data from Wikipedia I got an error:
python process_wiki.py ./data/pl/plwiki-latest-pages-articles.xml.bz2 ./data/pl/wiki.pl.text
2018-01-16 09:51:36,820: INFO: Running process_wiki.py ./data/pl/plwiki-latest-pages-articles.xml.bz2 ./data/pl/wiki.pl.text
Traceback (most recent call last):
  File "process_wiki.py", line 43, in <module>
    output.write(" ".join(text) + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0119' in position 20: ordinal not in range(128)
make: *** [data/pl/wiki.pl.text] Error 1
Is there any way to train the model on Unicode characters, not just ASCII?
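For anyone hitting the same traceback: on Python 2, writing a unicode string to a file opened with the built-in open() triggers an implicit ASCII encode, which fails on characters such as u'\u0119'. Below is a minimal sketch of one common workaround, opening the output with an explicit UTF-8 encoding via io.open, which behaves the same on Python 2 and 3. The helper name write_lines is hypothetical and is not taken from process_wiki.py; it also assumes each token is already a text string (unicode on Python 2, str on Python 3), which may not match what the script actually receives.

```python
import io


def write_lines(path, token_lists):
    """Write each list of tokens as one space-separated UTF-8 line.

    Hypothetical helper, not part of process_wiki.py. Assumes every
    token is a text string (unicode on Python 2, str on Python 3).
    Returns the number of lines written.
    """
    count = 0
    # io.open accepts an encoding on both Python 2 and 3, so the
    # text is encoded as UTF-8 instead of the default ASCII codec.
    with io.open(path, "w", encoding="utf-8") as output:
        for tokens in token_lists:
            output.write(u" ".join(tokens) + u"\n")
            count += 1
    return count
```

Calling write_lines("out.txt", [[u"g\u0119\u015bl\u0105"]]) would then write the Polish word to the file without going through the ASCII codec. Simply running the script with Python 3, as suggested in the comments, avoids the implicit encode as well.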