Can't train model for polish language #3

Manfed · 2018-01-16T08:58:54Z

Hi,
I tried to run the code for the Polish language, but after downloading the data from Wikipedia I got an error:
python process_wiki.py ./data/pl/plwiki-latest-pages-articles.xml.bz2 ./data/pl/wiki.pl.text
2018-01-16 09:51:36,820: INFO: Running process_wiki.py ./data/pl/plwiki-latest-pages-articles.xml.bz2 ./data/pl/wiki.pl.text
Traceback (most recent call last):
  File "process_wiki.py", line 43, in <module>
    output.write(" ".join(text) + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0119' in position 20: ordinal not in range(128)
make: *** [data/pl/wiki.pl.text] Error 1

Is there any way to train the model on Unicode characters, not just ASCII?

hgrif · 2018-01-16T09:22:48Z

It works for me on Python 3, could you try that?

Manfed · 2018-01-16T09:27:11Z

I've modified the process_wiki.py file: I changed line 43 to output.write(" ".join(unicode(text)) + "\n")
After that, the processing started without errors.

hgrif · 2018-01-16T09:38:21Z

Good to hear! Just to check: are you running Python 2 or 3?

Manfed · 2018-01-16T10:28:46Z

I didn't change anything in my config or in the project files. My default Python version is 2.7; I didn't notice that earlier :)
If this is run with Python 3, there will probably be no problems.

hgrif · 2018-01-16T10:49:35Z

Cool, I've added a note to the code for future reference.

Manfed · 2018-01-16T11:14:20Z

Processing with my change has finished, but the results are strange. The model_pl.word2vec.model.txt file has only 42 vectors, and most of them contain only 1 character. I'll try to run make with python3.
BTW, I'm doing this on a Mac, if that makes any difference :)

EDIT: I changed the Python version with the command alias python='python3', but now I'm getting the first error message again.
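
A likely explanation for the one-character vectors, assuming `text` on line 43 is a list of tokens (the thread does not show its type): converting the whole list to a single string and then joining it iterates over that string character by character. The sketch below reproduces the effect with `str` on Python 3; `unicode` on Python 2 behaves the same way on a list. The token values are made up for illustration.

```python
# Hypothetical token list standing in for one article's `text`.
tokens = ["język", "polski"]

# Intended behaviour: join the tokens with single spaces.
joined_tokens = " ".join(tokens)

# Workaround from the thread (str stands in for Python 2's unicode):
# str(tokens) is the repr of the whole list, so join iterates over
# its individual characters, brackets and quotes included.
joined_chars = " ".join(str(tokens))

print(joined_tokens)
print(joined_chars)
```

Every whitespace-separated item in the second line is a single character, which would leave word2vec with a tiny vocabulary of mostly one-character "words".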

hgrif · 2018-01-16T17:18:50Z

Hm, that's weird: it does seem to work in Polish for me.

What's the result of:

$ python --version

Manfed · 2018-01-16T19:47:01Z

Result is Python 3.6.3
Maybe it's something with my Mac config?

EDIT: The same issue occurs on Ubuntu.
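
One way to make the write on line 43 independent of the interpreter's default encoding is to open the output file with an explicit UTF-8 encoding. This is a minimal sketch, not the repository's actual code: `write_articles` and the sample data are hypothetical, and it assumes each article is a list of unicode tokens. `io.open` accepts an `encoding` argument on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile


def write_articles(articles, path):
    """Write one space-joined article per line, always encoded as UTF-8."""
    # An explicit encoding means the result does not depend on the locale
    # or on the interpreter's default codec.
    with io.open(path, "w", encoding="utf-8") as output:
        for tokens in articles:
            output.write(u" ".join(tokens) + u"\n")


# Hypothetical sample: two tokenised "articles" with Polish characters,
# written as escapes so the file itself stays ASCII.
articles = [[u"j\u0119zyk", u"polski"], [u"\u017c\u00f3\u0142w"]]
path = os.path.join(tempfile.mkdtemp(), "wiki.pl.text")
write_articles(articles, path)

# Read the file back with the same encoding to check the round trip.
with io.open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(len(lines))
```

A possible reason the error persisted after the alias, offered as an assumption rather than a confirmed diagnosis: shell aliases are not visible to the commands that make runs, so the Makefile may still have invoked Python 2. The `u'\u0119'` notation in the traceback is how Python 2 prints unicode strings.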
