Printing issue with crfsuite CRF model features for Unicode Text model #31

Open
anandi1989 opened this issue Oct 29, 2018 · 0 comments

I have a CRF model (object type: sklearn_crfsuite.estimator.CRF) whose feature data is UTF-8 encoded. Prediction works fine. Now I want to inspect the model to see what it has learned.

Whenever I try to print crf.attributes_, crf.state_features_, or crf.transition_features_, I get the following error:

Traceback (most recent call last):
  File "C:\Users\user123\eclipse-workspace\xxx_path\standalone scripts\crfModelAnalysis.py", line 20, in <module>
    print_transitions(Counter(crf.transition_features_).most_common(k))
  File "C:\Python27\lib\site-packages\sklearn_crfsuite\estimator.py", line 490, in transition_features_
    if self._info is None:
  File "C:\Python27\lib\site-packages\sklearn_crfsuite\estimator.py", line 499, in _info
    self._info_cached = self.tagger_.info()
  File "pycrfsuite\_pycrfsuite.pyx", line 704, in pycrfsuite._pycrfsuite.Tagger.info
  File "pycrfsuite\_pycrfsuite.pyx", line 706, in pycrfsuite._pycrfsuite.Tagger.info
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 27: invalid start byte
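
The last line of the traceback can be reproduced without a model: byte 0x80 is never a valid first byte of a UTF-8 sequence, so decoding a byte string that has it in a start position fails the same way. Below is a minimal sketch of a check for such strings. The helper `find_undecodable` and the sample byte strings are hypothetical, and the idea that some feature names were written in a non-UTF-8 encoding such as cp1252 is only an assumption; the traceback does not confirm it.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper: list the byte strings that are not valid UTF-8.
# The sample data is made up; it is not taken from the model in this issue.


def find_undecodable(byte_strings):
    """Return (index, value, error message) for each item that fails UTF-8 decoding."""
    bad = []
    for i, raw in enumerate(byte_strings):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            bad.append((i, raw, str(exc)))
    return bad


samples = [
    u"word=caf\u00e9".encode("utf-8"),  # valid UTF-8
    u"word=\u20ac5".encode("cp1252"),   # the euro sign becomes byte 0x80 in cp1252
]

for index, raw, message in find_undecodable(samples):
    print(index, repr(raw), message)
```

Running a check like this over the raw feature strings used for training might show whether any of them contain bytes that cannot be decoded as UTF-8.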

Basic Info:
Model is saved in pickle format.
Python version: 2.7
sklearn-crfsuite==0.3.6
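
For reference, the inspection attempted on line 20 of the script is sketched below. It assumes, as in the sklearn-crfsuite documentation, that `crf.transition_features_` is a dict mapping `(label_from, label_to)` pairs to weights. The dict `transition_features` is made-up stand-in data so the snippet runs without a trained model, and the bodies of `format_transitions` and `print_transitions` are a guess at what the script's function does.

```python
from collections import Counter

# Made-up stand-in for crf.transition_features_ (assumed shape:
# {(label_from, label_to): weight}); not real output from the model.
transition_features = {
    ("B-PER", "I-PER"): 4.2,
    ("O", "O"): 2.1,
    ("O", "I-PER"): -3.5,
    ("B-LOC", "I-LOC"): 3.8,
}


def format_transitions(pairs):
    """Format ((label_from, label_to), weight) pairs, one line per transition."""
    return ["%-6s -> %-7s %0.6f" % (src, dst, weight)
            for (src, dst), weight in pairs]


def print_transitions(pairs):
    """Print each formatted transition on its own line."""
    for line in format_transitions(pairs):
        print(line)


k = 2  # number of top-weighted transitions to show
top_k = Counter(transition_features).most_common(k)
print_transitions(top_k)
```

With a working model, the stand-in dict would be replaced by `crf.transition_features_`, which is the access that raises the error above.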

Any kind of help will be highly appreciated.
