17:42:39: Parsing finished. Moving parsed files into place ...
Traceback (most recent call last):
  File "/home/d/anaconda2/lib/python2.7/site-packages/corpkit/env.py", line 2168, in interpreter
    out = run_command(tokens)
  File "/home/d/anaconda2/lib/python2.7/site-packages/corpkit/env.py", line 1113, in run_command
    out = command(tokens[1:])
  File "/home/d/anaconda2/lib/python2.7/site-packages/corpkit/env.py", line 1437, in parse_corpus
    parsed = to_parse.parse(**kwargs)
  File "/home/d/anaconda2/lib/python2.7/site-packages/corpkit/corpus.py", line 930, in parse
    **kwargs
  File "/home/d/anaconda2/lib/python2.7/site-packages/corpkit/make.py", line 356, in make_corpus
    coref=coref, metadata=metadata)
  File "/home/d/anaconda2/lib/python2.7/site-packages/corpkit/conll.py", line 1113, in convert_json_to_conll
    data = json.load(fo)
  File "/home/d/anaconda2/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "/home/d/anaconda2/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/home/d/anaconda2/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/d/anaconda2/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1120 column 21 (char 28474)
Thanks for these reports. This is a weird one: Python's json module cannot parse the JSON output of the CoreNLP parser. So the problem is not really on corpkit's side, but CoreNLP's.
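For anyone who wants to see the failure in isolation, here is a minimal sketch. It assumes the cause is a raw control character (a literal tab here) inside a JSON string, which is what the error message suggests. `load_lenient` is a hypothetical helper, not part of corpkit or CoreNLP. The `strict=False` option is a real parameter of the standard json decoder and tells it to accept control characters inside strings.

```python
import json

# A JSON document whose string value contains a raw, unescaped tab.
# (The "\t" in this Python literal becomes an actual tab character.)
raw = '{"word": "a\tb"}'


def load_lenient(text):
    """Hypothetical helper: parse strictly, then fall back to strict=False."""
    try:
        return json.loads(text)
    except ValueError as err:
        # Strict mode rejects control characters inside strings.
        print("strict parse failed: %s" % err)
        return json.loads(text, strict=False)


data = load_lenient(raw)
```

Whether lenient parsing is the right fix here is an open question; it only hides the symptom if the parser output is damaged in some other way.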
I'm guessing that it relates to the encoding in your text files. Would you be able to zip and upload the files in the unparsed/parsed versions of the corpus? This would help me diagnose the problem and make a fix.
Also, I'd recommend encoding your text files as UTF-8, which should fix this problem in your case. Or, as per the instructions on the issue linked above, update your CoreNLP installation to the GitHub version. If corpkit installed CoreNLP for you, it should be in your ~/corenlp directory.
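If it helps, re-encoding a text file as UTF-8 could look roughly like the sketch below. `reencode_to_utf8` is a hypothetical helper, and the `latin-1` default is only a guess at the original encoding; you would need to substitute whatever your files actually use. It overwrites the file in place, so run it on a copy of the corpus first.

```python
import io


def reencode_to_utf8(path, source_encoding="latin-1"):
    """Hypothetical helper: rewrite one text file as UTF-8, in place.

    source_encoding is an assumption; change it to match your files.
    """
    # Decode the file using the encoding it is assumed to be in.
    with io.open(path, "r", encoding=source_encoding) as f:
        text = f.read()
    # Write the same text back out as UTF-8.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```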
Following error