Error with annotate_ws.py #69

Open

bfinj opened this issue Aug 1, 2020 · 7 comments

Comments


bfinj commented Aug 1, 2020

Hi!

I was using annotate_ws.py to annotate custom questions, running it on Google Cloud Platform. However, I got this error:
python3 annotate_ws.py --split past,present
annotating /home/Enzo/sqlova-shallow-layer/past.jsonl
loading tables
100%|██████████| 2716/2716 [00:00<00:00, 17256.43it/s]
loading examples
  0%|          | 0/1690 [00:00<?, ?it/s]
Starting server with command: java -Xmx5G -cp /home/Enzo/sqlova-shallow-layer/stanford-corenlp-4.0.0/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 60000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-6f9bf1976d784f04.props -preload tokenize,ssplit,pos,lemma,ner,depparse
  0%|          | 0/1690 [00:40<?, ?it/s]
Traceback (most recent call last):
  File "annotate_ws.py", line 190, in <module>
    a = annotate_example_ws(d, tables[d['table_id']])
  File "annotate_ws.py", line 107, in annotate_example_ws
    _nlu_ann = annotate(example['question'])
  File "annotate_ws.py", line 24, in annotate
    for s in client.annotate(sentence):
TypeError: 'Document' object is not iterable

Could you tell me why this happened? Thank you in advance!

@Daljeetka

I am facing the same issue. Did you find a solution to this problem?

@bfinj
Author

bfinj commented Sep 29, 2020

@Daljeetka Not yet...

@Qingkongji

When running annotate_ws.py, I got an error: ModuleNotFoundError: No module named 'stanza.nlp'. But I have installed stanza. Which other package should I install?

@Qingkongji

I figured it out: change line 8 to from stanza.server import CoreNLPClient. But now I am hitting the same TypeError: 'Document' object is not iterable too.
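For reference, the line 8 change would look roughly like this; the old import path is inferred from the ModuleNotFoundError above and may differ slightly in your copy of annotate_ws.py:

# old import, fails on current stanza releases (the stanza.nlp module no longer exists)
# from stanza.nlp.corenlp import CoreNLPClient

# new import
from stanza.server import CoreNLPClient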

@gouldju1

Try this:

import stanza
nlp = stanza.Pipeline('en')

def annotate(sentence, lower=True, nlp=nlp):
    """
    Input: question string
    Output: tokenized question as a dict:
    {
        'gloss': original token text,
        'words': list of tokens (lowercased if lower=True),
        'after': " " after each token, except "" for the last two tokens
    }
    """
    doc = nlp(sentence)

    words, gloss, after = [], [], []
    for sent in doc.sentences:  # renamed from `sentence` to avoid shadowing the argument
        for token in sent.tokens:
            words.append(token.text)
            gloss.append(token.text)
            after.append(" ")
        after[-2:] = ["", ""]
    if lower:
        words = [w.lower() for w in words]
    return {
        'gloss': gloss,
        'words': words,
        'after': after,
        }
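A quick way to sanity-check the replacement; the question string below is only an illustrative example:

if __name__ == '__main__':
    # Any question string works here.
    ann = annotate("How many singers do we have?")
    print(ann['words'])  # lowercased tokens
    print(ann['gloss'])  # original token text
    print(ann['after'])  # " " per token, "" for the last two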

@dsivakumar

dsivakumar commented Jul 2, 2022

With the latest stanza I had to make these changes to get it working (check the lines marked with ###), and I started the CoreNLP server outside the script (see stanfordnlp/stanza#245 (comment)).


#!/usr/bin/env python3
from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser
import os
import records
import ujson as json
from stanza.server.client import CoreNLPClient ###
from tqdm import tqdm
import copy
from lib.common import count_lines, detokenize
from lib.query import Query
import stanza.server as corenlp ###

client = None

def annotate(sentence, lower=True):
    global client
    if client is None:
        client = CoreNLPClient(annotators='tokenize,ssplit,pos,lemma,ner,depparse',
            start_server=corenlp.StartServer.DONT_START) ###
    words, gloss, after = [], [], []
    objs = client.annotate(sentence) ###
    for s in objs.sentence: ###
        for t in s.token: ###
            words.append(t.word)
            gloss.append(t.originalText)
            after.append(t.after)
    # ... rest of annotate() unchanged
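For completeness, one way to start the CoreNLP server outside the script is to launch the same java command that appears in the traceback above before running annotate_ws.py. A minimal sketch; the classpath and CoreNLP version are assumptions and will differ on your machine:

import subprocess

# Launch the CoreNLP server in the background before running annotate_ws.py.
# The classpath assumes CoreNLP 4.0.0 unpacked next to the repo; adjust as needed.
server = subprocess.Popen([
    "java", "-Xmx5G",
    "-cp", "stanford-corenlp-4.0.0/*",
    "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
    "-port", "9000",
    "-timeout", "60000",
    "-preload", "tokenize,ssplit,pos,lemma,ner,depparse",
])
# ... run annotate_ws.py while the server is up, then:
# server.terminate()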

@jack-jjm

Yes, the code by @dsivakumar looks correct. The return value of client.annotate(sentence) is not an actual Document object, whatever the error message says; it is a protobuf message, as explained (sort of) here. The protobuf's fields are named in the singular (sentence, token) even though they are repeated fields holding multiple sentences and tokens.
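So iteration has to go through those repeated fields explicitly. A minimal illustration, assuming a running server and the client setup from the snippet above; the question string is just an example:

doc = client.annotate("Which country has the most medals?")
first = doc.sentence[0].token[0]  # note the singular field names on the protobuf
print(first.word, first.originalText, repr(first.after))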
