NN dependency #122

Closed · Tpt opened this issue Feb 17, 2015 · 22 comments

@Tpt (Member) commented Feb 17, 2015

Original post:
"Where is the Panama canal" is broken
Link: http://askplatyp.us/?lang=en&q=Where+is+the+Panama+canal%3F

It would be very nice to create some kind of automated tests (maybe using log data) in order to avoid such regressions. @Ezibenroc, could you do it?


EDIT (by Ezibenroc)

The nn dependency heuristic does not work well on simple questions:

  • "Where is the Panama canal?" ("Panama" is tagged LOCATION)
  • "Who are the daughters of Louis XIV?" ("Louis" is tagged LOCATION)
@Tpt added the bug label Feb 17, 2015
@Ezibenroc (Member)

> It would be very nice to create some kind of automated tests

Already done:

@Ezibenroc (Member)

I think the issue comes from #106, since there is an nn dependency between canal (undef tag) and Panama (LOCATION tag).

@Tpt (Member, Author) commented Feb 17, 2015

Very nice tests, but I was thinking more about something at the Platypus level (that would just check that we don't get a "no answer" result).

But since you now have integration tests, the priority is much lower.
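
A minimal sketch of what such a Platypus-level check could look like, assuming a JSON HTTP endpoint; the endpoint URL and the "results" field below are assumptions, not the project's real API:

```python
# Hypothetical Platypus-level regression check: ask a list of known-good
# questions and fail if any of them comes back with no answer at all.
# The endpoint URL and the JSON layout ("results" list) are assumptions,
# not the project's documented API.
import requests

ENDPOINT = "http://askplatyp.us/api"  # placeholder URL

QUESTIONS = [
    "Where is the Panama canal?",
    "Who are the daughters of Louis XIV?",
]

def has_answer(question):
    """Return True if the service produced at least one result."""
    response = requests.get(ENDPOINT, params={"lang": "en", "q": question})
    response.raise_for_status()
    return len(response.json().get("results", [])) > 0

if __name__ == "__main__":
    failures = [q for q in QUESTIONS if not has_answer(q)]
    assert not failures, "No answer for: " + ", ".join(failures)
```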

@Tpt (Member, Author) commented Feb 17, 2015

I don't know if it is related, but questions like "Who are the daughters of Louis XIV?" don't work anymore.

@Ezibenroc (Member)

Yes, same thing: nn dependency, and Louis is tagged LOCATION (wtf?!) whereas XIV is tagged undef.

@Ezibenroc changed the title from “"Were is the Panama canal" is broken” to “NN dependency and NER tag heuristic” Feb 17, 2015
@Ezibenroc (Member)

The big problem is that these questions are exactly the same as "Who is the France president?".
In both cases, there is the dependency X -nn-> Y where X is tagged undef and Y is not.

Thus, with a grammatical approach, we will merge "Louis XIV" if and only if we merge "France president".


This is again our problem of "named entity recognition" (NER).

  • We could hope that the Stanford NER will become better.
  • Maybe we could improve it ourselves (can it be trained?).
  • Otherwise we could do some "wikidata NER" in preprocessing: scanning the sentence, and when we see a group of words which represents a wikidata item or alias, we put them into quotation marks. For instance, we would do Who is Louis XIV? → Who is “Louis XIV”? and Who is the France president? → Who is the “France” “president”? (see the sketch below).
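
A rough sketch of that preprocessing idea, assuming a lookup table of Wikidata labels and aliases; the KNOWN_LABELS set below is a tiny hard-coded stand-in for such an index:

```python
# Sketch of the proposed "wikidata NER" preprocessing step: greedily wrap the
# longest group of words matching a known Wikidata label or alias in quotation
# marks. KNOWN_LABELS is a tiny hard-coded stand-in for a real Wikidata
# label/alias index, which this thread does not specify.
KNOWN_LABELS = {"louis xiv", "france", "president", "panama canal"}
MAX_WORDS = 4  # longest multi-word label we try to match

def quote_entities(sentence):
    # Split off a trailing question mark so it does not break the matching.
    body, mark = (sentence[:-1], "?") if sentence.endswith("?") else (sentence, "")
    words = body.split()
    out, i = [], 0
    while i < len(words):
        # Try the longest group first so "Louis XIV" wins over "Louis" alone.
        for length in range(min(MAX_WORDS, len(words) - i), 0, -1):
            group = " ".join(words[i:i + length])
            if group.lower() in KNOWN_LABELS:
                out.append("“%s”" % group)
                i += length
                break
        else:
            out.append(words[i])
            i += 1
    return " ".join(out) + mark

print(quote_entities("Who is Louis XIV?"))             # Who is “Louis XIV”?
print(quote_entities("Who is the France president?"))  # Who is the “France” “president”?
```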

@progval (Member) commented Feb 17, 2015

Is “Who is the France president?” valid English?

@Ezibenroc (Member)

Good question, I don't know. Asked on StackExchange.

@yhamoudi (Member)

> Who are the daughters of Louis XIV?

If you use the latest version of the Stanford parser, both Louis and XIV are tagged LOCATION, and so we obtain the right triple.

> Where is the Panama Canal?

It works if you put an uppercase letter for Canal.

> Where is the Panama canal?

Not so bad: we obtain ((Panama,canal,?),location,?). If canal were tagged correctly (i.e. LOCATION), it would be fine (and this is what happens when you write Canal).

> Who is the France president? / Where is the Panama Canal?

All these questions are equivalent:

  • Where is the Panama canal?
  • Who is the US president?
  • Who is the United States president?

Actually we do not merge (Panama <> canal, US <> president, United States <> president).

> Maybe we could improve it ourselves (can it be trained?)

Yes.

> Otherwise we could do some "wikidata NER" in preprocessing: scanning the sentence, and when we see a group of words which represents a wikidata item or alias, we put them into quotation marks. For instance, we would do Who is Louis XIV? → Who is “Louis XIV”? and Who is the France president? → Who is the “France” “president”?.

Yes, see #64 and #85 (and I propose to close this issue, since two other ones are open on the same topic).

@yhamoudi (Member)

How to update your version of the Stanford parser:

@yhamoudi (Member)

See #123

@Ezibenroc (Member)

> If you use the latest version of the Stanford parser, both Louis and XIV are tagged LOCATION, and so we obtain the right triple.

I have the latest version of the Stanford Parser.
In the question "Who is Louis XIV?", "Louis" and "XIV" are tagged PERSON.
In the question "Who is the daughter of Louis XIV?", "Louis" is tagged LOCATION and "XIV" is not tagged. And even if "XIV" were tagged LOCATION, we would not want to use it, since the tag is wrong.

According to StackExchange, the only correct form without using "of" is "Who is France's president?".

@yhamoudi (Member)

> In the question "Who is the daughter of Louis XIV?", "Louis" is tagged LOCATION and "XIV" is not tagged.

That's strange. I have added it to deep_tests; let's let Travis decide: https://travis-ci.org/ProjetPP/PPP-QuestionParsing-Grammatical/builds/51077581

@yhamoudi (Member)

Travis is with me :) Are you sure that you are running the latest version, i.e.

CORENLP="stanford-corenlp-full-2015-01-30" CORENLP_OPTIONS="-parse.flags \" -makeCopulaHead\"" python3 -m corenlp

instead of

CORENLP="stanford-corenlp-full-2014-08-27" CORENLP_OPTIONS="-parse.flags \" -makeCopulaHead\"" python3 -m corenlp?

@Ezibenroc (Member)

My bad, I had the two installations in conflict...

@Ezibenroc changed the title from “NN dependency and NER tag heuristic” to “NN dependency” Feb 17, 2015
@Ezibenroc (Member)

According to StackExchange, "Who is the France president" is incorrect.

I think we should use our previous heuristic for nn dependencies: always merge (a rough sketch follows below).
I am reopening the issue, since it is no longer a problem of NER.
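
A rough sketch of that "always merge" heuristic over a dependency tree; the Node class below is an illustrative stand-in, not the module's real data structure:

```python
# Rough sketch of the "always merge" heuristic: whenever a node of the
# dependency tree has children attached through an nn relation, fold their
# words into the head, regardless of NER tags. The Node class is an
# illustrative stand-in, not the module's real data structure.
class Node:
    def __init__(self, word, children=None):
        self.word = word
        self.children = children or []  # list of (relation, Node) pairs

def merge_nn(node):
    kept, modifiers = [], []
    for relation, child in node.children:
        merge_nn(child)  # process the subtree first
        if relation == "nn":
            modifiers.append(child.word)  # nn modifiers precede the head
        else:
            kept.append((relation, child))
    if modifiers:
        node.word = " ".join(modifiers + [node.word])
    node.children = kept

# "Where is the Panama canal?": canal has an nn child "Panama".
canal = Node("canal", [("nn", Node("Panama"))])
merge_nn(canal)
print(canal.word)  # Panama canal
```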

@Ezibenroc reopened this Feb 17, 2015
@yhamoudi (Member)

What about the following ones:

  • Who is the US president?
  • Who is the United States president?

We need to be sure that these questions are incorrect and will not be used in practice by users.

@yhamoudi (Member)

It seems these forms are not used, except for US ...

@yhamoudi (Member)

  • Who is the French president? (not an nn relation): lemmatization is not able to convert "French" into "France"

@Ezibenroc (Member)

Same thing, you just replaced "France" with "US" and "United States"...

We do not have to handle incorrect sentences (for the same reason, we do not have any spell-checker in our module: we assume the input sentence is correct).

Moreover, these sentences seem very odd to native speakers, so they should not be asked very often.

@Ezibenroc (Member)

> Who is the French president? (not an nn relation): lemmatization is not able to convert "French" into "France"

This is not the subject of this issue...

@yhamoudi (Member)

Fixed in 3189e90.

Now we produce (Panama canal, location, ?).
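
A hedged sketch of the regression check this fix invites; parse_question is a placeholder name, not the module's real entry point:

```python
# Illustrative check of the fixed behaviour: the question should now parse
# to the triple (Panama canal, location, ?). parse_question is a placeholder
# for the module's real entry point, which this thread does not name.
def check_panama_canal(parse_question):
    subject, predicate, obj = parse_question("Where is the Panama canal?")
    assert (subject, predicate, obj) == ("Panama canal", "location", "?")
```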
