Improve datamodel #48

Merged
merged 2 commits into from
Jan 30, 2015

Conversation

yhamoudi
Member

  • Some minor modifications
  • An important change: (la, lb, lc) is true iff for all a∈la, there exists b∈lb such that for all c∈lc, (a, b, c) is true. If we don't do this, we cannot use alternative words in triples (otherwise full triples will almost always be false...)

@Ezibenroc
Member

👍

@Tpt
Member

Tpt commented Jan 28, 2015

+1 for misspelling fixes.

After reflection, I'm also OK with changing the predicate quantifier from forall to exists, because it makes sense, imho, to say that (A, B, C) is true if, and only if, forall a, c there is a relation between a and c in B.

@s-i-newton Your opinion?

@yhamoudi
Member Author

(A, B, C) is true if, and only if, forall a, c there is a relation between a and c in B.

it's not exactly what i mean:

  • my proposition: (A,B,C) true <=> ∀ a∈A, ∃ b∈B, ∀ c∈C, (a,b,c) true
  • yours: (A,B,C) true <=> ∀ a∈A, ∀ c∈C, ∃ b∈B, (a,b,c) true

I don't know which one is best

@Ezibenroc
Member

([France],[president, capital], [François Hollande, Paris]) is false with @yhamoudi's proposition, and true with @Tpt's proposition.
@yhamoudi's proposition seems to be better. But maybe there are also counter-examples for it.
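
The two candidate definitions can be checked against this example with a small sketch. Everything here is an illustration, not code from the project: `FACTS` is a hypothetical toy knowledge base and the function names are made up for this comment.

```python
# Hypothetical toy knowledge base: the atomic triples assumed to be true.
FACTS = {
    ("France", "president", "François Hollande"),
    ("France", "capital", "Paris"),
}


def holds(a, b, c):
    """An atomic triple (a, b, c) is true iff it is listed in FACTS."""
    return (a, b, c) in FACTS


def full_triple_forall_exists_forall(A, B, C):
    """(A,B,C) true <=> ∀ a∈A, ∃ b∈B, ∀ c∈C, (a,b,c) true."""
    return all(any(all(holds(a, b, c) for c in C) for b in B) for a in A)


def full_triple_forall_forall_exists(A, B, C):
    """(A,B,C) true <=> ∀ a∈A, ∀ c∈C, ∃ b∈B, (a,b,c) true."""
    return all(all(any(holds(a, b, c) for b in B) for c in C) for a in A)


A = ["France"]
B = ["president", "capital"]
C = ["François Hollande", "Paris"]

result_first = full_triple_forall_exists_forall(A, B, C)   # False
result_second = full_triple_forall_forall_exists(A, B, C)  # True
```

No single predicate links France to both objects, so the first definition rejects the triple, while the second accepts it because each object is reached by some predicate.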

@yhamoudi
Member Author

If we consider that lists of predicates are only multiple attempts to nounify a verb, we should say that only one of them is supposed to be true for all triples: (A,B,C) true <=> ∃ b∈B, ∀ a∈A, ∀ c∈C, (a,b,c) true (i.e. the same link exists between all the elements of A and C).

In the same way, we could change:

  • currently: (A,B,?) = {c / ∃ a∈A, ∃ b∈B, (a,b,c) true}
  • proposed: D = { b∈B / ∀ a∈A, ∃ c∈C, (a,b,c)} and then (A,B,?) = {c / ∃ a∈A, ∃ d∈D, (a,d,c) true}

And:

  • currently: (?,B,C) = {a / ∃ b∈B, ∃ c∈C, (a,b,c) true}
  • proposed: (?,B,C) = biggest set E such that ∃ b∈B, ∀ e∈E, ∃ c∈C, (e,b,c) true
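
A rough sketch of how the current and proposed definitions of (A,B,?) differ. The knowledge base is a hypothetical toy example, the function names are invented, and "∃ c∈C" in the definition of D is read here as "some object c exists", which is an assumption since C is the hole.

```python
# Hypothetical toy knowledge base of atomic (subject, predicate, object) triples.
FACTS = {
    ("platypus", "diet", "insects"),
    ("koala", "diet", "eucalyptus leaves"),
    ("platypus", "food", "shrimp"),
}


def objects_current(A, B):
    """(A,B,?) = {c / ∃ a∈A, ∃ b∈B, (a,b,c) true}."""
    return {c for (a, b, c) in FACTS if a in A and b in B}


def objects_proposed(A, B):
    """Keep only the predicates that give a result for every subject in A."""
    # D = {b∈B / ∀ a∈A, ∃ c, (a,b,c) true}
    D = {
        b for b in B
        if all(any(fa == a and fb == b for (fa, fb, _) in FACTS) for a in A)
    }
    # (A,B,?) = {c / ∃ a∈A, ∃ d∈D, (a,d,c) true}
    return {c for (a, d, c) in FACTS if a in A and d in D}


A = ["platypus", "koala"]
B = ["diet", "food"]

current = objects_current(A, B)    # all three objects
proposed = objects_proposed(A, B)  # "food" is dropped: no result for koala
```

With the proposed definition, "food" is filtered out because it yields nothing for koala, so "shrimp" disappears from the answer.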

@Tpt
Member

Tpt commented Jan 28, 2015

@yhamoudi
"If we consider that lists of predicates are only multiple attempts to nounify a verb": we really should avoid being tied so closely to an algorithm, in order to be as generic and stable as possible. It isn't an acceptable semantics for the predicate parameter of the triple node to say "it's the result of the nounification of a verb": we can't assume that every program that outputs questions encoded in the data model will be an NLP tool.

Your change will really increase the complexity of the semantics of triples with holes and I don't think it's a good idea: why should we prefer the predicate that gives the most results over the others? Maybe another predicate that gives fewer but more accurate results was the right one. So this distinction is maybe not the right one.

About the change to full triples, I liked the 3 "for all" because it shows that the three arguments (subject, predicate and object) were more or less "symmetric". But, as there is a real use case, I'm OK with changing its definition if we find another one that has nice and easy-to-understand semantics.

In fact I think the whole debate is around the question "do we have nice and accurate predicates" (the current assumption of the data model) or "crappy words" (the way you want to go because the nounification algorithms can't do better). It's a nearly fundamental change of mindset. So I think this pull request in fact opens a far bigger question than just changing the semantics of a node.

@Ezibenroc
"proposition seems to be better" -> I'm not sure that a single example is a good argument for preferring one definition over another. I would be very happy to agree with your point of view if you give such a semantics.

@yhamoudi
Member Author

"If we consider that lists of predicates are only multiple attempts to nounify a verb": we really should avoid being tied so closely to an algorithm, in order to be as generic and stable as possible. It isn't an acceptable semantics for the predicate parameter of the triple node to say "it's the result of the nounification of a verb": we can't assume that every program that outputs questions encoded in the data model will be an NLP tool.

It's relevant to have lists for subjects or objects. But is it natural/necessary to have a list for predicates (when predicates are not seen as "alternatives")? If I want the birth date + the birth place of Obama, I split the question into 2 parts (same problem as before: should we handle Who is the president of France and the capital of China?) or I use the normal form (Obama, birth place, ?) ∪ (Obama, birth date, ?).

Predicates are (almost) the only part of the normal form where new words can appear (i.e. words that are not in the initial question). All the algorithms that produce normal forms from questions need more "freedom" on this kind of node because they have to guess what they should add. Other modules that output questions and want to have several predicates in a node just have to split the triple with ∪.

In fact I think the whole debate is around the question "do we have nice and accurate predicates" (the current assumption of the data model) or "crappy words" (the way you want to go because the nounification algorithms can't do better). It's a nearly fundamental change of mindset. So I think this pull request in fact opens a far bigger question than just changing the semantics of a node.

If "nice and accurate predicates" means having exactly the right predicate, it seems very difficult (What does a platypus eat? > eat = diet, What can you eat at a fast-food restaurant? > eat = food). We do not restrict the expressiveness of the datamodel if we consider lists of predicates as alternatives (just use ∪ if you want (Obama,[birth place, birth date],?)), so where is the problem?
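
As a minimal sketch of the ∪ argument: under the current definition of triples with holes, a list of predicates and a union of single-predicate triples give the same answer. `FACTS` is a hypothetical toy knowledge base and `objects` is a made-up helper, not part of the datamodel code.

```python
# Hypothetical toy knowledge base of atomic (subject, predicate, object) triples.
FACTS = {
    ("Obama", "birth place", "Honolulu"),
    ("Obama", "birth date", "1961-08-04"),
}


def objects(A, B):
    """(A,B,?) = {c / ∃ a∈A, ∃ b∈B, (a,b,c) true}."""
    return {c for (a, b, c) in FACTS if a in A and b in B}


# (Obama, birth place, ?) ∪ (Obama, birth date, ?)
union_of_triples = objects(["Obama"], ["birth place"]) | objects(["Obama"], ["birth date"])

# (Obama, [birth place, birth date], ?)
list_of_predicates = objects(["Obama"], ["birth place", "birth date"])
```

Both expressions evaluate to the same set here, which is why the list form is not needed to express "several different predicates".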

Your change will really increase the complexity of the semantics of triples with holes and I don't think it's a good idea: why should we prefer the predicate that gives the most results over the others? Maybe another predicate that gives fewer but more accurate results was the right one. So this distinction is maybe not the right one.

I don't know whether it's relevant to also change the way we consider triples with holes. But we must have the same policy for triples with and without holes. If we agree that predicates are lists of alternatives for full triples, then this must also apply to triples with holes (and possibly change the way we evaluate them, because at least one triple is supposed to exist for all the subjects).


Another possibility is to allow only one predicate (per triple) in the datamodel and to introduce the notion of "alternative predicates" only in the implementation.

@Tpt
Member

Tpt commented Jan 29, 2015

"Who is the president of France and the capital of China?": Yes, we should definitely handle this kind of question. And it's already done with the clean normal form "(France, president, ?) ∪ (China, capital, ?)". I don't see what the link is with the current problem.

OK. I buy your arguments for relaxing predicates and I'm OK with changing the full triple specification.

Now, two options:

  1. Define full triple as (A, B, C) <=> ∀ a∈A, ∃ b∈B, ∀ c∈C, (a,b,c) (your proposal). It requires changing the definition of triples with holes and so requires assuming that in B there is a "good" predicate (the b chosen) and that the others are "bad" ones.
  2. Define full triple as (A,B,C) <=> ∀ a∈A, ∀ c∈C, ∃ b∈B, (a,b,c) (mine), and so don't change the semantics of triples with holes.

I prefer option 2 because:

  1. For triples with holes, which "b" should we choose? The one with the largest number of results for the current module? What if module x chooses one "b" and module y another? Should the core keep all results, or choose a "b" and remove the results from the other "b"s? Option 2 does not create this issue, as all "b"s are equal.
  2. It does not change the definition of triples with holes and so is a less disruptive change.

@yhamoudi
Member Author

"Who is the president of France and the capital of China?": Yes, we should definitely handle this kind of question. And it's already done with the clean normal form "(France, president, ?) ∪ (China, capital, ?)". I don't see what the link is with the current problem.

Related to ProjetPP/PPP-QuestionParsing-Grammatical#73. I don't think that representing this kind of question with a list of totally different predicates is a nice way to do it. It was an argument for defining lists of predicates as lists of alternatives.

Concerning the way we define full triples, neither of the 2 options totally convinces me, so I agree to choose the 2nd one because it has the least impact.

(the datamodel has been updated; those who don't agree must speak up or it will be merged as is)

@Ezibenroc
Member

There is an asymmetry between full triples and triples with holes that I find strange.
If you take lc=(la, lb, ?) then (la,lb,lc) is not necessarily true.
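
This asymmetry can be reproduced with a small sketch. It assumes the full-triple definition is option 2 (∀ a, ∀ c, ∃ b) and the hole definition is the existing one (∃ a, ∃ b); `FACTS` is a hypothetical toy knowledge base and the function names are invented for illustration.

```python
# Hypothetical toy knowledge base of atomic (subject, predicate, object) triples.
FACTS = {
    ("France", "capital", "Paris"),
    ("China", "capital", "Beijing"),
}


def objects(la, lb):
    """(la,lb,?) = {c / ∃ a∈la, ∃ b∈lb, (a,b,c) true}."""
    return {c for (a, b, c) in FACTS if a in la and b in lb}


def full_triple(la, lb, lc):
    """(la,lb,lc) true <=> ∀ a∈la, ∀ c∈lc, ∃ b∈lb, (a,b,c) true."""
    return all(any((a, b, c) in FACTS for b in lb) for a in la for c in lc)


la = ["France", "China"]
lb = ["capital"]

lc = objects(la, lb)                    # {"Paris", "Beijing"}
round_trip = full_triple(la, lb, lc)    # False: (France, capital, Beijing) is not a fact
```

The hole is filled with an existential over subjects, but the full triple is checked with a universal over subjects and objects, so the round trip fails as soon as la has more than one element with different answers.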

@Tpt
Member

Tpt commented Jan 29, 2015

+1 for the current version. I think we should wait for agreement from @s-i-newton and @progval before merging.

@Tpt
Member

Tpt commented Jan 29, 2015

@Ezibenroc It's not a new problem. We should maybe relax the definition of full triples to use 3 exists in order to have symmetry... But that's another topic

@Ezibenroc
Member

It's not a new problem. We should maybe relax the definition of full triples to use 3 exists in order to have symmetry... But that's another topic

Alright. +1 for the merge.

@progval
Member

progval commented Jan 29, 2015

👍

@marc-chevalier
Member

It's consistent and... all is alright.

Tpt added a commit that referenced this pull request Jan 30, 2015
@Tpt Tpt merged commit c86c439 into master Jan 30, 2015
@Tpt Tpt deleted the restrict_full_triple branch January 30, 2015 09:39
@Tpt
Member

Tpt commented Jan 30, 2015

Yeah! \o/
