Consider the program:
```
class NameWithNickname begin
    true_name ~ string_prior(1, 30) preferring all_first_names
    nickname ~ string_prior(1, 30) preferring all_first_names
end

class Person begin
    fname ~ NameWithNickname
    lname ~ string_prior(1, 30) preferring all_last_names
end

class Record begin
    person ~ Person
    name ~ uniform([person.fname.true_name, person.fname.nickname])
end
```
The problem is this. By design, each record is assumed to show either the person's true first name or their nickname, yet when PClean first processes a record it initializes both latent fields. Suppose you see a person's first name, and PClean correctly infers that it is their full first name. If another record later shows that person's nickname, you won't be able to assign the new record to the same "person" object, because that object already has a different nickname, generated from the prior.
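To make the failure concrete, here is a toy Python sketch (not PClean code) of the model above with eager initialization. The names `Person`, `likelihood`, and `eager_init` are hypothetical. The prior draw for the nickname is stubbed with a fixed value, and there is no typo model, so a record's name must match one of the two fields exactly.

```python
from typing import Callable


class Person:
    """Toy Person whose two latent name fields are both set eagerly."""

    def __init__(self, true_name: str, nickname: str) -> None:
        self.true_name = true_name
        self.nickname = nickname


def likelihood(person: Person, observed: str) -> float:
    """P(record name == observed | person) for
    name ~ uniform([true_name, nickname]), with no typo/noise model."""
    slots = [person.true_name, person.nickname]
    return sum(1 for s in slots if s == observed) / len(slots)


def eager_init(observed: str, sample_nickname: Callable[[], str]) -> Person:
    """Eager proposal: the observation explains true_name, and the nickname
    is drawn from the prior right away, before any record mentions it."""
    return Person(true_name=observed, nickname=sample_nickname())


# Suppose the prior happens to produce "Bill" as Robert's nickname.
robert = eager_init("Robert", sample_nickname=lambda: "Bill")
print(likelihood(robert, "Robert"))  # 0.5
print(likelihood(robert, "Bob"))     # 0.0: a "Bob" record cannot join this person
```

Once the nickname has been fixed to a prior sample, a later record that uses the real nickname has zero likelihood under the existing person.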
If we could delay proposing the "other" latent field until some record provides evidence for it, we could circumvent this issue and do accurate inference in models like this.
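A toy Python sketch of the deferred-proposal idea follows, under the same simplifying assumptions (exact matching, plus a hypothetical uniform prior over a four-name vocabulary standing in for `string_prior`). Unset fields are `None`. A record that chooses an unset slot fills it with the observed value and pays that value's prior probability. This is not how PClean currently behaves, and `LazyPerson`, `slot_weights`, and `assign` are illustrative names only.

```python
from typing import Dict, Optional

# Hypothetical stand-in for string_prior: uniform over a tiny vocabulary.
VOCAB = ["Robert", "Bob", "Elizabeth", "Liz"]


def prior_prob(name: str) -> float:
    return 1.0 / len(VOCAB) if name in VOCAB else 0.0


class LazyPerson:
    """Toy Person whose name fields stay None until a record needs them."""

    def __init__(self) -> None:
        self.fields: Dict[str, Optional[str]] = {"true_name": None, "nickname": None}


def slot_weights(person: LazyPerson, observed: str) -> Dict[str, float]:
    """Weight of 'the record picked this slot and the slot equals observed'.

    Each slot is picked with probability 1/2. A slot that is already set must
    match exactly; an unset slot is proposed only now, equal to the
    observation, so it contributes the prior probability of that string."""
    weights: Dict[str, float] = {}
    for slot, value in person.fields.items():
        if value is None:
            weights[slot] = 0.5 * prior_prob(observed)
        else:
            weights[slot] = 0.5 if value == observed else 0.0
    return weights


def assign(person: LazyPerson, observed: str, slot: str) -> None:
    """Attach a record to `slot` (assumed to have positive weight),
    filling the slot lazily if it is still unset."""
    if person.fields[slot] is None:
        person.fields[slot] = observed


p = LazyPerson()
assign(p, "Robert", "true_name")  # nickname stays None: no evidence yet
print(slot_weights(p, "Bob"))     # {'true_name': 0.0, 'nickname': 0.125}
assign(p, "Bob", "nickname")
print(p.fields)                   # {'true_name': 'Robert', 'nickname': 'Bob'}
```

Leaving a field unset is sound in this toy model because the field's prior sums to one over all strings. Not sampling it is therefore equivalent to marginalizing it out until a record provides evidence.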
This is also very relevant to data integration across multiple sources, where different sources may report different attributes of the same entity.