Should we remove implicit order constraints on structures in v8.0? #528
Comments
First, I think more than just Some events/attributes like On the other hand many (if not all) other event/attribute tags that can have multiple instances like All tags except for I've run into this issue for some applications where I have entered multiple The same could be true for The order for any event from the GEDCOM must be maintained for all undated facts. Programs that merge information must check the dates and order of all data events/attributes. Order is important! |
Of course order is important. The question in this issue is whether the order is implied by the order in which tags appear in the file (which is fragile, for instance when files get merged), or made explicit in 8.0, for example as a priority value in a substructure, so that it is more robust. Today in 7.0 one must do:
to say that child I1 was firstborn and I3 was the third child and the second child is unknown. As opposed to (for example):
Consider merging the above family with one in another GEDCOM file that contains only one child (say, because it contains only the ancestors of the submitter and not the siblings of ancestors), where that child is known to be the second child:
|
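The merge scenario above can be sketched in code. This is a minimal illustration, not part of any GEDCOM specification: it assumes the implicit form marks an unknown child with the `@VOID@` pointer, that the explicit form carries a hypothetical `ORDER` value per child, and that cross-reference identifiers from both files have already been reconciled. The function names `merge_positional` and `merge_explicit` are invented for the example.

```python
VOID = "@VOID@"  # GEDCOM 7.0 placeholder pointer for an unknown record


def merge_positional(a, b):
    """Merge two CHIL lists whose order is implied by position.

    Slots are aligned by index. A @VOID@ slot is filled by a known child
    from the other list. Two different known children in the same slot
    cannot be resolved automatically, so an error is raised.
    """
    merged = []
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else VOID
        y = b[i] if i < len(b) else VOID
        if x != VOID and y != VOID and x != y:
            raise ValueError(f"slot {i + 1}: {x} conflicts with {y}")
        merged.append(x if x != VOID else y)
    return merged


def merge_explicit(a, b):
    """Merge two {order: child} maps built from a hypothetical ORDER value."""
    merged = dict(a)
    for order, child in b.items():
        if merged.setdefault(order, child) != child:
            raise ValueError(f"ORDER {order}: {merged[order]} conflicts with {child}")
    return [merged[k] for k in sorted(merged)]


# File 1 knows the first and third child; file 2 knows only the second.
positional = merge_positional(["@I1@", VOID, "@I3@"], [VOID, "@I2@"])
explicit = merge_explicit({1: "@I1@", 3: "@I3@"}, {2: "@I2@"})
```

Both approaches recover the same three-child order here, but only because each file records that a sibling is missing. The positional variant breaks as soon as an intermediate tool drops or reorders the `@VOID@` entries.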
Both GEDCOM snippets acknowledge that a child is missing, so why not create a "placeholder" child (I do this in v5.5.1 now)? Having the option of adding a @VOID@ pointer makes things easier! The bigger issue with merging is the following case (1):
Where Ralph could be anywhere in the list and no amount of ordering will help because neither submitter knows about the missing children. So you could get this case (2):
Whereas in your example they at least acknowledge the missing child; they just don't think they need to add the child into the mix, either because they don't have the info or because they just don't care! More intervention is needed in case (2). Maybe Ralph is actually Joe's middle name and he went by that in some circles. Or Ralph died at birth and was only known in the church book before the family moved! Or Ralph was born last! |
The examples show that explicit ordering will not help when merging data. But explicit ordering will help to keep the order when other criteria do not work.
will help if we know that Jane was born in 1880 and Bob in 1885, but we do not know exactly when Joe was born, only that he was born after Jane and before Bob. However, we can use SDATE to ensure the order we would like to have: Joe gets a birth date
and the application has the data it needs to order the children correctly. SDATE helps when merging data, too: the merged order will be exactly as correct as the SDATEs are. If Jane and Bob are added to the family when it is not yet known that the parents may have had other children, explicit ordering would give them ORDER 1 and ORDER 2. Now we merge with data that contains only Joe, and this child carries ORDER 1. The merge process has no way to find a correct order for all three of them using ORDER! I see this situation more often than the one where the data says: "I have only one child, but it is the second child of its parents". That said, I prefer ordering criteria that help when data are merged. SDATE helps a lot! |
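The argument in the comment above can be made concrete with a small sketch. It assumes births are reduced to a year, that a sort date (SDATE) takes precedence over a real DATE when ordering, and that Joe's sort year is 1882; that value is purely illustrative, since the comment's own value is not shown. The `Birth` class and the `ORDER` maps are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Birth:
    name: str
    date: Optional[int] = None   # year taken from BIRT.DATE, if known
    sdate: Optional[int] = None  # year taken from BIRT.SDATE, a sorting hint


def sort_key(b):
    # Prefer the sort date and fall back to the real date;
    # births with neither are placed last.
    year = b.sdate if b.sdate is not None else b.date
    return (year is None, year or 0)


file_a = [Birth("Jane", date=1880), Birth("Bob", date=1885)]
file_b = [Birth("Joe", sdate=1882)]  # 1882 is an assumed, illustrative value

merged = sorted(file_a + file_b, key=sort_key)
names = [b.name for b in merged]

# The same merge with a hypothetical ORDER value: both files use ORDER 1,
# so the merge cannot place Joe without outside knowledge.
order_a = {1: "Jane", 2: "Bob"}
order_b = {1: "Joe"}
collisions = sorted(set(order_a) & set(order_b))
```

With sort dates the merged list comes out as Jane, Joe, Bob, whereas the `ORDER` values collide at position 1 and leave the question open.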
For children, I do like the use of For NAME, BIRT, and DEAT it seems that the notion of "primary" is important, though the rest of the ordering is much less important. Just brainstorming... perhaps just a "PRIM Y" substructure, though that would be a bit hard to deal with when merging two gedcom files with different "PRIM Y" superstructures. Another possibility would be to have something like "PRIM" at the same level with cardinality "{0:1}" so it forces a merge to be correct.
(substructures omitted just to make the main point more obvious) |
A lot of the recent discussion asks the question: "What happens when a merge occurs?" I don't have a lot of experience with "full on, unassisted merging"; personally, I would never allow a program to merge a GEDCOM into my GEDCOM (i.e. a snippet GEDCOM into a master GEDCOM) without my intervention on all additions and removals. The problems outlined by Dave (two PRIMARY anything: birth, death, name) should not be resolved by any program, but by the owner of the master GEDCOM. It is his/her database that is being changed by the snippet GEDCOM. Not only could the PRIMARY name be incorrect, but any number of other bits of data could be wrong, or not follow the master GEDCOM's well-defined data entry standard. IMHO, if we think too hard about how a full on, unassisted merge will cause issues, we could reject good concepts and designs because we are afraid the outcome will be misinterpreted! |
As I am the admin of big databases of a genealogical association, and this association uses my application for team work, I very often see big GEDCOM files (> 100 MB) with tens of thousands of individuals being loaded into an even bigger database. Any structure in GEDCOM that needs more manual support when merging records that describe duplicate individuals will result in many hours of work, and will normally lead users to choose the option to skip that data at import. So I think about the issues assisted merging will cause if we get more data that cannot be merged by a program but needs manual help. One example in the existing standard is the number of children, NCHR. There are applications in the wild that create this data by counting the children in the records pointed to by FAMS. But the importing application cannot see whether there is a source stating this number of children or whether an application has added it without any source. As NCHR is {1:1}, I cannot show different versions found in different sources. So at import NCHR is ignored if there is no source citation in its substructure. If it comes with a source citation, and there already exists an NCHR with another payload and also a source citation, the user has to decide and manually enter his or her solution. As in most cases there is no source citation under NCHR, this will happen very seldom. That said, ORDER would be one of the tags I would ignore at import when it comes without its own source citation. SDATE works much better, as it can be ignored when the other record of the duplicate comes with a DATE value. |
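The import rule for the number-of-children fact described above can be sketched as a small decision function. This is only an illustration of the rule as stated in the comment, not the commenter's actual implementation. The `ChildCount` class and `import_child_count` function are hypothetical, and the branch where a cited incoming value replaces an uncited existing one is an assumption, since the comment does not cover that case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ChildCount:
    value: int
    cited: bool  # True if the structure carries its own source citation


def import_child_count(
    existing: Optional[ChildCount], incoming: Optional[ChildCount]
) -> Tuple[Optional[ChildCount], bool]:
    """Return (value to keep, whether the user must decide)."""
    if incoming is None or not incoming.cited:
        return existing, False   # uncited incoming counts are ignored
    if existing is None or not existing.cited:
        return incoming, False   # assumption: a cited count wins
    if existing.value == incoming.value:
        return existing, False   # both cited and in agreement
    return existing, True        # two cited, conflicting counts


kept, ask_user = import_child_count(ChildCount(3, True), ChildCount(5, True))
```

Only the last branch needs a person, which matches the observation that manual decisions are rare because most such values arrive without a citation.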
First, you should be using Second, I agree that counting the number of children connected to a family or individual is not a very good use of this tag! If I were to send this tag, it could only be created with my knowing the data is true and thus having a citation. I treat it like any other “fact”, not as a calculated value! A point of interest: either your merge program must be very robust or your user base must prescreen all data collisions before merging. I have seen too many “unattended merges” with lesser software and no user screening creating a mess of unreal dates, bad name recognition, and generally unusable data. Most GEDCOMs I’ve seen have either no citations or unusable ones at best (i.e. not enough artifact source information to find the assertion again). So I suspect everyone in your group does a better job of citation building and you have a review process installed as well! |
What do we expect from the ORDER tag? Could it override the DATE order? Or do we only use it when dates are the same or not given? If I have 3 children: I could write Born >1-1-2022 and <1-1-2024? That way I know the order will stay correct when merging. A note mentioning that the dates are based on the fact that it is a middle child will help. When there are no dates present, this is a bit harder. In that case some ordering would help. But it is probably still better to have a way to tell who was born after whom. So could we think of some system to state that events happened before or after other events when we don't know the dates of these events? |
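The idea of recording only "before" and "after" relations between undated events can be sketched as a topological sort. This is a hedged illustration: the pair representation is hypothetical and corresponds to no existing GEDCOM structure. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each pair (a, b) asserts "a was born before b".
# This representation is an assumption made for the example.
before = [("Jane", "Joe"), ("Joe", "Bob")]

sorter = TopologicalSorter()
for earlier, later in before:
    sorter.add(later, earlier)  # `later` must come after `earlier`

order = list(sorter.static_order())
```

Relations from two merged files can simply be combined into one list. Contradictory assertions surface as a `graphlib.CycleError` instead of a silently wrong order, and the sort is only unique when the relations determine a full chain, as they do here.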
Currently CHIL and NAME structures have an implied ordering, and other structures don't.
Implicit ordering causes problems for applications that merge and transform data; these problems would not exist if ordering were explicit.