yehudad edited this page Feb 5, 2012 · 28 revisions

Introduction

Genealogy work is complicated. Fortunately, we have computers to help us manage complex work. But that puts a significant responsibility on software developers to create the right tools to get the job done well.

In the past, genealogical software was used primarily to manage someone's conclusions about their genealogy. To share those conclusions with family or move them to a new computer, the conclusions had to be saved to disk. GEDCOM was the standard format for saving those files.

The world has shifted. Computers are being used much more broadly across all aspects of the genealogical research process. With the arrival of the Internet and the World Wide Web, genealogists are using computers to:

  • Make records available online as digital artifacts
  • Extract and annotate online artifacts to make them searchable
  • Search for records and other genealogical information
  • Draw conclusions based on sound evidence found in records
  • Support conclusions by accurately citing the sources of the evidence
  • Identify contradictory evidence and alternate theories
  • Share and collaborate on genealogy work

GEDCOM X is the industry standard for facilitating these activities.

The Conceptual Model

(Figure: GEDCOM X Conceptual Model)

Consider the depiction above, illustrating how GEDCOM X can be used to facilitate genealogy work.

Sources and Records

Information on an ancestor can be gathered from multiple sources. The depiction above shows three (of the many) possibilities: a book, a photo, and a census. Note the important distinction between the real, physical manifestation of the sources and their digital representation. The GEDCOM X domain refers to the former as physical artifacts and the latter as digital artifacts.

Both physical artifacts and digital artifacts can be cited as sources for genealogical data. And sources can, in turn, cite other sources as the "source of the source". A JPEG image of a birth certificate, for example, can cite the certificate itself (the physical artifact) as its source. The concept of the "source of the source" is important to measuring the validity and accuracy of genealogical data.
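The "source of the source" idea is a chain of citations. The sketch below is a minimal illustration of following that chain back to the original artifact. It assumes a hypothetical `SourceDescription` class with a single `source` link and a hypothetical `provenance_chain` helper. Neither is the GEDCOM X API; they only model the relationship described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceDescription:
    """Hypothetical stand-in for a source; not the GEDCOM X type."""
    title: str
    is_digital: bool  # True for a digital artifact, False for a physical one
    # The "source of the source": the source this one cites, if any.
    source: Optional[SourceDescription] = None


def provenance_chain(src: SourceDescription) -> list[str]:
    """Follow 'source of the source' links back to the earliest cited source."""
    chain: list[str] = []
    seen: set[int] = set()
    current: Optional[SourceDescription] = src
    while current is not None:
        if id(current) in seen:  # guard against an accidental citation cycle
            raise ValueError("citation cycle detected")
        seen.add(id(current))
        chain.append(current.title)
        current = current.source
    return chain


# A JPEG scan (digital artifact) citing the certificate itself (physical artifact).
certificate = SourceDescription("Birth certificate of John Smith", is_digital=False)
scan = SourceDescription("JPEG scan of birth certificate", is_digital=True, source=certificate)

print(provenance_chain(scan))
```

The last element of the chain is the source closest to the original event. A real application would typically weigh that source more heavily when judging accuracy.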

GEDCOM X specifies a model for describing a source in terms of its metadata and for transcribing the contents of the source.

The metadata of the source is data "about" the source. Metadata includes things like the title, publisher, publication date, author, and (especially important to genealogical research) the bibliographic citation for the source. GEDCOM X uses the terms defined by the Dublin Core Metadata Initiative to define standard source metadata.
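The metadata can be pictured as a set of property/value pairs keyed by Dublin Core terms. The sketch below uses the DCMI terms namespace (`http://purl.org/dc/terms/`) and the real terms `title`, `creator`, `publisher`, `date`, and `bibliographicCitation`. The helpers `dc` and `citation`, the fallback citation format, and the example values are hypothetical and are not taken from the GEDCOM X specification.

```python
from __future__ import annotations

DCTERMS = "http://purl.org/dc/terms/"

# A small subset of Dublin Core terms, chosen here for illustration only.
SOURCE_METADATA_TERMS = {"title", "creator", "publisher", "date", "bibliographicCitation"}


def dc(term: str) -> str:
    """Return the full Dublin Core term URI, rejecting terms outside our subset."""
    if term not in SOURCE_METADATA_TERMS:
        raise KeyError(f"unsupported term: {term}")
    return DCTERMS + term


def citation(metadata: dict[str, str]) -> str:
    """Prefer an explicit bibliographic citation; otherwise assemble a rough one."""
    explicit = metadata.get(dc("bibliographicCitation"))
    if explicit:
        return explicit
    parts = [metadata.get(dc(t)) for t in ("creator", "title", "publisher", "date")]
    return ", ".join(p for p in parts if p)


# Placeholder metadata for a hypothetical census source.
census_metadata = {
    dc("title"): "Example County Census",
    dc("creator"): "Example Census Office",
    dc("publisher"): "Example Archive",
    dc("date"): "1880",
}

print(citation(census_metadata))
```

Keying metadata by term URI keeps it interoperable with other Dublin Core tooling. The fallback citation is only a convenience and no substitute for a properly formed bibliographic citation.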

The process of making the contents of a source available as structured data is called extraction. For example, if the source is a census that lists a John Smith born January 1, 1880, then extracting the source produces a piece of structured digital data called a record, which specifies a persona named "John Smith" and a birth fact with "January 1, 1880" as the text of the date. Technically, the record is also data "about" the source (and hence could be considered source "metadata"), but it is useful to treat it as a special case because it is separate from the Dublin Core terms and has particular significance to genealogical applications.
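To make the census example concrete, the sketch below models the extracted record as a persona carrying a birth fact. It assumes hypothetical `Record`, `Persona`, and `Fact` classes and a hypothetical `extract_census_entry` function; these are illustrative shapes, not the GEDCOM X record model. Note that the date is kept as the original text, matching the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Fact:
    type: str       # e.g. "Birth"; a plain string here, not a GEDCOM X type URI
    date_text: str  # the date exactly as written in the source, not normalized


@dataclass
class Persona:
    name: str
    facts: list[Fact] = field(default_factory=list)


@dataclass
class Record:
    source: str  # title of the source this record was extracted from
    personas: list[Persona] = field(default_factory=list)


def extract_census_entry(source_title: str, name: str, birth_date: str) -> Record:
    """Hypothetical extraction of one census line into a structured record."""
    persona = Persona(name=name, facts=[Fact(type="Birth", date_text=birth_date)])
    return Record(source=source_title, personas=[persona])


record = extract_census_entry("Example County Census", "John Smith", "January 1, 1880")
person = record.personas[0]
print(person.name, "|", person.facts[0].type, "|", person.facts[0].date_text)
```

Keeping the date as text preserves exactly what the source says. Any interpretation, such as parsing it into a calendar date, can happen later without losing the original wording.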

The brown elements in the illustration above represent the record data.

Conclusions

Genealogical conclusions should be based on sound evidence supported by properly cited sources. The source metadata can be used to properly cite the evidence for conclusion data and to support the genealogical proof standard. The record data can be used to supply conclusion data and to measure its validity. The conclusion data is represented in blue above.
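One way to picture how record data supports (or contradicts) a conclusion is sketched below. It assumes hypothetical `Evidence` and `ConclusionFact` classes in which each piece of evidence carries a citation string. It also uses a deliberately naive rule: evidence agrees when its date text matches exactly. Real applications would need date normalization and human judgment; this is only an illustration, not how GEDCOM X evaluates conclusions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """A fact taken from a record, plus the citation of its source."""
    fact_type: str
    date_text: str
    citation: str


@dataclass
class ConclusionFact:
    fact_type: str
    date: str
    evidence: list[Evidence] = field(default_factory=list)

    def supporting(self) -> list[Evidence]:
        """Evidence of the same fact type whose date text equals the conclusion's."""
        return [e for e in self.evidence
                if e.fact_type == self.fact_type and e.date_text == self.date]

    def conflicting(self) -> list[Evidence]:
        """Evidence of the same fact type that disagrees: candidates for alternate theories."""
        return [e for e in self.evidence
                if e.fact_type == self.fact_type and e.date_text != self.date]

    def is_supported(self) -> bool:
        return bool(self.supporting())


birth = ConclusionFact(
    "Birth",
    "January 1, 1880",
    evidence=[
        Evidence("Birth", "January 1, 1880", "Example County Census"),
        Evidence("Birth", "January 1, 1881", "Example family Bible"),
    ],
)

print(birth.is_supported(), [e.citation for e in birth.conflicting()])
```

Every piece of evidence carries its citation, so the cited sources remain available both for the conclusion and for any contradictory evidence it must account for.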

Models and Profiles

GEDCOM X is partitioned so that developers can use just the pieces they need without having to adopt the entire specification. The data is divided into different Data Models that define the genealogical data types and their properties.

The GEDCOM X specification defines not only the data models used to describe genealogical data but also a set of APIs that describe standard operations on genealogical resources. The API specifications are divided into different Application Profiles, each intended to address a specific set of well-defined requirements and use cases.

To read about the different GEDCOM X data models, see the data model documentation.

To read about the application profiles, try starting with the developer's guide.

Getting Started

If you're new here, try the Self Guided Tour of the project, which is designed to help you feel comfortable participating in the project. If you'd rather not go through a tour, the Community page pretty much summarizes it.

To learn how to produce and consume GEDCOM X from your application, take a look at the Developers Guide.

To read the documentation on the GEDCOM X domain and its different models, try starting at Data Models.

To consider how to migrate to the new standard, start at Migration Paths.
