GEDCOM 1x1


by Jan McClintock

As genealogists use it, the GEDCOM format is famous as a great way to share data, but is infamous for being finicky. In this article I'll attempt to explain the basic principles and formats of a GEDCOM file. For more details (many more), you can download the actual GEDCOM v5.5 documentation from ftp://gedcom.org/pub/genealogy/gedcom/. There you will find the many intricacies and intriguing potential of this format.

What is GEDCOM?
GEDCOM is an acronym for Genealogical Data Communication; it's a method of formatting the text of your family data so that different software programs and operating systems can read and understand it.

The GEDCOM "standard" was originally developed by the Church of Jesus Christ of Latter-day Saints' Family History Department, which holds the copyright. Although it began as a "universal" format, many genealogy programs implement their own, slightly modified version of GEDCOM. To you, the user, this means that there is no guarantee that all of your data will make it from one program to another (more about this later).

What is a GEDCOM File?
A GEDCOM file is a series of text lines (usually using the ASCII character set) each holding a specific piece of data relevant to your family file. The lines are numbered to show the hierarchy among the data, and are tagged to show the type of data.

It's quite possible to construct a GEDCOM file manually (using a word processor), but the process could be painstaking. For instance, the GEDCOM of one of my family files in Reunion including only 405 individuals (288 families) ended up at 5306 lines! Reunion will create a GEDCOM file for you, properly formatted and ready to be shared and used by others. Details and instructions are in Chapter 39 in the Reunion manual.

Parts of a GEDCOM File
If you created a GEDCOM file and opened it using your word processor, you would see line after line of numbers, abbreviations, and bits and pieces of data. There are no blank lines and no indentations in a GEDCOM file. Although it might look confusing, there is method to the madness.

Groups of lines that hold information about one individual (INDI), one source (SOUR) or one family (FAM) are considered records, and each line in a record has a level number. The first line of every record is numbered "0" (zero), to show that it is the beginning of a new record. Within that record, each line pertains to the nearest line above it that has a lower level number; all of the lines following the 0 level pertain to that record, until the next 0 level is reached.

Think of it like an outline format, with level numbers instead of Roman numerals, etc. For example, here is a simplified record of an individual with some explanations in parentheses:

0 NAME Joseph /PRYOR/
1 SEX M (data about the record in level 0 above, which is Joseph)
1 BIRT (more data about Joseph, this time that he had a birth...)
2 DATE 13 FEB 1922 (details about the birth in level 1 above)
2 PLAC Monroeville, Cass Co, MI (more details about the birth)
[This record is about Joseph PRYOR until the next level 0 is reached, signalling the start of a new record:]
0 NAME Martha /WHITE/ (new record about another individual)
1 SEX F (data about Martha)
1 BIRT (etc.)
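
The outline idea can be sketched in a few lines of Python. This is only an illustration of the level rule described above, not how any particular genealogy program works; the function name `build_tree` and the dictionary layout are assumptions made for the example, and it expects well-formed input in which levels never skip a step.

```python
def build_tree(lines):
    """Nest GEDCOM-style lines by level number, like an outline."""
    roots = []
    stack = []  # stack[i] is the most recent node seen at level i
    for line in lines:
        level_text, _, rest = line.partition(" ")
        level = int(level_text)
        node = {"text": rest, "children": []}
        if level == 0:
            roots.append(node)  # a level 0 line starts a new record
        else:
            # The parent is the most recent line one level up.
            stack[level - 1]["children"].append(node)
        # Forget anything at this level or deeper, then remember this node.
        del stack[level:]
        stack.append(node)
    return roots


sample = [
    "0 NAME Joseph /PRYOR/",
    "1 SEX M",
    "1 BIRT",
    "2 DATE 13 FEB 1922",
    "2 PLAC Monroeville, Cass Co, MI",
    "0 NAME Martha /WHITE/",
    "1 SEX F",
]
records = build_tree(sample)
birth = records[0]["children"][1]
print(birth["text"], [child["text"] for child in birth["children"]])
```

Here the DATE and PLAC lines end up as children of BIRT, and BIRT as a child of Joseph's record, which mirrors the explanations in parentheses above.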

After the level number, each line contains a Tag, which is an abbreviation of the type of data in that line. Most tags are obvious, such as HUSB for husband, PLAC for place, and MARR for marriage, but some are less obvious, like EMIG for Emigration and HMOT for Husband of Mother. A line can also contain a Pointer (such as @S43@), which refers to another individual, family, or source within the same GEDCOM file.
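
To make the parts of a line concrete, here is a minimal Python sketch that splits one line into its level number, an optional record identifier such as @I302@, the tag, and whatever data follows. The pattern `LINE_RE` and the function `parse_line` are names invented for this illustration, and the pattern is a simplification rather than a full statement of the GEDCOM grammar.

```python
import re

# Level, optional @ID@ for the record, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


def parse_line(line):
    """Split one GEDCOM line into (level, record_id, tag, value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, record_id, tag, value = match.groups()
    return int(level), record_id, tag, value or ""


print(parse_line("0 @I302@ INDI"))
print(parse_line("2 PLAC Duncannon, Ireland"))
print(parse_line("1 SOUR @S2@"))
```

In the last line the pointer @S2@ comes back as the value of the SOUR tag; a program would then look up the record that carries that identifier.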

In Reunion's Gedcom Export window, you can choose which tags you want to be assigned to your outgoing fields. However, be aware that not all genealogy programs will recognize your chosen tags, especially if you create one of your own. This could cause the data listed under this tag to be ignored or lost.

When importing a GEDCOM file, a genealogy program uses these level numbers and tags to assemble the data into a family, with relationships intact. The software reads the level numbers and the tags and tries to place the data into the correct fields. If the software doesn't recognize a tag, it either ignores that line or places it in a specific field from which you can move it later. When importing a GEDCOM file to Reunion, you have a choice of either discarding this unrecognized data or placing it into a Note field.
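
The choice between discarding unrecognized data and keeping it in a note can be sketched as follows. This is a hypothetical illustration, not Reunion's actual import logic: the set `KNOWN_TAGS`, the function `import_lines`, and the `keep_unknown` switch are all assumptions, and the sketch only handles lines inside a record (no level 0 lines with identifiers).

```python
# Hypothetical list of tags that the importing program understands.
KNOWN_TAGS = {"NAME", "SEX", "BIRT", "DEAT", "DATE", "PLAC", "OCCU", "NOTE"}


def import_lines(lines, keep_unknown=True):
    """Sort lines into recognized fields and leftover note text."""
    fields, notes = [], []
    unknown_level = None  # level of the unrecognized line we are currently under
    for line in lines:
        parts = line.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if unknown_level is not None and level > unknown_level:
            # Sub-lines of an unrecognized line share its fate.
            recognized = False
        else:
            unknown_level = None
            recognized = tag in KNOWN_TAGS
            if not recognized:
                unknown_level = level
        if recognized:
            fields.append((level, tag, value))
        elif keep_unknown:
            notes.append(line)  # preserved so the user can move it later
    return fields, notes


sample = [
    "1 NAME Michael /FITZGERALD/",
    "1 OCCU Theater Stage Hand",
    "1 HEAL Died of le grippe.",
    "1 SEX M",
]
fields, notes = import_lines(sample)
print(notes)
```

With `keep_unknown=False` the HEAL line would simply be dropped, which is the "discard" option described above.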

Here is an example of a GEDCOM file, taken from my own family file in Reunion. The DESTination is Ancestral File, and the CHARacter set is MS-DOS.

This first record (a section of the overall file) is the HEADer record (starting with a level 0 line and continuing until the next level 0 line is reached). The Header contains introductory information, such as the SOURce program used to create the GEDCOM file, and which VERSion of the program and of GEDCOM was used:

0 HEAD
1 SOUR REUNION (the creating software)
2 VERS V4.0 (the version of Reunion)
2 CORP Leister Productions
1 DEST ANSTFILE (the receiving/importing software)
1 DATE 6 FEB 1996 (the creation date of the GEDCOM file)
1 FILE Paternal Family File 4.0 (the name of the file on my hard disk)
1 GEDC
2 VERS 5.01 (the version of GEDCOM that was used)
1 CHAR IBM DOS (the character set of the receiving software)

The next section is a record for an INDIvidual, Michael Fitzgerald. He is Individual number 302.

0 @I302@ INDI
1 NAME Michael /FITZGERALD/
1 SEX M
1 BIRT
2 DATE 1842
2 PLAC Duncannon, Ireland
1 DEAT
2 DATE 3 JAN 1916
1 OCCU Theater Stage Hand
1 NOTE Emigrated from Ireland in 1860, prob. landed at the Battery, New York City. Had a brother who emigrated later and settled in Boston. Michael worked for the old Forepaugh's Theatre as a stage hand and collected autographs and news clippings of
2 CONT stage stars; spoke with a thick Irish brogue and known for his wit.
1 SOUR @S2@
1 HEAL Died of le grippe.
1 FAMS @F107@


[The CONTinue tag indicates that the NOTE field has more text than will fit under one tag, so it is continued on the next level, 2]
[The SOURce line shows where the information in this record was found; it points to Source number 2; see below]
[The FAMS @F107@ is a pointer to a FAMily record; the S on FAMS indicates that the family record is for this individual and his/her Spouse, who make up Family number 107; see below]
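
Reassembling a continued note can be sketched like this. The sketch assumes the convention in the GEDCOM 5.5 documentation, where CONT starts a new line of text and a second tag, CONC (not used in the example above), continues the same line; the function name `note_text` is invented for the illustration. A program that uses CONT purely to wrap long text, as the example above appears to, might prefer to join the pieces with a space instead.

```python
def note_text(lines):
    """Join a NOTE line with the CONT/CONC lines that follow it."""
    text = ""
    for line in lines:
        level, tag, *rest = line.split(" ", 2)
        value = rest[0] if rest else ""
        if tag == "NOTE":
            text = value
        elif tag == "CONT":
            text += "\n" + value  # CONT: continue on a new line
        elif tag == "CONC":
            text += value  # CONC: continue on the same line
    return text


note_lines = [
    "1 NOTE Emigrated from Ireland in 1860.",
    "2 CONT Spoke with a thick Irish brogue.",
]
print(note_text(note_lines))
```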

Later in the GEDCOM file, the SOURces are listed numerically. The following is the record for Source number 2:

0 @S2@ SOUR
1 NAME "The History of the Fitzgerald Family", compiled by Eileen DeHope and Elizabeth DeHope Boyce, 1978


Still later in the GEDCOM file, the FAMilies are listed numerically. This Family, number 107, is made up of a HUSBand, whose Individual number is 302, and a WIFE, whose Individual number is 303, and their CHILdren, whose Individual numbers are listed. Also included is the couple's MARRiage DATE:

0 @F107@ FAM
1 HUSB @I302@
1 WIFE @I303@
1 CHIL @I38@
1 CHIL @I316@
1 CHIL @I111@
1 CHIL @I109@
1 CHIL @I318@
1 CHIL @I36@
1 CHIL @I321@
1 MARR
2 DATE 17 DEC 1862
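
Following pointers from one record to another can be sketched as a simple lookup table keyed by the @...@ identifiers. The helper names `index_records` and `first_value` are assumptions made for this example, and the sample is a shortened version of the records shown above.

```python
def index_records(lines):
    """Map each record's @ID@ to its type and the lines below it."""
    records = {}
    current = None
    for line in lines:
        parts = line.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            if len(parts) == 3 and parts[1].startswith("@"):
                current = {"type": parts[2], "lines": []}
                records[parts[1]] = current
            else:
                current = None  # HEAD, TRLR and other records without an ID
        elif current is not None:
            value = parts[2] if len(parts) > 2 else ""
            current["lines"].append((level, parts[1], value))
    return records


def first_value(record, tag):
    """Return the value of the first level 1 line with the given tag."""
    for level, line_tag, value in record["lines"]:
        if level == 1 and line_tag == tag:
            return value
    return None


gedcom = [
    "0 @I302@ INDI",
    "1 NAME Michael /FITZGERALD/",
    "1 FAMS @F107@",
    "0 @F107@ FAM",
    "1 HUSB @I302@",
    "1 WIFE @I303@",
    "1 CHIL @I38@",
]
records = index_records(gedcom)
family = records[first_value(records["@I302@"], "FAMS")]
husband = records[first_value(family, "HUSB")]
print(first_value(husband, "NAME"))
```

Starting from Michael's record, the FAMS pointer leads to Family 107, and that family's HUSB pointer leads back to Michael, which is how the relationships stay intact.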


Remember that in a complete GEDCOM file, many records are included. All of the lines would be listed one after the other, usually with individuals listed first, then sources, and then families. The final record that is required in a GEDCOM file is called a Trailer, and notifies the software that this is the end of the file:
0 TRLR
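
A quick sanity check on a file's outer structure might look like the following. It only tests the two points made in this article, that the file opens with the Header and closes with the Trailer; the function name `check_envelope` and the wording of its messages are invented for the illustration.

```python
def check_envelope(lines):
    """Return a list of problems with the first and last lines of a GEDCOM file."""
    problems = []
    content = [line.strip() for line in lines if line.strip()]
    if not content or content[0] != "0 HEAD":
        problems.append("first line is not the 0 HEAD header")
    if not content or content[-1] != "0 TRLR":
        problems.append("last line is not the 0 TRLR trailer")
    return problems


print(check_envelope(["0 HEAD", "1 SOUR REUNION", "0 TRLR"]))
print(check_envelope(["0 HEAD", "1 SOUR REUNION"]))
```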

