V3DT conference call notes for Wed, Feb 24, 1999.

The HL7 version 3 data type task group held its eighteenth conference call on Wednesday, February 24, 1999, 4:00 EST.

Attendees were:

Agenda items were:

  1. Person Name

This time we had two participants from Australia, Klaus and Dawid. Klaus called in from HIMSS and Dawid called from Australia.

Background

The HL7 v2 person name data types (PN, XPN) have basically the same problems as the data type for addresses. That is, they try to define slots for data, so that whatever name parts exist must be fitted into one of the available slots. This has the same disadvantages. The first problem is that name part types do not classify in a simple and interchangeable way across all cultures, yet everyone must use the same classification. The second problem is that the meaning of a name part and the position of a name part are orthogonal (independent) aspects of a name. As an additional problem, person names may occur in different orderings, and some name parts are used or omitted depending on the use case (e.g., formal vs. familiar style).

The following references are important for informed discussions:

  1. Bidgood DW Jr, Tracy WR. In search of the name. Proc Annu Symp Comput Appl Med Care, 1993; p. 54-58.

  2. Bidgood DW Jr, Tracy WR. ANSI HISPP MSDS: COMMON DATA TYPES for harmonization of communication standards in medical informatics. Final Draft. 10/30/1993. Available as Postscript or Word.

  3. Hopkins R. Strategic short study: names and numbers as identifiers. CEN TC251. Available as PDF or Word. Note especially Appendix B: National Name Forms by Arthur Waugh, Australia.

  4. Anonymous. A Study on names in the US and in the Netherlands. Available here.

  5. This conference call was based on a worksheet that summarizes some earlier discussions.

In what follows I first present the proposed data structure for person name and then I will show examples, discuss ramifications, and justify why this design has been chosen.

Data Type for Person Name

Person Name
A collection of person name variants.
SET OF Person Name Variant

Person Name Variant

Person Name Variant
This type is not used outside of the Person Name data type. Person Names are regarded as a collection of person name variants each used in different contexts or for a different purpose.
component name | type/domain | optionality | description
purpose | Code Value | optional | A purpose code indicates what a given name is to be used for. Examples are: official documents, school records, ...
value | LIST OF Person Name Part | mandatory | This contains the actual name data as a list of person name parts that may or may not have semantic tags.

Person Name Part

Person Name Part
This type is not used outside of the Person Name Variant data type. Person Name Variants are regarded as token lists. Tokens usually are character strings but may have a tag that signifies the role of the token. Typical name parts that exist in just about every name are given names and family names; other part types may be defined culturally.
component name | type/domain | optionality | description
value | Character String | mandatory | The value of a name part.
classifiers | SET OF Code Value | optional | Classifications of a name part. One name part can fall into multiple categories, such as given name vs. family name and name of public records vs. nickname.
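
For readers who prefer to see the three types as program structures, here is a minimal sketch in Python. It is only an illustration of the tables above, not part of the proposal: the class and field names simply mirror the component names, classifiers are modelled as plain strings rather than Code Values, and the helper part() is a hypothetical convenience.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class PersonNamePart:
    """One token of a name; the classifiers are optional semantic tags."""
    value: str                                  # mandatory Character String
    classifiers: FrozenSet[str] = frozenset()   # optional SET OF Code Value


@dataclass(frozen=True)
class PersonNameVariant:
    """An ordered list of name parts, optionally tagged with a purpose code."""
    value: Tuple[PersonNamePart, ...]           # mandatory LIST OF Person Name Part
    purpose: Optional[str] = None               # optional Code Value


@dataclass(frozen=True)
class PersonName:
    """A collection of name variants belonging to one person."""
    variants: FrozenSet[PersonNameVariant]      # SET OF Person Name Variant


def part(value, *classifiers):
    """Hypothetical shorthand for building a name part."""
    return PersonNamePart(value, frozenset(classifiers))


# A one-variant name with no purpose code.
variant = PersonNameVariant(value=(part("Irma", "given"), part("Jongeneel", "family")))
name = PersonName(variants=frozenset({variant}))
```

The list is a tuple and the sets are frozensets only so that variants can themselves be members of a set; nothing in the proposal depends on that choice.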

Examples

Irma Jongeneel, of HL7 the Netherlands, has many nice ramifications in her name, so we will dwell a little on it. Irma has two given names, "Irma" and "Corine". In her childhood her family name was "de Haas". Then Irma married Gerard Jongeneel. In Holland both spouses can choose to use either or both of their family names in arbitrary order. For the public records Irma chose the combination "Irma Corine Jongeneel-de Haas". But we know her by the name "Irma Jongeneel", i.e., for casual use she assumed the family name of her spouse. But if Irma had to appear in a court of law and her name was cited, she would be called "Irma Corine de Haas e.g. Jongeneel", where "e.g." stands for "echtgenote van", meaning "spouse of".

Let's write down the variants that we know now in the familiar instance notation.

First the name by which we know her

Irma Jongeneel

(PersonNameVariant
  (PersonNamePart :value "Irma"      
    :classifiers (SET given record))
  (PersonNamePart :value "Jongeneel"
    :classifiers (SET family record spouse)))
Just as with the address, we have to take care with spacing. When the name is printed, the name parts are usually separated by white space. But there are notable exceptions, which we will encounter in the following example.

The following is the name of her marriage record (?)

Irma Corine Jongeneel-de Haas

(PersonNameVariant
  (PersonNamePart :value "Irma"      
    :classifiers (SET given record))
  (PersonNamePart :value "Corine"      
    :classifiers (SET given record))
  (PersonNamePart :value "Jongeneel"
    :classifiers (SET family record spouse))
  (PersonNamePart :value "-"
    :classifiers (SET delimiter))
  (PersonNamePart :value "de Haas"
    :classifiers (SET family record birth)))
Note that the dash "-" is printed without leading and trailing white space. This is signified by the flag delimiter in the name classifier set. We know this flag already from the Address data type. Since names never have line breaks, the line break feature does not exist with delimiters in person names.
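
The spacing rule so far can be sketched as a small printing routine. This is only an illustration under simplifying assumptions: a name part is represented as a (value, classifiers) pair, the function name render is made up for this sketch, and only the delimiter flag is handled.

```python
def render(parts):
    """Print name parts in order, separated by single spaces, except that
    a delimiter gets no white space before or after it."""
    out = []
    glue_next = True  # nothing precedes the first part, so no space before it
    for value, classifiers in parts:
        is_delimiter = "delimiter" in classifiers
        if not (glue_next or is_delimiter):
            out.append(" ")
        out.append(value)
        glue_next = is_delimiter
    return "".join(out)


record_name = [
    ("Irma", {"given", "record"}),
    ("Corine", {"given", "record"}),
    ("Jongeneel", {"family", "record", "spouse"}),
    ("-", {"delimiter"}),
    ("de Haas", {"family", "record", "birth"}),
]
print(render(record_name))  # Irma Corine Jongeneel-de Haas
```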

Voorvoegsel

There is a problem with the "de", which is classified as a voorvoegsel in Dutch. Another very common voorvoegsel is "van" as in "van Soest". This Dutch "van" is not actually a nobility prefix, although it sounds as if it used to be one. Such prefixes exist in many languages, including French, German, and Portuguese.

The problem with such prefixes is that they belong to exactly one other name part, e.g., "Haas". In Dutch the part "Haas" of "de Haas" is called the significant part of that family name, since it is significant for alphabetic sorting. Since "de" cannot occur without "Haas" and "Haas" will not occur without "de", the two are linked more strongly than "de Haas" and "Jongeneel".

One way to handle this associativity is through nesting. With parentheses we could write "(Irma (de Haas) Jongeneel)" to show that "de" and "Haas" are associated more strongly than the other parts. However, nesting is costly, as it adds significant complexity to the data type definition. Not that nesting is a bad idea per se, but we have to be careful, since what we propose is already quite strange for many people, even though we would not expect a nesting depth of more than one.

There are other ramifications though, such as prefixes that consist of more than one part, as in the French "Eduard de l'Aigle". Here "de l'" is one prefix that consists of two parts and that connects to the significant part without spacing. To make things more complex, we have to realize that "de l'Aigle" is in fact a contraction of "de-la-Aigle". But we decide not to deal with this kind of lexical variation. It is probably safe to consider "de l'" as one prefix that binds strongly to the following significant name part.

Thus we could go without nesting by using a special name part flag "prefix". Prefix means that this name part binds strongly to the following name part, and we consider it to bind without space. Let's try how that feels:

de Haas

(PersonNameVariant
  (PersonNamePart :value "de "
    :classifiers (SET prefix))
  (PersonNamePart :value "Haas"
    :classifiers (SET family)))
Note that "de " contains a literal space. Alternatively we could define flags for prefix-with-space and prefix-no-space, but this would just make things more complex. As a rule we say that name part prefixes bind without space to the following name. If a space is required, it must be included in the name part.

Eduard de l'Aigle has a prefix that includes no space:

Eduard de l'Aigle

(PersonNameVariant
  (PersonNamePart :value "Eduard"
    :classifiers (SET given))
  (PersonNamePart :value "de l'"
    :classifiers (SET prefix))
  (PersonNamePart :value "Aigle"
    :classifiers (SET family record)))
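
The prefix rule can be added to a printing routine in one line. The following is a sketch only, assuming that name parts are (value, classifiers) pairs and that a prefix, like a delimiter, suppresses the white space that would otherwise follow it; the function name render is an invention of this sketch.

```python
def render(parts):
    """Print name parts separated by spaces, except that nothing is inserted
    around a delimiter or between a prefix and the part that follows it."""
    out = []
    glue_next = True  # no space before the very first part
    for value, classifiers in parts:
        is_delimiter = "delimiter" in classifiers
        if not (glue_next or is_delimiter):
            out.append(" ")
        out.append(value)
        # A prefix binds without space; any space needed is part of its value.
        glue_next = is_delimiter or "prefix" in classifiers
    return "".join(out)


print(render([("de ", {"prefix"}), ("Haas", {"family"})]))  # de Haas
print(render([("Eduard", {"given"}),
              ("de l'", {"prefix"}),
              ("Aigle", {"family", "record"})]))            # Eduard de l'Aigle
```

Both examples come out right with the same rule, which is the reason for putting the literal space into "de " instead of inventing a second flag.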

This method is challenged when we want to capture a reversed name form such as used in a phone book or in bibliographies.

Haas, de, Irma

(PersonNameVariant
  (PersonNamePart :value "Haas"
    :classifiers (SET family))
  (PersonNamePart :value ", "
    :classifiers (SET delimiter))
  (PersonNamePart :value "de"
    :classifiers (SET prefix))
  (PersonNamePart :value ", "
    :classifiers (SET delimiter))
  (PersonNamePart :value "Irma"
    :classifiers (SET given)))

Here we lose the strong binding between the prefix "de" and its significant name "Haas". The prefix is placed after the significant name "Haas", there is even an intervening comma, and, to make things even worse, the spacing of "de" is different.

It is clear that there is no easy way out of this one. People will complain no matter what. It's a matter of finding the most elegant solution. You can always argue about elegance of course.

How's this:

Haas, de, Irma

(PersonNameVariant
  (PersonNamePart :value "Haas"
    :classifiers (SET family))
  (PersonNamePart :value ", "
    :classifiers (SET delimiter))
  (PersonNamePart :value "de "
    :classifiers (SET prefix inverted))
  (PersonNamePart :value ", "
    :classifiers (SET delimiter))
  (PersonNamePart :value "Irma"
    :classifiers (SET given)))
Here we say that the prefix "de " (with trailing space!) is inverted. The computer now knows that the prefix is associated with some preceding stuff. The rule is: An inverted prefix binds to the nearest preceding name part that is not a delimiter. Further, the rule for printing the name is: Trailing literal white space is to be removed from inverted prefixes.
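
Both rules for inverted prefixes are easy to state as code. The sketch below assumes name parts are (value, classifiers) pairs; the function names render, inverted_prefix_target and restore_family_name are made up for illustration and are not part of the proposal.

```python
def render(parts):
    """Print name parts; an inverted prefix loses its trailing white space
    and no longer glues itself to the part that follows."""
    out = []
    glue_next = True  # no space before the very first part
    for value, classifiers in parts:
        is_delimiter = "delimiter" in classifiers
        is_inverted = {"prefix", "inverted"} <= classifiers
        if is_inverted:
            value = value.rstrip()
        if not (glue_next or is_delimiter):
            out.append(" ")
        out.append(value)
        glue_next = is_delimiter or ("prefix" in classifiers and not is_inverted)
    return "".join(out)


def inverted_prefix_target(parts, index):
    """Index of the nearest preceding part that is not a delimiter, or None."""
    for i in range(index - 1, -1, -1):
        if "delimiter" not in parts[i][1]:
            return i
    return None


def restore_family_name(parts, index):
    """Rejoin an inverted prefix with its significant part in natural order."""
    target = inverted_prefix_target(parts, index)
    if target is None:
        raise ValueError("inverted prefix has nothing to bind to")
    return parts[index][0] + parts[target][0]


phone_book = [
    ("Haas", {"family"}),
    (", ", {"delimiter"}),
    ("de ", {"prefix", "inverted"}),
    (", ", {"delimiter"}),
    ("Irma", {"given"}),
]
print(render(phone_book))                  # Haas, de, Irma
print(restore_family_name(phone_book, 2))  # de Haas
```

The second function shows why the trailing space is kept in the data even though it is not printed: it is needed again as soon as the inversion is undone.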

For Eduard de l'Aigle this works likewise:

Aigle, de l', Eduard

(PersonNameVariant
  (PersonNamePart :value "Aigle"
    :classifiers (SET family))
  (PersonNamePart :value ", "
    :classifiers (SET delimiter))
  (PersonNamePart :value "de l'"
    :classifiers (SET prefix inverted))
  (PersonNamePart :value ", "
    :classifiers (SET delimiter))
  (PersonNamePart :value "Eduard"
    :classifiers (SET given)))

To completely cover all ramifications we can further undo the contraction "de l'A..." to "de la":

Aigle, de la, Eduard

(PersonNameVariant
  (PersonNamePart :value "Aigle"
    :classifiers (SET family))
  (PersonNamePart :value ", "
    :classifiers (SET delimiter))
  (PersonNamePart :value "de la"
    :classifiers (SET prefix inverted))
  (PersonNamePart :value ", "
    :classifiers (SET delimiter))
  (PersonNamePart :value "Eduard"
    :classifiers (SET given)))
However, we decide not to enable a program to undo the inversion and redo the proper contraction. It is such a rare use case that no one would bother.

Echtgenote van, née, geb.

As we said earlier, when Irma shows up in a court of law, she might be called

Irma Corine de Haas e.g. Jongeneel

(PersonNameVariant
  (PersonNamePart :value "Irma"
    :classifiers (SET given record))
  (PersonNamePart :value "Corine"
    :classifiers (SET given record))
  (PersonNamePart :value "de "
    :classifiers (SET prefix))
  (PersonNamePart :value "Haas"
    :classifiers (SET family record birth))
  (PersonNamePart :value "e.g."
    :classifiers (SET prefix weak))
  (PersonNamePart :value "Jongeneel"
    :classifiers (SET family record spouse)))

The "e.g." behaves pretty much like a prefix. It is not "significant"; it associates with the following name part. The difference is that the association is "weak". A weak association of a prefix or suffix means that the affix might be dropped. It is still a prefix, which means that it moves wherever the following name part moves, but a weak prefix could be omitted.
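
To make "moves with the following name part" and "could be omitted" concrete, here is a small sketch. It assumes name parts are (value, classifiers) pairs and that the input contains no inverted prefixes; the function names bind_prefixes and drop_weak_affixes are hypothetical.

```python
def bind_prefixes(parts):
    """Group each run of prefixes with the name part that follows it, so the
    group can be moved around as a unit."""
    groups, pending = [], []
    for value, classifiers in parts:
        pending.append((value, classifiers))
        if "prefix" not in classifiers:
            groups.append(pending)
            pending = []
    if pending:  # a dangling prefix with nothing after it
        groups.append(pending)
    return groups


def drop_weak_affixes(parts):
    """Omit every part flagged as weak; strong affixes are kept."""
    return [(v, c) for v, c in parts if "weak" not in c]


court_name = [
    ("Irma", {"given", "record"}),
    ("Corine", {"given", "record"}),
    ("de ", {"prefix"}),
    ("Haas", {"family", "record", "birth"}),
    ("e.g.", {"prefix", "weak"}),
    ("Jongeneel", {"family", "record", "spouse"}),
]
for group in bind_prefixes(court_name):
    print([value for value, _ in group])
print([value for value, _ in drop_weak_affixes(court_name)])
```

A weak prefix followed by a strong prefix ends up in the same group as the significant part, with the strong prefix closer to it.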

Note that a weak prefix may be followed by a (strong) prefix, such as in "Gerard Jongeneel e.g. de Haas". Note also that if a weak prefix is followed by a name part which in turn is followed by an inverted (strong) prefix, the inversion would be undone by inserting the (strong) prefix between the weak prefix and the significant name part. Contemplate "Jongeneel, Gerard e.g. Haas, de" as an example.

In "Claudine de l'Aigle née Dubois" and "Dorothea Schadow geb. Riemer", "née" and "geb." formally behave just like the "echtgenote van", i.e., they are weak prefixes. However, note that the semantics is reversed. Echtgenote van means "spouse of", while née and geborene mean "born" in French and German respectively.

Claudine de l'Aigle née Dubois

(PersonNameVariant
  (PersonNamePart :value "Claudine"
    :classifiers (SET given record))
  (PersonNamePart :value "de l'"
    :classifiers (SET prefix))
  (PersonNamePart :value "Aigle"
    :classifiers (SET family record spouse))
  (PersonNamePart :value "née"
    :classifiers (SET prefix weak))
  (PersonNamePart :value "Dubois"
    :classifiers (SET family record birth)))
The semantic difference between née and e.g. is not important, since the classification of name parts into birth vs. spouse is unambiguous.

Nicknames

Let's play a little bit with nicknames. I know Bob Dolin as "Bob", but at HL7 he is enrolled as "Robert Dolin" and on papers he calls himself "Robert H. Dolin". This is no big deal, since we have three distinct name forms that we decided to treat as separate Person Name Variants without trying to relate those name parts across the variants.

The following is the first example of a complete Person Name structure.

Bob Dolin, Robert Dolin, or Robert H. Dolin

(PersonName
  (SET   
     (PersonNameVariant
        (PersonNamePart :value "Bob"
           :classifiers (SET given nick))
        (PersonNamePart :value "Dolin"
           :classifiers (SET family)))
     (PersonNameVariant
        (PersonNamePart :value "Robert"
           :classifiers (SET given))
        (PersonNamePart :value "Dolin"
           :classifiers (SET family)))
     (PersonNameVariant
        (PersonNamePart :value "Robert"
           :classifiers (SET given))
        (PersonNamePart :value "H."
           :classifiers (SET given initial))
        (PersonNamePart :value "Dolin"
           :classifiers (SET family)))))
We did not classify the person name variants here, since this would open up another can of worms. This example is not very exciting, but we want to make another point.

Let's take Woody Beeler. Woody is known as "George (Woody) W. Beeler" in the HL7 membership database. This is an interesting construct.

George (Woody) W. Beeler

(PersonNameVariant
   (PersonNamePart :value "George"
      :classifiers (SET given))
   (PersonNamePart :value " ("
      :classifiers (SET delimiter))
   (PersonNamePart :value "Woody"
      :classifiers (SET nick))
   (PersonNamePart :value ") "
      :classifiers (SET delimiter))
   (PersonNamePart :value "W."
      :classifiers (SET given initial))
   (PersonNamePart :value "Beeler"
      :classifiers (SET family)))
This would be the straightforward way to capture this example with all the features that we developed so far. However, we might want to be a bit more semantic and a bit less literal. The way Woody would say this example is probably "my name is George W. Beeler, but call me Woody." The parentheses are just a style to print the name badge. Actually the HL7 name badge looks like:
Woody
George W. Beeler
We probably do not want to introduce line breaks into the person name. Here it would be useful to do a little more semantic markup:
George (Woody) W. Beeler

(PersonNameVariant
   (PersonNamePart :value "George"
      :classifiers (SET given))
   (PersonNamePart :value "Woody"
      :classifiers (SET callme))
   (PersonNamePart :value "W."
      :classifiers (SET given initial))
   (PersonNamePart :value "Beeler"
      :classifiers (SET family)))
Two different applications could now use the same name variant to produce a name badge for an HL7 meeting and to print the HL7 membership directory. The rule for the badge application is: if there are "callme" name parts, print those large and bold, and print all the other names below, except those names that are classified only as "callme". For the electronic membership directory the rule would be: print all names in order and put callme-only name parts in parentheses.
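
The two application rules can be sketched as follows. The sketch assumes name parts are (value, classifiers) pairs, joins parts with single spaces (delimiters and prefixes are ignored for brevity), and uses the made-up function names badge and directory_entry.

```python
def badge(parts):
    """Return (headline, small_print) for a name badge: callme parts go in
    the headline, everything not classified only as callme goes below."""
    headline = " ".join(v for v, c in parts if "callme" in c)
    small_print = " ".join(v for v, c in parts if c != {"callme"})
    return headline, small_print


def directory_entry(parts):
    """Print all parts in order, with callme-only parts in parentheses."""
    return " ".join(f"({v})" if c == {"callme"} else v for v, c in parts)


woody = [
    ("George", {"given"}),
    ("Woody", {"callme"}),
    ("W.", {"given", "initial"}),
    ("Beeler", {"family"}),
]
print(badge(woody))            # ('Woody', 'George W. Beeler')
print(directory_entry(woody))  # George (Woody) W. Beeler
```

A part classified as both given and callme would show up on both badge lines and without parentheses in the directory, which seems to be what the rules intend.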

Finally, for today, let's take an example where we just can't classify the names. Consider "Iketani Sahoko". Of course, if you know some Japanese you will know that Sahoko is a Japanese female and "Iketani" is her family name. But let's assume you don't know that :-). All you have is an unconscious girl who has the name "Iketani Sahoko" printed (in Latin letters) somewhere on her purse.

Iketani Sahoko

(PersonNameVariant
   (PersonNamePart :value "Iketani")
   (PersonNamePart :value "Sahoko"))
You now send this name without any classifier. The point is that you cannot tell which one is the given name and which one is the family name. If you guess from the order (given name = first name) you are wrong. So, if in doubt, why be forced to guess? Of course, most databases will force you to guess. But this wild guess can be made by the receiving HL7 interface just as well as by an unknowledgeable human. Later, when you learn more about your patient, you can enter the correct classification:
Iketani Sahoko

(PersonNameVariant
   (PersonNamePart :value "Iketani"
     :classifiers (SET family))
   (PersonNamePart :value "Sahoko"
     :classifiers (SET given)))

Summary of Name Part Classifiers

SYMBOL | SHORT | DESCRIPTION
Axis 1: This is the main classifier. Only one value is allowed.
given | G | Given name (don't call it "first name", since given names do not always come first).
family | F | Family name; this is the name that links to the genealogy. In some cultures (e.g. Eritrea) the family name of a son is the first name of his father.
prefix | P | A prefix has a strong association to the immediately following name part. A prefix has no implicit trailing white space (it has implicit leading white space though). Note that prefixes can be inverted.
suffix | S | A suffix has a strong association to the immediately preceding name part. A suffix has no implicit leading white space (it has implicit trailing white space though). Suffixes cannot be inverted.
delimiter | D | A delimiter has no meaning other than being literally printed in this name representation. A delimiter has no implicit leading and trailing white space.
Axis 2: Marital classifiers. Only one value allowed.
spouse | A | The name assumed from the partner in a marital relationship. Usually the spouse's family name. Note that no inference about gender can be made from the existence of spouse names.
birth | B | A name that a person had shortly after being born. Usually a family name. This is an antonym of spouse. Also known as "maiden" name, but males can switch to a spouse's name too.
Axis 3: Additional classifiers. More than one value allowed.
nick | N | Indicates that the name part is a nickname. Not explicitly used for prefixes and suffixes, since those inherit this flag from their associated significant name parts. Note that most nicknames are given names, although this is not required.
callme | C | A callme name is a name (usually a given name) that is preferred when a person is directly addressed.
record | R | This flag indicates that the name part is known in some official record. Usually the antonym of nickname.
initial | I | Indicates that a name part is just an initial. Initials do not imply a trailing period, since this would not work with non-Latin scripts. Initials may consist of more than one letter, e.g., "Ph." could stand for "Philippe" or "Th." for "Thomas".
invisible | 0 (zero) | Indicates that a name part is not normally shown. For instance, traditional maiden names are not normally shown. Middle names may be invisible too.
weak | W | Used only for prefixes and suffixes (affixes). A weak affix has a weaker association to its main name part than a genuine (strong) affix. Weak prefixes are not normally inverted. When a weak affix and a strong affix occur together, the strong affix is closer to its associated main name part than the weak affix.
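
The constraints stated per axis can be checked mechanically. The following is a sketch, assuming that "only one value" means "at most one" (the unclassified "Iketani Sahoko" example has none) and that "inverted", which the examples use but which has no row in this summary, is accepted as a prefix-only flag. The function name check_classifiers and the wording of the messages are made up.

```python
AXIS_1 = frozenset({"given", "family", "prefix", "suffix", "delimiter"})
AXIS_2 = frozenset({"spouse", "birth"})
AXIS_3 = frozenset({"nick", "callme", "record", "initial", "invisible", "weak"})
EXTRA = frozenset({"inverted"})  # assumption: used in the examples, not in the summary


def check_classifiers(classifiers):
    """Return a list of problems found in one name part's classifier set."""
    classifiers = frozenset(classifiers)
    problems = []
    for label, axis in (("axis 1", AXIS_1), ("axis 2", AXIS_2)):
        hits = sorted(classifiers & axis)
        if len(hits) > 1:
            problems.append(f"{label}: only one value allowed, got {hits}")
    if "weak" in classifiers and not classifiers & {"prefix", "suffix"}:
        problems.append("weak is only used for prefixes and suffixes")
    if "inverted" in classifiers and "prefix" not in classifiers:
        problems.append("inverted is only used for prefixes")
    unknown = sorted(classifiers - AXIS_1 - AXIS_2 - AXIS_3 - EXTRA)
    if unknown:
        problems.append(f"unknown classifiers: {unknown}")
    return problems


print(check_classifiers({"family", "record", "spouse"}))  # []
print(check_classifiers({"given", "family"}))
```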

Outstanding Issues

Let's take a break here. There are lots and lots of other issues that we have to work through by making examples. It is not simple, but this is not our fault. The world is difficult.

The next issue to address is to find purpose codes for name variants. Please contemplate this issue, since I don't have any clue right now.


Next conference call is next Monday, March 1, 1999, 4:00 PM EST.

Note the unusual time. This is because we will have Klaus Veil and Dawid Rowed calling in from Australia. For them it is 8:00 AM at that time.

Agenda items are:

  1. Person Name (cont'd)

For the person name there are a couple of documents and ideas that you may wish to review in order to contribute to an informed and efficient discussion.

regards

-Gunther Schadow