V3DT conference call notes for Mon, Feb 22, 1999.

The HL7 version 3 data type task group held its seventeenth conference call on Monday, February 22, 1999, 11:00 to 12:30 EST.

Attendees were:

Joann Larson
Stan Huff
Mark Shafarman
Greg Thomas
Mark Tucker
Robin Zimmerman
Gunther Schadow

Agenda items were:

Postal and Residential Addresses

Background

The old HL7 address data types (AD, XAD) regarded an address as a data structure where each component had a special role. For instance, AD distinguished ZIP, city, state, country, street, and other parts of the address.

Over time people discovered more information elements that could be known about an address and added those elements as components to the address data type. Those additional components were county, census tract, etc. Such information items would normally not appear on mailing labels, and one would not necessarily ask for them when visiting someone at a given address.

On the other hand, it turned out that there are a number of information elements that do appear on mailing labels but are nevertheless rare and therefore remained unclassified. For instance, U.S. military addresses may have a unit designation "UNIT 2050" instead of a street and instead of or in addition to a city. The name of a ship can appear instead of a city.

Internationally there are other address parts that may exist in one country but be unknown in another. For example, in U.S. addresses one finds directional codes like "N", "S", "W", and "E", which are essential for finding a given address within a city. Those direction codes are unknown, for instance, in Germany.

Robin Zimmerman and Joann Larson have compiled an analysis of U.S. and some international addresses based on information from the Universal Postal Union (UPU). This work reinforces the observation that there are so many different kinds of address parts that creating a fixed data structure where every part has its slot is impractical. See also the examples of worldwide addresses published by the UPU.

Another problem with the old address data types was that they ordered the parts of an address by the meaning of each part. The most important use case for address information, however, is printing a mailing label. In order to generate a mailing label it doesn't matter what the meaning of the different parts of an address is, as long as those parts appear at the appropriate place on the label.

The placement of address parts, however, depends on the country. For example, while in U.S. and most European addresses the ZIP code appears somewhere at the end, Japanese ZIP codes are written at the very top. In fact, Japanese addresses are written in the reverse direction: from the most general locator to the specific locations, with the name of the recipient appearing at the end.

Even among addresses in the north-western part of the world there are differences in how ZIP code and city are placed. In Germany and most European countries, for instance, the ZIP code is placed in front of the city, while in England the ZIP code appears after the city name on a separate line. In the U.S. the ZIP code follows the city and usually the state code. In most European countries, special country codes (different from ISO 3166 country codes) are written before the ZIP code (separated from the ZIP code by a dash). In the U.S. and England country codes appear at the end. In Great Britain, however, the ZIP code appears even after the country designator, whereas in the U.S.A. the country code appears at the very end.

In short, layout and meaning of address parts are independent (orthogonal) issues, but the address data type must take care of both. The focus, however, is not on the meaning of the parts but on the layout. This is because there are too many different address parts and too many different country-specific variations. Storing addresses primarily by layout is simply easier than storing them by a complete classification of address part types.

Data Type for Postal and Residential Addresses

Postal or Residential Address
This Address data type is used to communicate postal addresses and residential addresses. The main use of such data is to allow printing mail labels (postal address), or to allow a person to physically visit that address (residential address). The difference between postal and residential address is whether or not there is just a post box. The residential address is not supposed to contain other information that might be useful for finding geographic locations or doing epidemiological studies. These addresses are thus not very well suited for describing the locations of mobile visits or the "residency" of homeless people.

component name	type/domain	optionality	description
purpose	Code Value	optional	A purpose code indicates what a given address is to be used for. Examples are: preferred residency (used primarily for visiting), temporary (visit or mailing, but see History), preferred mailing address (used specifically for mailing), and some more specific ones, such as "birth address" (to track addresses of small children). An address without a specific purpose code might be a default address useful for any purpose, but an address with a specific purpose code would be preferred for that respective purpose.
status	Code Value	optional	The only address status known so far is "bad address", so this component could as well have been named "bad_address_ind" of type Boolean. In fact, this will turn into a Boolean if we cannot get hold of any other address status. It may also turn into a set of Code Values if we find out that the status codes are actually flags that are not mutually exclusive. Absence of a status means "unknown" status.
value	LIST OF Address Part	mandatory	This contains the actual address data as a list of address parts that may or may not have semantic tags.

Address Part

Address Part
This type is not used outside of the Address data type. Addresses are regarded as a token list. Tokens usually are character strings but may have a tag that signifies the role of the token. Typical parts that exist in almost every address are ZIP code, city, and country, but other roles may be defined regionally, nationally, or on an enterprise level (e.g. in military addresses). Addresses are usually broken up into lines, which is indicated by special line break tokens.

component name	type/domain	optionality	description
value	Character String	mandatory (exception: line break tokens)	The value of an address part is what is printed on a label.
role	Code Value	optional	The role of an address part (if any) indicates whether an address part is the ZIP code, city, country, post box, etc.
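
The two component tables above can be read as a pair of record definitions. The following is a minimal Python sketch under that reading. The class and field names simply mirror the component names in the tables and are not part of any HL7 specification; code values are represented as plain strings for brevity, and the role strings "ZIP" and "CTY" are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AddressPart:
    # Text printed on the label; may be omitted only for line break tokens.
    value: Optional[str] = None
    # Optional semantic tag such as "ZIP" or "CTY"; None means untagged.
    role: Optional[str] = None


@dataclass
class Address:
    # Mandatory ordered list of parts, in the order they appear on the label.
    value: List[AddressPart]
    # Optional purpose code (e.g. preferred mailing address).
    purpose: Optional[str] = None
    # Optional status code; absence means the status is unknown.
    status: Optional[str] = None


def values_with_role(address: Address, role: str) -> List[str]:
    """Return the values of all parts tagged with the given role."""
    return [p.value for p in address.value
            if p.role == role and p.value is not None]


# Hypothetical instance: only some of the parts carry a semantic tag.
berlin = Address(value=[
    AddressPart("Windsteiner Weg 54A"),
    AddressPart("14165", role="ZIP"),
    AddressPart("Berlin", role="CTY"),
])
```

A receiver that only prints labels can ignore the role tags entirely, while one that needs the ZIP code can call `values_with_role(berlin, "ZIP")`.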

Role Codes for Address Parts (first attempt)

Short	Long	Meaning
`L`	`LIT`	literal (this is the default role code)
`B`	`BR`	line break (this does not need a value component)
`W`	`WS`	white space (this does not need a value component)
`C`	`CNT`	country
`T`	`CTY`	city (town)
`U`	`STA`	state
`Z`	`ZIP`	ZIP code
`H`	`HNR`	house number (a.k.a. "primary street number"; it is not the number of the street, but the number of the house or lot alongside the street)
`A`	`APT`	apartment number
`S`	`STR`	street name or number
`ST`	`STT`	street type (e.g. street, avenue, road, lane, ...) (probably not useful enough)
`D`	`DIR`	direction (e.g., N, S, W, E)
...

Examples

Please note that the person name is not part of our address type even though it is mentioned by UPU and Joann/Robin's list.

A U.S. address

1028 Pinewood Court
Indianapolis, IN 46240
U.S.A.

A German address

Windsteiner Weg 54A
D-14165 Berlin

If we look at these examples more closely we realize that there is a problem with white space. We would like to have a rule that there is always white space between two address parts. However, that rule does not work. In the country code separator of "D-14165" there is no white space before or after the dash. We might want to say that there are certain delimiter elements, like the dash "-", a solidus "/", or a comma ",", that do not have white space before and after them. This, however, does not work for the comma, since there is white space after the comma, but not before.

The solution to this problem is to define two roles similar to the literal: LIT and DEL. A LIT address part would be the value string separated from the other elements by white space. Thus

... 1001 W 10th Street RG5 ...

might become

... (AddressPart :value "1001 W 10th Street" :role "LIT") (AddressPart :value "RG5" :role "LIT") ...

A DEL address part would be a delimiter that is not framed by white space. A comma is always followed by white space, but that white space would be part of the value of the comma delimiter. A DEL address part does not need a value, because a DEL without a value is a line break by default. The German address above would thus become:

and the U.S. address would turn to

Here is the role code table again

Role Codes for Address Parts (final attempt)

Short	Long	Meaning
`L`	`LIT`	literal (this is the default role code)
`K`	`DEL`	delimiter, printed without framing white space; a line break if no value component is provided
`C`	`CNT`	country
`T`	`CTY`	city (town)
`U`	`STA`	state
`Z`	`ZIP`	ZIP code
`H`	`HNR`	house number (a.k.a. "primary street number"; it is not the number of the street, but the number of the house or lot alongside the street)
`A`	`APT`	apartment number
`S`	`STR`	street name or number
`ST`	`STT`	street type (e.g. street, avenue, road, lane, ...) (probably not useful enough)
`D`	`DIR`	direction (e.g., N, S, W, E)
...
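
The LIT/DEL white space rule can be sketched as a small rendering function. This is a Python illustration, not the notation used by the task group. The tokenization of the two example addresses below is a reconstruction from the rule as described and is an assumption, as is the choice to treat every role other than DEL like a literal for spacing purposes.

```python
from typing import List, Optional, Tuple

# (value, role) pairs; role codes follow the long forms in the table above.
Part = Tuple[Optional[str], str]


def render_label(parts: List[Part]) -> str:
    """Join address parts into label text using the LIT/DEL white space rule."""
    out: List[str] = []
    after_delimiter = True  # suppresses a leading space at the very start
    for value, role in parts:
        if role == "DEL":
            # A delimiter is printed as is; without a value it is a line break.
            out.append("\n" if value is None else value)
            after_delimiter = True
        else:
            # Any other part is separated from a preceding non-delimiter
            # part by a single space.
            if not after_delimiter:
                out.append(" ")
            out.append(value or "")
            after_delimiter = False
    return "".join(out)


# Assumed tokenization of the German example address.
german: List[Part] = [
    ("Windsteiner Weg 54A", "LIT"),
    (None, "DEL"),
    ("D", "CNT"),
    ("-", "DEL"),
    ("14165", "ZIP"),
    ("Berlin", "CTY"),
]

# Assumed tokenization of the U.S. example address; the comma delimiter
# carries its trailing white space in its own value.
us: List[Part] = [
    ("1028 Pinewood Court", "LIT"),
    (None, "DEL"),
    ("Indianapolis", "CTY"),
    (", ", "DEL"),
    ("IN", "STA"),
    ("46240", "ZIP"),
    (None, "DEL"),
    ("U.S.A.", "CNT"),
]

print(render_label(german))
print(render_label(us))
```

Under these assumptions the renderer never needs to know what a role means; it only distinguishes delimiters from everything else, which matches the point that layout and meaning are independent concerns.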

Further Examples

The following is another U.S. address with maximal tagging of the address parts:

1001 W 10th Street RG5
Indianapolis, IN 46202
U.S.A.

The instance notation shows how different the new address type is compared with the old HL7 AD/XAD types.

Mark Tucker pointed out that this address type is an interesting construct: It is kind of the inverse of a record data structure. In a record, we have a bunch of slots that may or may not contain data. In this data type we have a bunch of data that may or may not be assigned slots.

XML ITS

It is especially interesting to see how this data type maps into XML. An automatic mapping (such as the one used for the HIMSS demo) would create very long, unreadable XML. But the reason for the popularity of XML is that markup can be added gently to a basically "human readable" text. XML-wise, a much nicer representation would be:

The contents of this address could now be refined:

Note that in the above representation we at least allowed address part roles to occur as XML attributes. If DTDs were not used, one could create an even nicer representation by turning the role codes into XML tags.

Actually the address data type is an example of the paradigmatic use case of XML: a bunch of data that may or may not be further marked up. It would be very odd if we did not use XML in this classic way for this classic use case.
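
The idea of adding markup gently to readable text can be illustrated with Python's standard `xml.etree.ElementTree` module. This is a sketch only: the element name `address`, the element name `part`, and the `role` attribute are hypothetical choices for illustration and are not taken from these notes or from any HL7 XML ITS. Untagged literals and delimiters stay as plain text, and only parts with a semantic role are wrapped in an element.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

Part = Tuple[Optional[str], str]


def address_to_xml(parts: List[Part]) -> str:
    """Serialize address parts as mixed-content XML (hypothetical element names)."""
    root = ET.Element("address")
    last = None  # most recently added child element, if any

    def append_text(text: str) -> None:
        # Plain text goes into root.text before the first child element,
        # and into the tail of the latest child element afterwards.
        if last is None:
            root.text = (root.text or "") + text
        else:
            last.tail = (last.tail or "") + text

    after_delimiter = True
    for value, role in parts:
        if role == "DEL":
            append_text("\n" if value is None else value)
            after_delimiter = True
            continue
        if not after_delimiter:
            append_text(" ")
        if role == "LIT":
            append_text(value or "")
        else:
            last = ET.SubElement(root, "part", role=role)
            last.text = value
        after_delimiter = False
    return ET.tostring(root, encoding="unicode")


# Assumed tokenization of the German example address.
german: List[Part] = [
    ("Windsteiner Weg 54A", "LIT"),
    (None, "DEL"),
    ("D", "CNT"),
    ("-", "DEL"),
    ("14165", "ZIP"),
    ("Berlin", "CTY"),
]

xml_text = address_to_xml(german)
print(xml_text)

# Stripping the markup again yields the printable label text.
label = "".join(ET.fromstring(xml_text).itertext())
```

The design choice shown here is that removing all tags leaves exactly the label text, so a receiver that ignores the markup can still print the address.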

Outstanding Issues

Should we allow for address part values other than mere Character Strings? In particular, should we allow for code values? Using code values seems to make sense for things like country code and state. Using a code table for states or countries is of course safer and allows addresses to be processed into groups.

While this is possible in general, we have three problems:

The data type definition and all of the instances would become more complex, since we would have to define the AddressPart.value as a type choice between CharacterString and CodeValue (or even ConceptDescriptor!).
While there are codes for U.S. states and countries (e.g., ISO 3166 Country Code), those codes are not used uniformly. There are two ways to abbreviate U.S. states, e.g., the Commonwealth of Massachusetts can be "MA" or "Mass.". While the ISO country code is suggested for international use, there is a long tradition in Europe of abbreviating countries with a different code (the same one that is used for country stickers on cars). Thus, the ISO code for Germany is "DE" but "D" is used all over Europe.

Since there are different code tables in use one might even require the Concept Descriptor data type to account for the translations. This is a considerable overhead, for what use?
The use case of codes in addresses is very limited. If a receiver really wants to rely on those codes, we set up a number of requirements that did not exist before. (1) the address part must be tagged with an explicit role, (2) the right code must be used by the sender. The use case to code addresses is very localized, which means, the coding of address parts may be needed in one application but it is not needed in many others. In order to print labels and visit people, coded address parts are not essential.

I don't think that we should make the address data type any more complex and I don't think that HL7 should impose more requirements to code certain address parts. It just does not seem to be a widely demanded use case, and I also cannot see a compelling a priori argument for coded address parts (which could offset the lack of use cases).

However, there is one powerful way in which the simpler address data type defined here can meet the needs of those who would like to have coded address fields: type casting.

Through type casting a message would be valid even though the sender put a CodeValue or ConceptDescriptor in place of a CharacterString. This means that a sender who does code address parts is able to send those coded address parts to a peer who also prefers to receive coded address parts where possible. Thus, an implementation may behave as if the address data type were defined in a more complex way.

The point is, we don't have to make the HL7 specification more difficult to understand and implement for those who do not want this extra feature of coded address parts and still allow those who want to deal with the extra work to go ahead and do it. This is another example where implicit type casting in a well defined type system proves extremely useful: the canonical specification can remain simple, and still extra requirements can be supported in a compatible way!
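
A minimal sketch of this kind of implicit cast, in Python: the `CodeValue` class, its fields, and the function name `as_character_string` are hypothetical stand-ins, and the cast shown (using the code itself as the printed string) is an assumption rather than a rule defined in these notes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class CodeValue:
    # The code as it would be printed, e.g. "DE".
    code: str
    # Name of the code system; the label used below is illustrative.
    code_system: str


def as_character_string(value: Union[str, CodeValue]) -> str:
    """Cast an address part value to the plain string a label printer needs."""
    if isinstance(value, CodeValue):
        # Assumed cast rule: the code itself serves as the printed text.
        return value.code
    return value


# A sender that codes the country part and one that does not.
coded = CodeValue(code="DE", code_system="ISO 3166")
plain = "D"

print(as_character_string(coded))
print(as_character_string(plain))
```

A receiver written against the simple definition calls the cast and never sees the difference, while a receiver that wants coded parts can inspect the `CodeValue` before casting.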

To use or not to use ... the boolean type is the BIG question for the bad address indicator. This is a call for proposals: if we can come up with a definition and a complete code table of the address status, we will leave the definition of Address as is. If not, we will pick one of two choices:

either we rename address "status" to "bad address indicator" and define it as a boolean (definition: indicates whether the address does not work),
or we define address status as a SET OF CodeValue. The set of code values would be a set of flags that can be either on or off. One flag obviously would be "B" for "bad address"; other flags may be added as use cases appear.

The second option could be a compromise so as to avoid another hour of discussion whether or not to use booleans.

Next conference call is next Wednesday, Feb 24, 1999, 4:00 PM EST.

Note the unusual time. This is because we will have Klaus Veil (and maybe other international participants) joining us from HIMSS.

Agenda items are:

Person Name

For the person name there are a couple of documents and ideas that you may wish to review in order to contribute to an informed and efficient discussion. Please see the worksheet.

regards

-Gunther Schadow