The HL7 version 3 data type task group held its third conference call on Thursday, October 22, 1998, from 11 AM to 12:30 PM EDT.
Attendees were:
This time we had our first round of discussion about symbols, identifiers, coded elements, and the like. Although we were a bit afraid of getting into a tangle of controversial issues, we did make quite significant progress again.
| | CONCEPT | INSTANCE |
|---|---|---|
| REAL WORLD | Coded using mostly externally defined code systems: ICD9, ICD10, SNOMED, DSM-III, DSM-IV, ICPC, LOINC, ICPM, CPT4, etc. | Examples: person names (old PN), organization names (old XON), location descriptors (old AD and PL), legal id numbers (SSN, DLN, etc.) |
| TECHNICAL | Examples: message type, order status code, participation type code, MIME media type. | Examples: message ids, Service catalog items, RIM instances (order numbers), phone numbers, e-mail addresses, URLs |
REAL-WORLD CONCEPTS are concepts that scientists and ordinary people deal with in their minds and formulate in words (this sounds fuzzy, but that's what it is!). Communication must rely on commonly agreed terminology or standard code systems. Those are mostly defined by external (i.e., non-HL7) organizations, such as those representing domain experts in a particular medical specialty.
There is currently a lot of overlap, competition, and complementation among code systems. It does not seem as if this apparent disorganization could ever change, because medicine and human life in the real world are always changing. Thus, the communication of real world concepts will always have to deal with translating codes and selecting the best matching "synonymous" code from different code systems.
TECHNICAL CONCEPTS are labels for well-defined concepts, such as protocols. For example: if we say "HTTP" we refer to the hypertext transfer protocol, which is an Internet standard defined quite rigorously. If we ultimately want to know what HTTP is, we can read the specification. However, most often we are not so much interested in what "HTTP" is or what it means; we just want to use it. So we select an appropriate machinery (e.g., a web browser) and use HTTP.
With Technical Concepts there is no use for different vocabularies, no use in having both "HTTP" and "HypTexTranProt" refer to the same technical concept. This is not to say that people could not use different names or abbreviations for HTTP, but there is no point in letting everyone choose his own terminology for the exact same technical concept.
REAL WORLD INSTANCES are individual people, organizations or things that we can meet, point at, think of, go to, etc. The strongest "definition" we can ever make is to point at those people or things, touch them, or take them in hand and show them. But in documents and human communication we commonly use names and officially assigned identifiers (e.g., social security number or driver license number). Places are named using residential addresses or other kinds of locators (e.g., building -> tract -> floor -> room -> bed).
Things are most often pointed to (e.g., "give me this screwdriver") or described (e.g., "give me the long screwdriver ... no, the stronger one"). In larger contexts, where we can neither point to things nor unambiguously describe them, we just assign arbitrary inventory numbers to the things.
In general, identifiers for Real World Instances are rich in intricacies, and we will address those later. The common approach for data types is already laid out by HL7 v2.x, i.e., PN, XON, DLN, AD, PL, and the like.
TECHNICAL INSTANCES are instances that are useful in some technical sense. Just as with Technical Concepts, we are less interested in knowing what exactly those instances are. Rather, we name technical instances because we want to use them. In the case of HL7, most of those technical instances will be particular data instances, such as messages, order numbers, service catalog items, or any other instance of a RIM class that we can refer to.
But Technical Instances are also things like telephone numbers, e-mail addresses, or Uniform Resource Locators (URLs) to Web pages, images, or chat rooms. The general idea is that we rarely use a phone number to search the phone book for the address where a given telephone is located in order to meet the person there; that would be finding out what the telephone number means. In most cases, we use the telephone number directly, simply picking up the nearest phone and dialing it.
The same is true for database records or data instances on computer systems: we do not analyze memory dumps of computer systems in order to find out what a given Technical Instance really is; we just use it in some machinery that, for instance, lets us query for a given record entry or change that record entry.
For the rest of the conference we concentrated on Concepts, both technical and of the real world, and on technical instances.
Through narrowing down namespaces we can achieve uniqueness of identifiers quite easily. This is, for example, why in computer programming local variables in procedures are safer than global variables. The really important quality of uniqueness, however, is that identifiers are globally unique. Global uniqueness is generally achieved by a structure defined in the following piece of BNF:

<identifier> ::= <name> <namespace>
<namespace> ::= <identifier>

Obviously this is a recursive structure, i.e., every namespace is itself identified by a name in its parent namespace. This recursion up the namespace hierarchy must somehow be terminated. This is done by assigning one globally unique root namespace, in which names are valid without reference to another namespace.
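To make the recursion and its termination concrete, here is a minimal Python sketch, assuming that a missing namespace (`None`) stands for the single global root namespace. The class name `Identifier` and its `path` method are illustrative only, not part of any HL7 definition.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Identifier:
    """<identifier> ::= <name> <namespace>, where namespace is itself an Identifier."""

    name: str
    # None stands for the single global root namespace; it ends the recursion.
    namespace: Optional[Identifier] = None

    def path(self) -> List[str]:
        """Follow the namespace chain up to the root; return names root-first."""
        names: List[str] = []
        node: Optional[Identifier] = self
        while node is not None:
            names.append(node.name)
            node = node.namespace
        return names[::-1]


# Hypothetical example, borrowing the domain name used in the DNS discussion.
edu = Identifier("edu")                # valid in the root namespace
iupui = Identifier("iupui", edu)       # valid only within "edu"
falcon = Identifier("falcon", iupui)   # valid only within "edu"/"iupui"
print(falcon.path())  # ['edu', 'iupui', 'falcon']
```

Because dataclass equality compares the whole namespace chain, the same name in two different namespaces yields two distinct identifiers, which is how narrowing namespaces buys uniqueness.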
The uniqueness of an identifier does not imply, however, that a given instance could not have several names. Thus, if you compare unique identifiers literally and find that they do not match, you learn nothing: both identifiers can still refer to the same instance.
An identifier is "unravelable" if we can analyze its pieces, and for each piece, we can find someone to talk to.
Internet domain names (DNS) are unravelable expressions. For example, we can unravel the string "falcon.iupui.edu" from the right, where "edu" is maintained by Internic (the organization that assigns top level Internet domains). When Indiana University Purdue University Indianapolis (IUPUI) registered its domain name "iupui" with the Internic, they had to name an official person who is responsible for "iupui". That person knows what "falcon" is.
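The order of unraveling can be sketched in a few lines of Python. This only reverses the labels so that each one follows the label whose owner can explain it; actually finding "someone to talk to" at each step is, of course, outside what code can do. The function name is hypothetical.

```python
from typing import List


def unravel_dns(name: str) -> List[str]:
    """Return the labels of a domain name ordered from the root downward."""
    labels = name.rstrip(".").split(".")  # a trailing dot denotes the root
    return labels[::-1]


for depth, label in enumerate(unravel_dns("falcon.iupui.edu")):
    # Each label is explained by whoever is responsible for the label before it.
    print("  " * depth + label)
```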
ISO Object Identifiers (OID)¹ are unravelable too. ISO OIDs are unraveled from the left. For example, "1.2.840.10008.421292.87828.333433.001" stands for ISO (1), ISO member body (2), USA (840), DICOM Standard (10008), AGFA (421292), ... The leftmost numbers are registered with gigantic organizations. Eventually, a company like AGFA gets a number allocated, say, 421292. It then builds machines, one of which has the number 87828. That machine allocates numbers to an imaging study (333433) that contains a series of images (001).
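Unraveling an OID from the left can be sketched as walking its prefixes. In this Python sketch the dictionary `KNOWN_ARCS` is a hypothetical stand-in for the registration authorities along the path, filled only with the labels given in the example above; any prefix it does not know is left for the owner of the parent arc to explain.

```python
from typing import List, Tuple

# Stand-in for the registries along the path; labels are those used in the text.
# A real lookup would mean contacting each registration authority in turn.
KNOWN_ARCS = {
    "1": "ISO",
    "1.2": "ISO member body",
    "1.2.840": "USA",
    "1.2.840.10008": "DICOM Standard",
    "1.2.840.10008.421292": "AGFA",
}


def unravel_oid(oid: str) -> List[Tuple[str, str]]:
    """Walk an OID from the left, pairing each prefix with what it denotes."""
    arcs = oid.split(".")
    steps = []
    for i in range(1, len(arcs) + 1):
        prefix = ".".join(arcs[:i])
        # Unknown prefixes are for the owner of the parent prefix to explain.
        steps.append((prefix, KNOWN_ARCS.get(prefix, "ask the owner of the parent arc")))
    return steps


for prefix, meaning in unravel_oid("1.2.840.10008.421292.87828.333433.001"):
    print(f"{prefix:40} {meaning}")
```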
In unraveling an ISO OID we walk down the path in basically the same way as with DNS names. DICOM has registered people within the US member body of ISO (ANSI). AGFA has registered people with DICOM. They, or someone in the radiology department, could probably tell you that 87828 is the CT machine in the trauma center. Finally, the machine itself allocates identifiers at "computer speeds" to things like studies and images.
HL7 filler orders are somewhat unravelable. For example, you are given the filler order "1234^OUTPATIENT.LAB". If you could figure out what department the symbol "OUTPATIENT.LAB" referred to, then you could call them up and ask them about item "1234".
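The first step of unraveling such a filler order is just splitting it at the component separator. This Python sketch assumes the HL7 v2 default component separator "^" and simply ignores any further components; the function name is hypothetical.

```python
from typing import Tuple


def split_filler_order(field: str, component_sep: str = "^") -> Tuple[str, str]:
    """Split an HL7 v2 filler order into (local number, namespace symbol)."""
    parts = field.split(component_sep)
    local_id = parts[0]
    namespace = parts[1] if len(parts) > 1 else ""
    return local_id, namespace


print(split_filler_order("1234^OUTPATIENT.LAB"))  # ('1234', 'OUTPATIENT.LAB')
```

The split is the easy part; what "OUTPATIENT.LAB" refers to still has to be found out by asking people, which is why such identifiers are only "somewhat" unravelable.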
As we can see, whether an identifier is unravelable depends on the way its namespaces are managed. Both ISO OIDs and Internet domain names are organized through hierarchical namespaces.
An identifier is "dereferenceable" if there is a machinery that resolves the identifier for you, rather than requiring you to go the rather painful way of unraveling. For Internet domain names there is such a machinery dedicated to resolving names, i.e., the Domain Name System (DNS). The Internet name server next to you will resolve the address for you quite seamlessly. There is a whole infrastructure of domain name servers, which is why it can take so long to get an answer from a DNS server if you typed in a wrong domain name: your DNS server asks another server that asks another server, and so on.
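The chain of servers asking servers can be illustrated with a toy model in Python. This is not the DNS protocol: the zone table, the server names and the referral convention are all hypothetical, and the address comes from the 192.0.2.0/24 block reserved for documentation. It only shows how the machinery, not the user, walks the hierarchy.

```python
from typing import Dict, List, Tuple

# Toy delegation data (hypothetical server names). A value that is itself a
# key of ZONES is a referral to another server; anything else is an address.
ZONES: Dict[str, Dict[str, str]] = {
    ".": {"edu": "edu-server"},
    "edu-server": {"iupui.edu": "iupui-server"},
    "iupui-server": {"falcon.iupui.edu": "192.0.2.10"},
}


def resolve(name: str) -> Tuple[str, List[str]]:
    """Follow referrals from the root until some server answers with an address."""
    server = "."
    asked = [server]
    while True:
        table = ZONES[server]
        # Take the first entry that is the name itself or one of its parent domains.
        match = next((k for k in table if name == k or name.endswith("." + k)), None)
        if match is None:
            raise LookupError(f"{name!r} is unknown to {server!r}")
        answer = table[match]
        if answer in ZONES:          # referral: ask the next, more specific server
            server = answer
            asked.append(server)
        else:                        # an address: dereferencing is done
            return answer, asked


print(resolve("falcon.iupui.edu"))  # ('192.0.2.10', ['.', 'edu-server', 'iupui-server'])
```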
For ISO OIDs there is no such easy way of dereferencing. In some cases there may be catalog services that resolve a subspace of the whole gigantic OID namespace.
A telephone number is a perfectly unique and dereferenceable identifier if we start at the root of the namespace provided by the global telephone system. Phone and fax numbers are usually written in a standardized way: for instance, "+49308153355" used to be my old fax and phone number in Germany, while "+13176307960" is my office phone number in the U.S. All you need to do to dereference such a phone number is to pick up your phone, dial the prefix for international calls ("+"), dial the other digits, and be done with it.
Uniform Resource Locators (URLs) are another example of dereferenceable identifiers. For instance, "http://aurora.rg.iupui.edu/~schadow/v3dt" is our V3DT project homepage. Your browser and the Internet do everything for you after you type in this URL. URLs start by naming the protocol to use; the rest of the URL is a literal that the protocol is supposed to understand. For example, I can view the same homepage as a local file using the URL "file:/home/schadow/public_html/v3dt/index.html".
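The split between the protocol name and the protocol-specific rest can be seen with Python's standard `urllib.parse.urlsplit`. The wrapper `split_url` is a hypothetical helper; note that the `file:` URL above has no host part at all, because its rest is interpreted purely as a local path.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_url(url: str) -> Tuple[str, str, str]:
    """Return (scheme, network location, path) of a URL."""
    parts = urlsplit(url)
    # The scheme names the protocol; the rest is for that protocol to interpret.
    return parts.scheme, parts.netloc, parts.path


for url in ("http://aurora.rg.iupui.edu/~schadow/v3dt",
            "file:/home/schadow/public_html/v3dt/index.html"):
    print(split_url(url))
```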
In general, for an identifier to be dereferenceable it need not be practically unravelable. For instance, a telephone number is for all everyday purposes not unravelable (only law enforcement is given this privilege). You may be able to figure out a country code (1 for the U.S.) and an area code (317 for Indianapolis), but you will have a pretty hard time finding the number 6307960 in the Indianapolis phone book.
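The part of a phone number that everyday users can unravel is easy to extract, as this Python sketch shows. It assumes North American numbering (country code 1, a three-digit area code, a seven-digit local number) and is not a general phone number parser; the function name is hypothetical. The local number it returns is exactly where unraveling stops.

```python
from typing import Dict


def split_nanp(number: str) -> Dict[str, str]:
    """Split a "+1" number into country code, area code and local number."""
    digits = number[1:] if number.startswith("+") else number
    if not (len(digits) == 11 and digits.isdigit() and digits.startswith("1")):
        raise ValueError("expected country code 1 followed by ten digits")
    return {"country": digits[0], "area": digits[1:4], "local": digits[4:]}


print(split_nanp("+13176307960"))
# {'country': '1', 'area': '317', 'local': '6307960'}
```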
The important point about dereferencing identifiers is that the process of dereferencing does not get you down to their "meaning" in the real 3D world. That is, unless you come into my office, you will never see my machine, "Aurora", which serves the above homepage. The machinery that seamlessly dereferences URLs does not bring you into my office; all you can do is look at what the Internet/HTTP/browser machinery brings to your screen as a result of dereferencing the URL identifier. Likewise, with the telephone you can call me, but you cannot creep through the wire to see my telephone.
I do not remember that we brought this point to closure. There are some concrete propositions that were more or less implicit in our discussion, but that we were probably not prepared enough to spell out clearly.
Proposition 1:
HL7 identifiers for technical instances are to be unique. For identifiers to be unique we have to manage the global namespace. Most importantly, every identifier must be explicitly linked to the root of the namespace hierarchy.
Since HL7 has acquired a branch in the tree of ISO OIDs, we are free to use OIDs heavily and directly, much as DICOM does.
Many existing HL7 systems do not assign purely numerical identifiers to the technical instances in their realm. For instance, they may use alphanumeric keys into any data file. We might not want to force people to adopt a pure OID scheme for identifiers.
We can, however, assign OIDs to everyone who writes applications for HL7 and everyone who maintains HL7 communications. On that basis, people would be free to attach their own naming scheme to their assigned OID. If they want, they may use OIDs in their realm, but they may also use free-form identifiers.
Thus, HL7 identifiers for technical instances could be defined as pairs of an OID and a Character String to be used for locally defined codes. In particular, the HL7 standard would not allow identifiers to be sent without the OID.
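A minimal Python sketch of such an OID-plus-string pair, assuming a simplified OID syntax check (dot-separated decimal arcs without leading zeros). The class name `TechnicalInstanceId`, the field names `root` and `extension`, and the OID "1.2.3.4.5" are made up for illustration; the point is only that an identifier cannot even be constructed without an OID.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified OID syntax check: dot-separated decimal arcs without leading zeros.
OID_SYNTAX = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))+")


@dataclass(frozen=True)
class TechnicalInstanceId:
    root: str                        # required OID that names the namespace
    extension: Optional[str] = None  # optional local key, meaningful only under root

    def __post_init__(self) -> None:
        # Refuse to build an identifier that is not anchored to an OID.
        if not OID_SYNTAX.fullmatch(self.root):
            raise ValueError(f"root must be an OID, got {self.root!r}")


# "1.2.3.4.5" is a made-up OID standing in for one assigned to a vendor.
order_no = TechnicalInstanceId("1.2.3.4.5", "A7-0042")
```

The same local key "A7-0042" under two different roots gives two different identifiers, so free-form local schemes cannot collide globally.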
Proposition 2:
HL7 identifiers for technical instances should be unravelable if they are not dereferenceable.
This proposition is satisfied if we pursue the data type described above, which uses an OID and an optional free-form identifier that is meaningful only in the namespace designated by the OID and that may never be communicated in HL7 without the OID.
There are issues, however:
This need not be outsourced; the HL7 HQ could do this as a service to its members, and for a nominal registration fee for non-members. We can learn from the DICOM and Internic experience how easily this is done.
Proposition 3:
If HL7 identifiers for technical instances are meant to be dereferenceable, they should be declared as such, and the machinery needed to do the job should be specified. It almost appears as if we want to have two different data types for technical instances:
This data type could, for simplicity, be constructed of only two components: the required object identifier (an ISO OID) and an optional extension of type Character String.
<"text/html", "MIME-TYPE"> would refer to the technical concept of an HTML media type, while <"784.0", "ICD9 CM"> would refer to the real world concept of "headache" as defined by ICD9 (i.e., in ICD9 this would not include the concept of "tension headache", 307.81).
The exact structure has more parts:
Note that the definition of this data type has further evolved: [version 2] [version 3]
component name | type/domain | optionality | description |
---|---|---|---|
value | Character String | required | this is the plain symbol, like "784.0" |
code system | a code by itself | required, can be fixed by context | denotes the code system that defined the plain symbol |
code system version | Character String | conditional | a version descriptor defined specifically for the given code system |
print name | Character String | optional | a sensible name for the code as a courtesy to an interpreter of the message. THE PRINT NAME BEARS NO MEANING: it can never be sent alone and it can never alter the meaning of the code value |
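The table can be read as a record type. This Python sketch is one hedged way to express it: the class name `CodeValue` is illustrative, `code_system` is simplified to a plain string (a choice the following discussion calls into question), and the print name is excluded from equality and hashing so that, as the table demands, it can never alter the meaning of the code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CodeValue:
    value: str                                 # the plain symbol, e.g. "784.0"
    code_system: str                           # simplified here to a plain string
    code_system_version: Optional[str] = None  # only meaningful with code_system
    # The print name bears no meaning, so it takes no part in equality or hashing.
    print_name: Optional[str] = field(default=None, compare=False)


a = CodeValue("784.0", "ICD9 CM", print_name="Headache")
b = CodeValue("784.0", "ICD9 CM", print_name="headache, unspecified")
print(a == b)  # True: same symbol in the same code system
print(a == CodeValue("307.81", "ICD9 CM"))  # False: a different concept
```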
The code system is obviously by itself a technical concept identifier. If we are going to use the data type Coded Value for concept identifiers, we have a recursive type definition. Not that recursion is bad in general, but the question is: what terminates the recursion?

If HL7 maintains a list of coding schemes and defines symbols for each one of those schemes, one could be tempted to circumvent this problem of recursion by defining the component named code system as a simple Character String. However, we should be prudent here: what happens if HL7 outsources its code of coding systems? What happens if there are multiple codes of coding systems (e.g., suppose the CEN coding system registry standard becomes an ISO norm)?
The code system version is used as a refinement of the code system descriptor. Logically, any version information is useful only together with the code system identifier. We would usually reflect this in a nested structure such as <value <system, version> print-name>. Stan Huff did not want this kind of nesting; Mark Tucker and I think that we should not be worried about nesting in any way. However, we do not want this to be a controversial issue, so we agreed to flatten the structure here. It is quite an exceptional situation anyway.
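The trade-off of flattening can be sketched in Python: in the nested form the version cannot exist without its system, whereas in the flat form that dependency becomes a rule someone has to check. The function names and the dictionary keys are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Nested form <value <system, version> print-name> as a Python tuple.
Nested = Tuple[str, Tuple[str, Optional[str]], Optional[str]]


def flatten(nested: Nested) -> Dict[str, Optional[str]]:
    """Turn the nested form into the flat component list agreed on."""
    value, (system, version), print_name = nested
    return {
        "value": value,
        "code_system": system,
        "code_system_version": version,
        "print_name": print_name,
    }


def check_flat(cv: Dict[str, Optional[str]]) -> None:
    """What nesting expressed structurally must now be checked explicitly."""
    if cv.get("code_system_version") and not cv.get("code_system"):
        raise ValueError("code_system_version is meaningless without code_system")


flat = flatten(("784.0", ("ICD9 CM", None), "Headache"))
check_flat(flat)  # passes: no version, so nothing to check
```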
Is "ICD" the name and "9" or "10" the version? If so, what about the derivatives of ICD-9 (e.g., ICD-9-CM) and ICD-10 (e.g., ICD-10-PCS)? What about the minor versions where a few codes are taken out or brought in every now and then? If we define all coding systems in a special HL7-maintained table, why do we not just define new symbols for every new major and minor version that comes out? How can we assure that what people put into the version component is standardized and interoperably useful?
HL7 would still have to make sure that the true version identifier of LOINC 1.0j is exactly one of "1.0J," "1.0j," "1.0-J," or "1.0 j," and not just any of those. While the organization that maintains a code system will have its own version numbering scheme, it will not define unambiguous, exact string representations for its revisions, and we cannot expect it to do so. So we have to maintain a list of the versions, or a set of clearly defined rules on how the version-identifying string is formed.
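What "a set of clearly defined rules" could look like is sketched here in Python. The rule itself (drop whitespace and hyphens, then lower-case) is purely hypothetical and not an HL7 rule; it merely maps the four spellings above onto one string.

```python
import re


def canonical_version(raw: str) -> str:
    """Hypothetical normalization rule: drop whitespace and hyphens, lower-case."""
    return re.sub(r"[\s-]", "", raw).lower()


variants = ["1.0J", "1.0j", "1.0-J", "1.0 j"]
print({canonical_version(v) for v in variants})  # {'1.0j'}
```

Such a rule is blunt: it would also merge strings that a maintainer meant to keep apart (say, if case were significant in some scheme), which is why the rules, or the list of versions, have to be owned and maintained by someone.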
Traditionally, HL7 defined the letter "L" to stand for any local system, or, if more than one local code system exists at a given site, to name those "99zzz", where each z is a digit. We can loosen this constraint a little by saying that every code system name starting with "99" is local.
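Both rules fit in a few lines of Python, which makes the difference easy to see; the function names are hypothetical.

```python
import re


def is_local_traditional(name: str) -> bool:
    """Traditional rule: "L", or "99" followed by three digits ("99zzz")."""
    return name == "L" or re.fullmatch(r"99[0-9]{3}", name) is not None


def is_local_loosened(name: str) -> bool:
    """Loosened rule: "L", or any code system name starting with "99"."""
    return name == "L" or name.startswith("99")


for name in ("L", "99001", "99LAB", "ICD9 CM"):
    print(name, is_local_traditional(name), is_local_loosened(name))
```

Under the loosened rule a site can give its local systems mnemonic names such as "99LAB" while they remain recognizably local.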
Stan Huff and everyone else agreed that the old CE data type and its proposed interim successors (under various names: LCE/CWE and CE/CNE) were basically one pair of the Code Values defined above, plus a free text string that could be used to convey the original text in an uncoded fashion.
Neither Stan Huff nor anyone else objected to defining the new data type for real world concepts as a general collection of Code Values with one, two, or more codes.
We agreed that there is an important distinction to make in the semantics of a collection of Code Values. Two such semantic flavors exist:
We recognize that both flavors of collections of code values will have to be supported by the new data type for real world concepts. An example from HL7 v2.x is the "specimen source code" in the OBR-Segment, which was such a conglomerate of quasi-synonyms and modifiers.
We are not afraid to define the new data type for real world concepts as a rich nested structure, as long as we are very specific about the meaning of such a structure.
Stan Huff wants the new data type for real world concepts to keep track of the systems that perform translations on those codes. Thus, every code value could be annotated with by whom, when, and how a particular quasi-synonymous code value was added to the collection of quasi-synonyms.
I want to make sure that the new data type for real world concepts keeps track of the order in which translations were performed and of the quality of those translations.
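One possible shape for such a collection, sketched in Python under loud assumptions: the class names `Translation` and `ConceptDescriptor`, the 0..1 `quality` score, and all codes and translator names in the example are made up. It records who translated and when (one part of Stan Huff's "by whom, when and how") and keeps translations in the order they were performed; how a translation was made is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

Code = Tuple[str, str]  # (value, code system)


@dataclass
class Translation:
    code: Code
    translator: str      # which person or system produced this code
    when: datetime
    quality: float       # hypothetical 0..1 rating of how well the code matches


@dataclass
class ConceptDescriptor:
    original: Code
    # Appended in the order performed, so the history of translations is kept.
    translations: List[Translation] = field(default_factory=list)

    def translate(self, code: Code, translator: str, when: datetime, quality: float) -> None:
        self.translations.append(Translation(code, translator, when, quality))


# All codes, names and numbers below are made up for illustration.
cd = ConceptDescriptor(("784.0", "ICD9 CM"))
cd.translate(("HA-01", "99LOCAL"), "mapper-A", datetime(1998, 10, 22, 11, 0), 0.9)
cd.translate(("HA", "99OTHER"), "mapper-B", datetime(1998, 10, 22, 11, 5), 0.6)
print([t.translator for t in cd.translations])  # ['mapper-A', 'mapper-B']
```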
Stan Huff and Mark Shafarman want to see clearly how "exception handling" would be dealt with. The distinction between "Code without exceptions" and "Code with exceptions" was proposed before, and we should make sure that we capture the requirements that this proposal tries to address. Stan Huff also mentioned that he recognizes the general applicability of those "exception handling" mechanisms to other HL7 data fields that are not declared to be of this coded data type.
We did not yet discuss anything more specific.
There was a pending notion that the data type for technical concepts could just be the Code Value although this was not confirmed explicitly.
Almost everything discussed above is an open issue, except for the things listed under Resolution.
Some specific open issues on the Coded Value are listed above. We will have to check the proposed solution for consensus, or we will have to negotiate other solutions.
Thank you and regards,
-Gunther Schadow