The HL7 version 3 data type task group has had its thirteenth conference call on Monday, January 11, 1999, 11:00 to 12:30 EST.
Attendees were:
Agenda items were:
We are running out of time. We have 10 more work days to go until the HL7 meeting in Orlando. Including today's items, we have been able to walk once through almost all the major issues. However, there are many detail issues that we have deferred or not even mentioned yet (e.g., week days, ordinal data, and what we end up doing with currency). We have not mapped all old types to new types, and we could not do so at this point, since we are still lacking major data types, i.e., the revised person name and address type, the real world instance identifier, and probably more.
Anyway, today we are closing the list of issues. For the rest of the time, we will concentrate on the presentation of our work to the Control/Query TC as well as to the TSC.
The CQ meeting on data types will be held on Wednesday AM. It will give a quite detailed report on what we did. We'll likely need the whole morning session, but with a definite end at noon, so that we won't continue significant parts in the afternoon.
That presentation will be given by several people, including Joann and/or Mike of Kaiser, Stan Huff, probably Mark Tucker, and myself.
Mark Shafarman will schedule the presentation for the TSC meeting. That would be either Monday (retreat) or Tuesday (meeting) night and would last just 15 minutes.
It doesn't make much sense to divide that time among multiple speakers, so I would do that one, unless Mark Shafarman wants to talk as well.
Boolean | |||
---|---|---|---|
The boolean type stands for the values of two-valued logic. A boolean value can be either true or false. | |||
PRIMITIVE TYPE |
Although the Boolean data type seems like a detail issue, it is quite important. A Boolean value can be either true or false. While Boolean values are the most basic values of all digital information processing machinery, the Boolean data type is useful even at the highest level of abstraction in data analysis.
Use cases for the Boolean type are all RIM attributes with the "attribute type" suffix "_ind" (indicators).
HL7 v2.x position on booleans was that of an ID data type with the special table that included only the values "Y" and "N". Since the follow-up data type for ID is Code Value, we could continue to serve the use case for booleans with Code Value constrained to the "Y/N" table.
The reason not to do that is that booleans are just the simplest data type possible and are useful on virtually all levels of abstraction, so it would be a move toward simplicity to define an explicit boolean data type to be used for all indicators. It is so much easier to use booleans in program decisions, as the following example in a fictitious programming language shows:
    VAR X : BOOLEAN;
    ...
    IF X THEN
      (* X is true *)
    ELSE
      (* X is false *)
    END IF;
By contrast, to deal with an arbitrary Code Value you would first have to check whether the code table used matches the Y_N_TABLE, and then you would have to treat every possible case, including the case that the given value is neither "Y" nor "N".
    VAR X : CodeValue;
    ...
    IF X.codeSystem == CodeSystem.Y_N_TABLE THEN
      IF X.value == "Y" THEN
        (* X is true *)
      ELSE
        IF X.value == "N" THEN
          (* X is false *)
        ELSE
          (* EXCEPTION: X is neither true nor false *)
        END IF;
      END IF;
    END IF;
Why would we not want to use boolean data types?
Backwards compatibility to v2.x has never been (and should not be) the major issue for design decisions for v3.0. However, through type conversions we can actually allow for backwards compatibility. Thus, a Boolean would convert to a Code Value by using the Y/N table. Any Code Value with the coding system set to the Y/N table can be converted to a boolean.
Note: We should, however, not define a conversion from Integer Number to Boolean on the basis of 0 = false, 1 = true. While the semantics of the Y/N table is clearly to represent boolean values, the mapping of booleans to numbers is not suggested by the semantics, nor is the style of mapping determined by semantics (e.g., one could just as well map false to -1 and true to 0, or false to 0 and true to non-zero).
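The two conversions described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the `CodeValue` class, its field names, and the `Y_N_TABLE` identifier are hypothetical stand-ins, not definitions from this proposal. In line with the note, no conversion from or to integers is offered.

```python
from dataclasses import dataclass

# Hypothetical identifier for the v2.x Y/N table; not an official HL7 name.
Y_N_TABLE = "Y_N_TABLE"


@dataclass(frozen=True)
class CodeValue:
    """Illustrative stand-in for the Code Value type (fields are assumed)."""
    code_system: str
    value: str


def boolean_to_code_value(flag: bool) -> CodeValue:
    """Convert a Boolean to a Code Value drawn from the Y/N table."""
    return CodeValue(Y_N_TABLE, "Y" if flag else "N")


def code_value_to_boolean(cv: CodeValue) -> bool:
    """Convert a Y/N Code Value to a Boolean; reject anything else."""
    if cv.code_system != Y_N_TABLE:
        raise ValueError(f"not a Y/N code: {cv.code_system!r}")
    if cv.value == "Y":
        return True
    if cv.value == "N":
        return False
    raise ValueError(f"value {cv.value!r} is neither 'Y' nor 'N'")
```

The second function has to handle two failure cases that simply cannot arise with a native Boolean, which is the point made by the pseudocode comparison earlier.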
Some people might think that using the Y/N table to capture Boolean semantics is more flexible, because they could later extend the table to cover other (exceptional) values. For instance, some might want to add the value P for "perhaps" and U for "unknown". I call those two extensions to the Y/N table "generally applicable", since they are conceivably valid for all cases where the Y/N table is used. However, those extensions of the Y/N table are not necessary in the context of this data type proposal, since "perhaps" is covered by all the mechanisms to define uncertainty, and the "unknown" exception is covered by the incomplete information mechanisms defined further below.
Other people might still think that the Y/N table should be used to allow for subsequent extensions. An example might be the patient death indicator, where Y/true means the patient is dead and N/false means that the patient is alive. Now one could make the case that a patient after the diagnosis of "brain death" might be kept in a vegetative state until some organ transplantation. This would be a status between life and death that falls neither in the category of uncertainty nor in that of incomplete information. So, one might need to extend the Y/N table by "B" for "brain death".
Clearly, such extensions of the Y/N table could be made only at one point of use of the Y/N table, e.g., only the death indicator would use the Y/N table extended by "B" for "brain death". This means that death indicator would no longer be defined as a code from the Y/N table, but from a "death code" table. According to the MDF, the attribute type suffix "_ind" would have to be changed to "_cd".
If "death indicator" had been defined as a Boolean in version 3.0 and later had to become a code from the "death code" table, one could either simply change the data type definition between versions or, instead, add another field, such as "death detail status", to be used if "death indicator" is true. Those changes in the use of the field require RIM changes regardless of whether we use the Boolean data type or not.
If nothing else, a Boolean data type could help sharpen the analytic work of the committees, because it would be absolutely clear whether or not there can be other values aside from the two opposites represented by true and false.
HL7 v2.x used the word "repeating" to describe certain qualities of the definition of fields and segments. This reflected the observation that "repeated" stuff could occur multiple times in the message. However, obviously there must be a reason why someone would make the decision that a segment or a field is to be repeatable in a message. It turns out that there are different reasons to make that decision. It was never clear from the HL7 spec. what the meaning of repeatability was in every instance.
The stuff that could repeat was either a segment or a field. For the purpose of this discussion we will consider the v3 equivalent of a segment to be a class, whereas the v3 equivalent of a field is an attribute.
When segments repeated in v2, this expressed a relationship (cardinality) between classes. When fields were declared "repeatable", this expressed a relationship between an attribute and its data values. We will concentrate here on the relationship between attributes and data values rather than on inter-class relationships, although what we say here is equally valid for class relationships.
In general, when things end up being "repeatable" we have a collection of things.
Consider the example of Patient "telephone number" (tel), which might be declared as a "repeatable" field in version 2. The meaning of this is obviously that a patient has several telephones; we usually say that a patient has a "set" of telephone numbers. The word "set" implies that (1) it would not be meaningful if a given telephone number occurred twice, and (2) the order of telephone numbers does not matter.
Obviously from those criteria we can generate a table of all possible combinations:
 | unordered | ordered |
---|---|---|
no multiples | SET | * |
multiples | BAG | LIST |
The ordered sequence without multiples is marked by an asterisk since this case is rarely considered in the computer science literature.
We want to do away with language that speaks of "repeated attributes" and want to promote clarity regarding what specific semantic flavor of collections is meant.
Consider the case of waveforms, where "repeatedness" became quite tricky in v2.x. Now we can define a sample of an n-channel waveform signal as a list of n-dimensional vectors, where each vector stands for a particular time.
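A minimal sketch of that waveform representation, assuming plain tuples for the vectors and made-up signal values (the type aliases and the `channel` helper are hypothetical names for illustration only):

```python
from typing import List, Tuple

Vector = Tuple[float, ...]  # one value per channel at a single time
Waveform = List[Vector]     # LIST semantics: position encodes time order

# Made-up 3-channel signal with four consecutive samples.
waveform: Waveform = [
    (0.0, 1.0, -1.0),
    (0.1, 0.9, -0.8),
    (0.2, 0.7, -0.5),
    (0.3, 0.4, -0.1),
]


def channel(signal: Waveform, index: int) -> List[float]:
    """Extract one channel as a time-ordered list of values."""
    return [vector[index] for vector in signal]
```

Here the outer collection must be a LIST, since reordering the vectors would change the signal, while each vector has a fixed dimension n.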
One question was always associated with collections in HL7: how do we update those collections? We can distinguish the following cases:
One solution is to allow collections to be updated only through separate trigger events with explicit message structures that specify exactly what is changed and in which way. While this strategy works fine for high-level RIM objects such as Encounter_practitioner, Clinical_observations, etc., for things like "set of stakeholder phone numbers" it is too much of a burden to define specific trigger events.
But even if we had a trigger event "change patient phone numbers", it is not clear how we would specify what exactly should be changed.
For v2.x the answer always was: you send a snapshot of the collection as you want it to be, and the recipient simply throws away whatever he knows and remembers only what you just said. This works somewhat in situations with one master information producer and several slave information consumers, but it is totally insufficient for collaborative information management. For example, my message could wipe out all the telephone numbers that you already know. The proposed solution is described below under update semantics.
In v2.x we had the special values not present (||) or null (|""|) that could be sent instead of any other value in almost every field in a message. The semantics of those special values were twofold: (1) not present expressed that information was missing; (2) null removed existing information at the receiver's side, so that this information was missing afterwards. We will factor this "update" component out into update semantics below. Here we deal only with the representation of incomplete information.
No Information | |||
---|---|---|---|
A No Information value can occur in place of any other value to express that specific information is missing and how or why it is missing. This is like a NULL in SQL but with the ability to specify a certain flavor of missing information. | |||
component name | type/domain | optionality | description |
flavor | Code Value, | optional | The flavor of the null value. Can be interpreted as the reason why the information is missing. |
The "flavor" of the null value can be interpreted as the reason why the information is missing. For the time being, the list of possible flavors of null remains open for discussion. The number of distinct flavors of null values in use ranges from 1 (SQL) to 70 (reported by Angelo Rossi-Mori).
Stan Huff's CE proposal contains the following null values:
U | unknown | no information at all. I.e. nothing more is known about the circumstances of missing information. |
UASK | asked but unknown | the person asked could not supply the information (why?) |
NAV | not available | the person asked does have the information somewhere but not available right now (e.g. oh, I wrote down what the doctor said last time, but I didn't bring this piece of paper with me). |
NA | not applicable | e.g. an answer to "gestational age" for a patient who is not pregnant. |
NASK | not asked | the person who should collect that information forgot to ask. |
My criticism of Stan's list is mainly that I don't see any attempt to systematize the null values or to be exhaustive about them. However, now that we have defined a fairly general data type for no information, and have factored update semantics into its own method, I regard this issue as less important. In most cases, all that people need is the No Information without the flavor component.
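To make the shape of the No Information type concrete, here is a minimal Python sketch with an optional flavor component. The class names `NullFlavor` and `NoInformation` and the `describe` helper are hypothetical; the five codes are taken from Stan Huff's list above, and the list of flavors is still open, so the enumeration should not be read as final.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NullFlavor(Enum):
    """Flavors from Stan Huff's CE proposal (list not final)."""
    U = "unknown"
    UASK = "asked but unknown"
    NAV = "not available"
    NA = "not applicable"
    NASK = "not asked"


@dataclass(frozen=True)
class NoInformation:
    """Can stand in place of any other value; the flavor is optional."""
    flavor: Optional[NullFlavor] = None


def describe(value: object) -> str:
    """Render a value, treating No Information specially."""
    if isinstance(value, NoInformation):
        if value.flavor is None:
            return "no information"
        return f"no information ({value.flavor.value})"
    return str(value)
```

For example, a gestational age for a patient who is not pregnant could be carried as `NoInformation(NullFlavor.NA)`, while the common case needs only `NoInformation()`.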
Update semantics deals with the problem of what a receiver is supposed to do with the information in a message. That information may be equal to prior information in the receiver's database, in which case no questions arise. But what if the information is different?
We can categorize the cases into the following taxonomy:
In principle, the update mechanism will send an information action code along with each message element instance (MEI). The information action code should be part of the meta model definition of message element instances.
It turns out that updating a list is the most difficult thing, since
positions are relevant in the list. The problem is concurrent updates:
You never know exactly what the list looks like at the receiver's data
base when your update message is being processed.
For example: if you think the list is (A, B, C) and you want to insert an element D to come before C, you may send an (INSERT-AT 3 'D) to insert D at position 3 (and shift C to position 4). However, if someone rearranged the list to (C, B, A) just before your update arrives, the receiver would insert the D between B and A, and you get (C, B, D, A). You could have sent an update expression (INSERT-BEFORE 'C 'D) which, at the receiver's side, would update (A, B, C) to (A, B, D, C) but also (C, B, A) to (D, C, B, A).
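The two update expressions from this example can be replayed in a few lines of Python. The function names mirror INSERT-AT and INSERT-BEFORE but are otherwise hypothetical, and positions are taken to be 1-based as in the text.

```python
def insert_at(items, position, new):
    """(INSERT-AT position new), with 1-based positions as in the text."""
    result = list(items)
    result.insert(position - 1, new)
    return result


def insert_before(items, anchor, new):
    """(INSERT-BEFORE anchor new): place new directly before anchor.

    Raises ValueError if the anchor is no longer in the list.
    """
    result = list(items)
    result.insert(result.index(anchor), new)
    return result


expected_state = ["A", "B", "C"]    # what the sender believes
rearranged_state = ["C", "B", "A"]  # what a concurrent edit left behind

by_position = insert_at(rearranged_state, 3, "D")
by_neighbor = insert_before(rearranged_state, "C", "D")
```

Neither result is wrong as such: `by_position` keeps D in third place, `by_neighbor` keeps D directly before C, and only the sender knows which of the two was intended.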
The sender of an update expression has to be very sure whether he wants the new element to appear at a particular position or in a particular sequence relationship with a particular other element, and has to keep in mind that concurrent edits to the same data at the receiver's side can render the sender's assumptions invalid.
For the technical committees this means that LIST collection semantics should be chosen only if the order really matters semantically from the perspective of pure abstract application logic. If the order is probably not important enough to justify the headache around concurrent updates, the committee should choose the SET or BAG flavor. Most collections that I come across are SETs. Bags are very rare. If the collection element type is a class like, e.g., Health_issue, the ranking can (and should) be represented explicitly by a ranking number rather than by implying LIST semantics on some association.
Also note that there are partially ordered collections that often capture the application logic much better than totally ordered lists. Partially ordered collections are collections where elements may have the same ranking, so that you can not always decide whether one element has higher rank than another.
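One way to picture an explicit ranking number with ties is the following sketch. The class `RankedIssue`, the helper `by_rank`, and the sample issue names are all hypothetical and only illustrate the idea that elements with equal rank cannot be ordered against each other.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class RankedIssue:
    name: str
    rank: int  # explicit ranking number; equal ranks are allowed


def by_rank(issues):
    """Group issues into tiers; within a tier no issue outranks another."""
    ordered = sorted(issues, key=lambda issue: issue.rank)
    return [
        {issue.name for issue in tier}
        for _, tier in groupby(ordered, key=lambda issue: issue.rank)
    ]


# Made-up example data.
issues = [
    RankedIssue("hypertension", 2),
    RankedIssue("diabetes", 1),
    RankedIssue("asthma", 2),
]
tiers = by_rank(issues)
```

The result is a list of sets: the order of the tiers is meaningful, while the order inside a tier is not, so updates only need to state a rank rather than a position.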
With SETs, concurrent updates are not a problem, because the only thing you do is add values to or remove values from the SET, independent of the prior contents of the set. Updating a BAG is equally straightforward. Therefore selecting SET and BAG semantics should be encouraged. SET is often exactly the right semantic kind of collection from the perspective of pure abstract application logic, without implementation considerations.
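The following sketch shows why add and remove updates on a SET tolerate concurrency. The action names "add" and "remove" and the `apply_updates` helper are assumptions made for illustration; they are not the information action codes of this proposal, which are not yet defined.

```python
def apply_updates(initial, updates):
    """Apply (action, value) pairs to a copy of a set and return it."""
    result = set(initial)
    for action, value in updates:
        if action == "add":
            result.add(value)      # no effect if already present
        elif action == "remove":
            result.discard(value)  # no effect if already absent
        else:
            raise ValueError(f"unknown action: {action!r}")
    return result


# Made-up receiver state and two concurrent senders.
receiver = {"555-0100", "555-0142"}
mine = [("add", "555-0199")]
theirs = [("remove", "555-0100")]

mine_first = apply_updates(apply_updates(receiver, mine), theirs)
theirs_first = apply_updates(apply_updates(receiver, theirs), mine)
```

Both arrival orders leave the receiver with the same set, and neither sender needed to know the prior contents. Note that this holds when the updates concern different values; an add and a remove of the same value still depend on which arrives last.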
Next conference call is next Wednesday, January 13, 1999, 2:00 PM EST.
Agenda items are:
regards
-Gunther