V3DT conference call notes for Mon, Jan 4, 1999.

The HL7 version 3 data type task group held its twelfth conference call on Monday, January 4, 1999, from 11:00 to 12:30 EST.

Attendees were:

Agenda items were:

  1. collections [slide]
  2. incomplete information (what Stan calls "exception") [slide] [slide] [slide]
  3. update semantics [slide]
  4. review of the uncertainty debate, time-boxed to 10 minutes.

However, we did not get to any of the original agenda items. Instead, we had a discussion about the principles of this work. The problem with principles is that you can (almost) discuss them forever without making any visible progress. On the other hand, when no common principles are agreed upon, one engages in endless discussions about details without realizing that the real issue is a discrepancy in principles.

The attentive reader of the last ten sets of notes will have noticed that Stan Huff and I disagree on many issues of detail, and this disagreement seems to be a symptom of some underlying disagreement in principles. I felt that, rather than continuing with the details on the agenda, we had to give ourselves a chance to find agreement on principles at the beginning of this new year. Stan also wants to speak for himself in an e-mail, which I will post here as well. Nevertheless, I will try to account for the points of discussion here. Of course, I am unable to give a high-fidelity summary of Stan's ideas, so I apologize for any distortions.

Please note: when I name "Stan" below I do that in order to give proper attribution of ideas, not in order to single Stan out from the rest of the group.

Stan says that probability belongs in the same category as abnormal flags and normal ranges. Why then do we build specific data types for probability and not for abnormal flags and normal ranges?

Stan understands that everything is a type [and rightfully so, according to the Message Element Type (MET) concept developed by the v3 ITS group in the summer of '98]. His question is which types are more "primitive" (fundamental or basic) and thus maintained by Control/Query, and which ones are more complex (specific or advanced) and thus would have to be maintained by some special Technical Committee. His concern, obviously, is that some types ought not to be defined in this data type project.

Stan believes that things like units of measure, "probability/uncertainty", abnormal flags and reference ranges are all additional qualifications of "numeric" data. The question thus is why one would define a special data type for probability.

Stan believes that it is very important to be able to constrain data in all of those ways when messages are defined. That is, you want to allow a number with a unit but no uncertainty here, a number without a unit but with uncertainty there, and so on.

Stan suggested that there are two approaches to defining data types and expressing the constraints he considers important:

  1. Define several types to cover those qualifications and combine the types in many different ways. Make constraints by defining types.
  2. Define one general type that would contain all those "qualifications" (units, uncertainty, abnormal flags, reference range) and constrain this type by rules like: use only units here, only uncertainty there, etc.
Since approach (1), making constraints by defining types, is what we have done so far (to be fair: it is what some of us, myself included, have pursued so far), Stan obviously favors the second alternative: one general type with constraints applied to it.
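To make the two alternatives concrete, here is a minimal Python sketch. All class names, field names and the idea of a list of "allowed fields" per message element are hypothetical illustrations, not part of any HL7 specification. Under approach (1) the constraint is the type itself; under approach (2) one general record is checked against a rule naming the optional fields that may be used.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# Approach (1), hypothetical: the constraint *is* the type. A message
# element typed as Measurement can carry a unit but has no uncertainty field.
@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str

# Approach (2), hypothetical: one general, OBX-like type in which every
# "qualification" is an optional field.
@dataclass(frozen=True)
class GeneralValue:
    value: float
    unit: Optional[str] = None
    uncertainty: Optional[float] = None          # e.g. a standard deviation
    abnormal_flag: Optional[str] = None
    reference_range: Optional[Tuple[float, float]] = None

OPTIONAL_FIELDS = ("unit", "uncertainty", "abnormal_flag", "reference_range")

def conforms(v: GeneralValue, allowed: Set[str]) -> bool:
    """Check a GeneralValue against a rule listing the permitted optional fields."""
    used = {name for name in OPTIONAL_FIELDS if getattr(v, name) is not None}
    return used <= allowed

# "Use only units here, only uncertainty there":
UNITS_ONLY = {"unit"}
UNCERTAINTY_ONLY = {"uncertainty"}
```

With these definitions, `conforms(GeneralValue(5.2, unit="mmol/L"), UNITS_ONLY)` is true, while the same value with `uncertainty=0.1` added fails the check. Under approach (1) that second value cannot be expressed as a `Measurement` at all, because the type has no field for it.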

While Stan's suggested general type feels very much like the OBX segment, Stan did not want to imply that "everything should be an OBX". Rather, this OBX-like structure should be used as a data type that could be assigned to any RIM class attribute. Nor did he imply that all information should end up being reported as an instance of the "Clinical_observation" class.

Mark Shafarman liked the idea of one general type, and he believed that the complexity of HMDs could be reduced if such a general type were used rather than combinations of special types. However, I argued that the true complexity of HMDs depends only on how many distinctions you want to make, not on how you make them. Stan agreed that his general type would not necessarily result in simpler HMDs, and that this is not the main issue he is concerned with.

Joanne Larson more or less concurred with Stan by saying that certain things apply to all measurements: number, unit, probability, and "what I [Joanne] call level of certainty". The data type should support all of that, but if you don't need it, you don't use it.

I agree that Stan's one general data type has some appeal. It is one type instead of many; it is always uniform; it seems to involve no combinatorial complexity (at first glance); it seems more synoptic, showing everything that can be said about some "numeric information"; all in all, it seems simpler and more straightforward. Who could disagree with the simpler and more straightforward solution?

To show what my main goal (and, as far as I can tell, Mark Tucker's) was in designing this system, I am going to (ab)use Stan's own words: "We want to assign data types to RIM attributes whose semantics is well-understood". Every data type should capture one semantic area as completely and independently as possible.

In that system, I consider a real number to be an independent concept. A physical measurement (number with unit) is another concept. Ranges and ratios can be built out of quantities (e.g. numbers, measurements, timestamps). Uncertainty is another concept that applies to all data types, but its form inherently depends on the kind of underlying data type (discrete vs. continuous, finite vs. infinite). Historical information and annotations are truly independent concepts that depend neither on each other, nor on the underlying data type, nor on anything else.
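The compositional idea can be sketched in Python, assuming hypothetical class names (none of these are defined HL7 types): a real number is a plain float, a physical quantity pairs a number with a unit, intervals and ratios are generic over any quantity (numbers, measurements or timestamps), and an annotation wraps any value without depending on what it wraps. Uncertainty is deliberately left out because its form depends on the underlying type.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class PhysicalQuantity:
    value: float  # a real number, itself an independent concept
    unit: str     # e.g. "kg", "mmol/L"

@dataclass(frozen=True)
class Interval(Generic[T]):
    low: T
    high: T

@dataclass(frozen=True)
class Ratio(Generic[T]):
    numerator: T
    denominator: T

@dataclass(frozen=True)
class Annotated(Generic[T]):
    value: T    # any data type, left unchanged
    note: str   # the annotation does not depend on the value's type

# The same building blocks combine freely (example values are illustrative):
concentration_range = Interval(PhysicalQuantity(3.5, "mmol/L"), PhysicalQuantity(5.0, "mmol/L"))
call_time = Interval(datetime(1999, 1, 4, 11, 0), datetime(1999, 1, 4, 12, 30))
dose_rate = Ratio(PhysicalQuantity(1.0, "mg"), PhysicalQuantity(1.0, "mL"))
weight = Annotated(PhysicalQuantity(70.0, "kg"), "weighed in clothing")
```

The design choice shown is that no type needs to know about the others: `Interval` works for timestamps exactly as it does for measurements, and `Annotated` can wrap any of them.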

Semantic and logical analysis (called phenomenology of information in the slides) is meant to reveal the fundamentally useful data types and their true dependencies on each other. I think that in the last ten conference calls we have achieved a lot in this analysis, and we are almost done.

The reason for this a priori approach is that when the principal semantic dimensions of information are covered, and when the system of semantic concepts and categories is designed such that no nonsense combinations can be generated, we can be fairly sure that most practical use cases of the future are covered. Conversely, the use-case approach can only capture information categories that are known to be actually used now by a considerable group of people. Whenever some other use case comes up in the future, one has to revise the type system.

Changing the data type system means hacking the new stuff into the old stuff, making sure the old stuff is not broken, and making sure that nonsense cannot be said. In this evolutionary process one might discover that one would have designed the type system differently in the first place, if only those recent use cases had been known up front.

I claim that we went through this whole "pragmatic" approach to the type system once before, in version 2.x of HL7, and we should have learned our lesson. We have seen how new use cases are hacked into the existing system. After all, this whole redesign effort was born from the growing tension between the initial structure of the data type system and current user needs.

The intention of this theoretical approach is not to impose some home-grown dogma of information science on system developers. It cannot be stated clearly enough that, with the type system discussed in the last ten conference calls, HL7 interfaces will not force functionality on information systems. Conversion rules exist to make sure that a sender can say all the detail that he wants to say (not more and not less) and that the receiver can take as much of it as he can digest. If someone does not handle uncertainty of birth dates, he doesn't have to. But if someone does, he can.
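The idea of conversion rules can be illustrated with a small Python sketch. The `UncertainDate` type and the two rules below (a receiver without uncertainty support keeps only the best estimate; a receiver with uncertainty support treats a plain date as certain) are assumptions made for this example, not conversion rules defined by the task group.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union

@dataclass(frozen=True)
class UncertainDate:
    value: date          # the sender's best estimate
    probability: float   # the sender's confidence in that estimate, 0..1

# What a sender may put into a birth-date field under this sketch.
BirthDate = Union[date, UncertainDate]

def as_plain_date(x: BirthDate) -> date:
    """Receiver that does not handle uncertainty: keep only the best estimate."""
    if isinstance(x, UncertainDate):
        return x.value
    return x

def as_uncertain_date(x: BirthDate) -> UncertainDate:
    """Receiver that does handle uncertainty: assume a plain date is certain."""
    if isinstance(x, UncertainDate):
        return x
    return UncertainDate(x, 1.0)
```

Either receiver can accept either kind of sender: the sender says as much as it wants, and each receiver takes what it can digest.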

Abnormal flags and reference ranges are not in the same category as units of measure. Consider this procedural model for making quantitative physiologic measurements: first, you have a measurement device that returns a numeric value. The theory and convention of measurement that the device implements determines the unit of measure. Both are on a fairly low level: numbers are defined by mathematics and physical measurements by physics. The measurement device always has some uncertainty, which is defined by another area of mathematics. Abnormal flags do not come into play until there are reference ranges, and they are interpretations of the measured value, not "qualifiers" of it. Reference ranges depend on reference populations, and abnormal flags depend on reference ranges. All of that is on a much higher level than number, unit and uncertainty.
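The layering described above can be sketched in Python: the measurement carries only number and unit, the reference range records the population it was derived from, and the abnormal flag is computed from the two rather than stored as a qualifier of the value. Class names, population labels and the single-letter codes ("L", "N", "H", modelled loosely on HL7 v2 abnormal flags) are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    """Low level: a number (mathematics) with a unit (physics)."""
    value: float
    unit: str

@dataclass(frozen=True)
class ReferenceRange:
    """Higher level: derived from a reference population."""
    low: float
    high: float
    unit: str
    population: str

def abnormal_flag(m: Measurement, ref: ReferenceRange) -> str:
    """Interpret a measurement against a reference range: "L", "N" or "H"."""
    if m.unit != ref.unit:
        # Unit conversion is out of scope for this sketch.
        raise ValueError("measurement and reference range use different units")
    if m.value < ref.low:
        return "L"
    if m.value > ref.high:
        return "H"
    return "N"
```

The same measurement of 5.5 mmol/L is flagged "H" against a range of 3.5 to 5.0 for one population and "N" against a range of 3.5 to 6.0 for another, which is why the flag belongs to the interpretation layer rather than to the number.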

The reason why uncertainty is part of neither the number data type nor the measurement-with-unit data type is that Mark Tucker and I wanted to allow people to continue not bothering about uncertainty. No one should ever be unduly forced to deal with generic data types for uncertain information if he does not deal with uncertainty. Furthermore, as the analysis of the uncertainty area shows, there are a number of ways in which uncertainty can be covered, depending on what the basic data type is (e.g. continuous vs. discrete) and what your concept of uncertainty is (e.g. probability distributions (parametric or non-parametric), narrative qualifiers of uncertainty, or confidence intervals). It is better to keep this interesting but complex material aside for the interested reader and not bother those who just want to send a plain old number.
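To illustrate why no single uncertainty field fits all cases, here is a Python sketch of three of the variants named above. The names are hypothetical; a normal distribution stands in for the parametric case, and its 95% interval uses the usual factor of 1.96 standard deviations. A plain number stays a plain float and never needs any of these wrappers.

```python
from dataclasses import dataclass
from typing import Generic, Tuple, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class NormalDistribution:
    """Parametric uncertainty; only meaningful for continuous values."""
    mean: float
    std_dev: float

    def confidence_interval_95(self) -> Tuple[float, float]:
        # mean +/- 1.96 standard deviations covers about 95% of a normal distribution
        half_width = 1.96 * self.std_dev
        return (self.mean - half_width, self.mean + half_width)

@dataclass(frozen=True)
class WithProbability(Generic[T]):
    """A value plus the probability that it is the true one; suits discrete values."""
    value: T
    probability: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie between 0 and 1")

@dataclass(frozen=True)
class NarrativeUncertainty(Generic[T]):
    """A value qualified by words such as "approximately" or "probably"."""
    value: T
    qualifier: str

plain_value = 5.0  # someone who does not deal with uncertainty sends just this
```

Each variant fits a different kind of underlying value, which is why they are kept apart from the plain number and measurement types rather than folded into them as one optional field.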

Finally, I believe that the general data type suggested by Stan is neither less complex nor simpler. All the combinatorial complexity that we now have in the v3 data type proposal still exists in a flat OBX-like data type. The combinatorics merely moves into rules of the form "if A is present then B may not be present" and the like, which is exactly the kind of material that currently guides the creation of sensible OBX segment instances.
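The point about combinatorics can be checked by brute force. The following Python sketch takes four optional fields of a hypothetical flat type and one rule that follows from the discussion above (an abnormal flag requires a reference range), then enumerates which field combinations the rule admits. The field names and the single rule are assumptions for illustration; a real rule set would be larger.

```python
from itertools import combinations
from typing import FrozenSet, List

# Optional fields of a hypothetical flat, OBX-like value type.
FIELDS = ("unit", "uncertainty", "abnormal_flag", "reference_range")

def satisfies_rules(present: FrozenSet[str]) -> bool:
    """Rules of the form "if A is present then B must (not) be present"."""
    # An abnormal flag is an interpretation against a reference range.
    if "abnormal_flag" in present and "reference_range" not in present:
        return False
    return True

def admissible_combinations() -> List[FrozenSet[str]]:
    """Enumerate every subset of FIELDS and keep those the rules allow."""
    result = []
    for k in range(len(FIELDS) + 1):
        for combo in combinations(FIELDS, k):
            present = frozenset(combo)
            if satisfies_rules(present):
                result.append(present)
    return result

print(len(admissible_combinations()), "of", 2 ** len(FIELDS))  # 12 of 16
```

Adding further rules only reshapes this set of admissible combinations; it does not make it go away. Whether the combinations are expressed as separate types or as validation rules on one flat type, the same distinctions have to be written down somewhere.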

Mark Shafarman summarized the issue at hand as one of finding the proper border between a RIM class and a data type.

The next conference call is next Monday, January 11, 1999, 11:00 AM EST.

Agenda items are:

  1. collections [slide]
  2. incomplete information (what Stan calls "exception") [slide] [slide] [slide]
  3. update semantics [slide]
  4. review of the uncertainty debate, time-boxed to 10 minutes.

Please think about how we should present this work in Orlando. Do YOU want to take on some section (talking, making slides, writing, whatever)? Mark, when will the v3 data type discussion take place in Orlando? Wednesday?


-Gunther Schadow