Changes to the data type report

Please see the context diff file for a complete catalog of line-by-line changes. It may be hard to read for those who have never seen a context diff file or who are not used to HTML source code.

Unfortunately, HTML v4 revision marks are not yet supported by most browsers, which is why I cannot offer a nicely formatted HTML version with revision marks.

But here is a catalog of all major changes:

1.2.6 The Meta Model

The Meta Model discussion has been deleted from this specification and can now be found in the HL7 version 3 Message Development Framework (MDF).


1.2.11 Incomplete Information

Section revised and extended in order to clarify the issue of "flavors of null" a little and to nail down the flavors of null more precisely.


2 Text

2.1 Introduction

2.3 Display Data

"Multimedia Enabled Free Text" or "Free Text" has been changed to "Display Data" because of some confusion that the term "Text" evoked when used for multimedia data. The justification why multimedia is text anyway has been moved up into the introduction to text, and the name change should keep this question from causing further confusion.


3.2.1 State of a State Machine

Added a section about a state data type. Needs discussion and consensus in MnM and CQ.


3.3.1 The Concept Descriptor

3.3.3 Code Phrase

Added a paragraph to the open issues list: we need to revisit CodePhrase.

Note that from the SNOMED camp there is probably support for an even more complex definition of the Code Phrase that would basically be a keyword-value structure containing small conceptual graphs. [cf. Spackman KA. Compositional concept representation using SNOMED: towards further convergence of clinical terminologies. Proc Annu Symp Comput Appl Med Care. 1998 Oct. p. 740-4.]


3.4.3 Technical Instance Locator

Added a few open issues to the TIL that came up in MnM and in the Harmonization meeting. We need to address those issues.

The use of the TIL for phone numbers needs more explanation and rationale.

The TIL may need to be wrapped in a History.

The TIL may need some "use code", to capture the qualifiers "business", "home", "cellphone", etc. for phone numbers. How does this "use code" generalize to other communication addresses? Why is it needed?


3.5.1 Real World Instance Identifier

Extensive change of the entire section to incorporate the recent Harmonization resolutions. It is now factual and no longer a mere proposal to PAFM. Added some new material discussing the issues around an identifier type code table. The RWII is now under the Stewardship of CQ.


3.5.3 Person Name

Extensive change of the entire section to incorporate the recent Harmonization resolutions. It is now factual and no longer a mere proposal to PAFM. The Person_name class remains under PAFM stewardship; however, the PN data type is still maintained by us.

Proposed addition of one new Person Name Part Classifier of Axis 3 for "middle name". Some folks were confused not to find a "middle name" any more, and they seem to find it difficult to think of the middle name as just the second given name.


3.5.4 Organization Name

Somewhat aligned to the definition of the Person Name (PN) data type.

Note: this has changed. In a previous draft the Organization Name (ON) was a set of Organization Name Variants (ONXV) with no additional information. It is therefore simpler to define ON in parallel with PN as representing one name variant and let PAFM handle the rest in the RIM.

Note: a harmonization request to PAFM is required for the Organization class to

  1. delete attribute: Organization.organization_name_type_cd

    Rationale: Attribute duplicates the ON.type component of the Organization name data type.

  2. rename attribute: Organization.organization_nm to "nm"

    Rationale: Name does not conform to the MDF style guide as it repeats the name of its class.

  3. assign data type: Organization.nm : SET<ON>


4.3 Real Number (was: Floating Point Number)

Last-minute change of the name from Floating Point Number to Real Number.

Note: can we change the name at the last minute? I realized too late that calling it "Floating Point Number" is incorrect, since that name refers to a particular computer representation of a number. I would now much rather call it "Real".


4.6 Time

Extensive change to this entire section and addition of new material in the introduction and on the detail level. This is a terribly complex area, and there is almost no literature to be found on the internet that thoroughly summarizes the issues of calendars from the perspective of data and communications. Most standards are pretty shallow in their understanding of the underlying complexity. So, I am rather proud of this section. It may still need some polishing to make it consistent with our overall approach.

In particular, I finally nailed down the problem of periodically recurring time points and intervals. This needs thorough review and consensus by the group.


5.1 Interval

Added a width component to the interval.

In order to treat incomplete information uniformly we must accommodate the case where only the width of an interval is known whereas both boundaries are unknown. Otherwise we would force one case of incomplete information to be represented by a different data type, and thus a different dependent attribute, which would force the constraints of dependency between the interval and its width to be handled outside. This would violate the rule of encapsulation.

The fact that the width is kept as a component of the interval illustrates once more that data type components in this specification are semantic components and not components of any particular representation. This means that if a representation of an interval is based on low and high boundary, the width will only be made explicit in the exceptional case where both (!) boundaries are undefined. Another representation may be based on low boundary and width, in which case the high boundary will only be sent in the exceptional case where low boundary and width are undefined. Every representation will have to deal with one such exceptional case though.
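The idea of width as a semantic component can be illustrated with a small sketch. This is not a proposed representation; the class name `Interval`, the method `resolved`, and the use of plain floats as the base type are assumptions made only for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical semantic interval: low, high and width are all
    components, and any two known components determine the third."""
    low: Optional[float] = None     # None stands for "unknown" here
    high: Optional[float] = None
    width: Optional[float] = None

    def __post_init__(self):
        # The dependency between boundaries and width is enforced inside
        # the type (encapsulation), not by some outside attribute.
        if None not in (self.low, self.high, self.width):
            if not math.isclose(self.high - self.low, self.width):
                raise ValueError("width must equal high - low")

    def resolved(self) -> "Interval":
        """Derive whichever component follows from the other two."""
        low, high, width = self.low, self.high, self.width
        if low is not None and high is not None:
            width = high - low
        elif low is not None and width is not None:
            high = low + width
        elif high is not None and width is not None:
            low = high - width
        # If only the width is known, both boundaries stay unknown.
        return Interval(low, high, width)
```

A low/high representation would send `width` only for a value like `Interval(width=3.0)`, where neither boundary is known; a low/width representation would send `high` only when low and width are both unknown.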

Fixed the literal expression syntax.


5.1.1 Intervals as Sets - The Notion of Set Revisited

Added a section about Intervals and their relationship with the set concept.

Intervals are continuous sets of elements of the base data type. Thus intervals have a relationship with set-collections. Discrete intervals can be converted into an enumerated set-collection. We thus have to revisit our notion of set as defined initially. A set is no longer just an enumerated collection of discrete unordered elements. The various kinds of sets are described by the following taxonomy:

Set

  set-collection (finite, discrete, enumerated set)

  interval (continuous ordered subset)

    finite countable interval (e.g., integers 1-3)

    unbounded infinite countable interval (e.g., all integers)

    partially bounded infinite countable interval (e.g., integers > 3)

    totally bounded infinite uncountable interval (e.g., real 0.0 - 1.0)

  periodic point in time (sparse, infinite, discrete, ordered subset of point in time)

  periodic interval of time (sparse, infinite, partially continuous, ordered subset of point in time) alternatively: set of interval of point in time.

  set derived from other sets through set operations (union, intersection)

At this point all of the above mentioned kinds of sets are defined, except for the general derivative set that is specified as a set algebra term from other sets.

This would need more work. The entire idea of sets and collections should be revised to build this taxonomy of sets into the core of V3DT more thoroughly. The discussion of collections may need to be split to put the major part into chapter 5 right before the Interval section. Ahrgh, this is hard work!
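As a rough illustration of the conversion from a discrete interval to an enumerated set-collection mentioned above, here is a minimal sketch. The function name `enumerate_interval` is hypothetical, and the sketch assumes integer elements and closed boundaries on both sides.

```python
from typing import Optional, Set


def enumerate_interval(low: Optional[int], high: Optional[int]) -> Set[int]:
    """Convert a closed interval of integers into an enumerated set.

    Assumption for this sketch: None marks a missing boundary. Only a
    finite countable interval can be enumerated; an unbounded or
    partially bounded one is infinite and is rejected.
    """
    if low is None or high is None:
        raise ValueError("only a fully bounded discrete interval can be enumerated")
    return set(range(low, high + 1))


# Once enumerated, ordinary set operations apply (a derived set).
derived = enumerate_interval(1, 3) | enumerate_interval(3, 5)
```

The reverse direction does not hold in general: an arbitrary set-collection has no ordering or contiguity, so it cannot always be written as an interval.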


5.2 General Annotations

Bob Dolin suggested adding a clarification:

An annotation can not change the meaning of the annotated value and must not be used when the value would be wrong without the annotation.

5.3 The Historical Dimension

Significantly modified to accommodate cyclic changes (e.g., snowbird addresses) using the new periodic time constructs.

[Validity period is] the set of time the given information was, is, or is expected to be valid. This set of time can be a simple interval of time or a periodic point or interval of time for cyclic events. The interval can be open or closed, infinite or undefined on either side.

This depends upon a strong conceptualization of a general set that in turn needs more work. See above.
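To make the snowbird case concrete, here is a minimal sketch of a validity period that is either a simple interval or a yearly recurring interval. The class names `TimeInterval` and `YearlyPeriod` are hypothetical, the period is limited to whole calendar days, and a missing boundary is treated as unbounded, which is only one possible reading of "undefined".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class TimeInterval:
    """Simple validity interval; None means no boundary on that side
    (an assumption of this sketch)."""
    low: Optional[date] = None
    high: Optional[date] = None

    def contains(self, d: date) -> bool:
        return ((self.low is None or self.low <= d)
                and (self.high is None or d <= self.high))


@dataclass(frozen=True)
class YearlyPeriod:
    """Periodic interval recurring every year, given as (month, day)."""
    start: Tuple[int, int]
    end: Tuple[int, int]

    def contains(self, d: date) -> bool:
        md = (d.month, d.day)
        if self.start <= self.end:
            return self.start <= md <= self.end
        # The period wraps around the new year (e.g., November to March).
        return md >= self.start or md <= self.end


# A winter address that is valid from November 1 through March 31 each year.
winter_address_validity = YearlyPeriod(start=(11, 1), end=(3, 31))
```

Both classes answer the same question, whether a point in time is a member of the set, which is why the validity period leans on the general notion of a set.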


5.4 Uncertainty of Information

Added a bunch of new issues that I believe we have disregarded while discussing whether we should care about uncertainty at all.

A much more important open issue is the relationship between sets, bags, intervals and periodic sets and uncertainty. It appears as if the general notion of a set can be used where multiple possible values exist without any particular probability distribution. This would translate to the uniform probability distribution over the set. The question is whether the data type definitions for probability distributions should not be better aligned to the notion of sets.

A second related issue is the fact that we sometimes want to use a probability distribution (parametric or non-parametric) in order to describe a frequency distribution. Sometimes laboratory observations on population samples are reported in such a "consolidated" way using histograms. Although the distinction between "probability" and "frequency" is blurry, the wording in this specification may need to be changed to invite the probability constructs to be used for frequencies as well.

A third related issue is whether we want to support other "weights" of certainty and importance that have become well-known in the decision support community. Examples are the weights of logistic regression and neural nets, all kinds of plausibility measures (Dempster-Shafer possibilities, Fuzzy membership functions, Shortliffe's certainty factors, etc.), and the heuristic numbers used in Internist-I/QMR (evoking strength, frequency, import), or Medcin and others.

A fourth related issue is that the probability distributions, and especially the parametric probability distribution, can be used to describe distribution quantities other than probabilities. For example, a probability distribution "multiplied with" a flow rate may describe the setting of a ventilator. Should we extend our definition to embrace quantities that are neither probabilities nor frequencies nor any other uncertainty measure?


5.4.2 Non-Parametric Probability Distribution

Added a conversion rule in the spirit of the above mentioned issues:

A bag-collection can be cast to a non-parametric probability distribution, where the probability of each item of the bag is the count of that item divided by the size of the bag.
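The conversion rule can be sketched in a few lines. The function name `bag_to_distribution` is hypothetical, and representing the distribution as a mapping from item to exact fraction is a choice made only for this example.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Hashable, Iterable


def bag_to_distribution(bag: Iterable[Hashable]) -> Dict[Hashable, Fraction]:
    """Cast a bag to a non-parametric probability distribution.

    The probability of each distinct item is its count in the bag
    divided by the size of the bag.
    """
    counts = Counter(bag)
    size = sum(counts.values())
    if size == 0:
        raise ValueError("an empty bag has no probability distribution")
    return {item: Fraction(n, size) for item, n in counts.items()}


# Example bag of four observed blood groups (made-up illustration data).
blood_groups = bag_to_distribution(["A", "A", "B", "O"])
```

Applied to a set, where every item occurs exactly once, the same rule gives the uniform distribution over the set.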


A bag of other issues