The HL7 version 3 data type task group held its sixth conference call on Monday, November 23, 1998, 11 to 12:30 AM EDT.

Attendees were:

- Joann Larson,
- Mark Shafarman,
- Greg Thomas,
- Mark Tucker,
- Mike Henderson,
- Robin Zimmerman,
- Gunther Schadow.

Although the minutes of the last conference call are not yet written, the good news is that we are basically done with the coded information stuff. It will all be subject to revision when we wrap it up into a proposal document for January 1999. We can discuss it further in and after January, of course.

Today we attacked:

- NM - for all kinds of numbers
- SI - for positive integer numbers
- SN and XNM for more complex stuff, like ranges, ratios and some ordinals.

We agreed that we would break up the former NM data type into two
data types, for **integer** and **real** numbers. We then
discovered that **rational** numbers are needed for reporting some
clinical results, such as titers.

| Integer Number (Integer, IN) | | | |
|---|---|---|---|
| Integer numbers are precise numbers that originate in counting actions or operations on other integers. No arbitrary limit is imposed on the range of integer numbers. | | | |
| PRIMITIVE TYPE | | | |

No arbitrary limit is imposed on the range of integer numbers. Thus, theoretically, the capacity of any binary representation is exceeded, whether 16 bit, 32 bit, 64 bit, or 128 bit size. Domain committees should not limit the ranges of integers only to make sure the numbers fit into current database technology. In finance and accounting those limits are frequently exceeded (e.g., consider the U.S. national budget expressed in Italian Lira or Japanese Yen). Designers of Implementable Technology Specifications (ITS) should be aware of the possible capacity limits of their target technology.

In cases where limits on the value range are suggested semantically by the application domain, the committees should specify those limits. For example, the number of prior patient visits is a non-negative integer including 0.

Although we do not yet have a formalism to express constraints, we should not hesitate to document those constraints informally. We will eventually define (or deploy) a constraint expression language.

We allow integer numbers to be represented by character string literals containing signs and decimal digits. Implementable Technology Specifications (ITS) such as for XML will most likely use the string literal to represent integers. Other ITSs, such as for CORBA, might choose to represent integers by variable length bit strings or by choices of either a native integer format or a special long integer format.
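As a rough illustration of the string-literal representation, the following sketch parses a signed decimal literal into an integer of unbounded size. It assumes a language with arbitrary-precision integers (Python here). The helper name `parse_integer` and the exact literal syntax (optional sign followed by decimal digits) are our assumptions, not something the group has decided.

```python
import re

# Assumed literal syntax: optional sign followed by one or more decimal digits.
_INT_LITERAL = re.compile(r"[+-]?[0-9]+")


def parse_integer(literal: str) -> int:
    """Parse a signed decimal string literal into an unbounded integer."""
    if not _INT_LITERAL.fullmatch(literal):
        raise ValueError(f"not an integer literal: {literal!r}")
    return int(literal)


# A value beyond 128 bits survives a round trip through the string form.
big = parse_integer("-340282366920938463463374607431768211456")  # -(2**128)
round_trip = str(big)
```

An ITS whose target technology has fixed-width integers would need an explicit range check at this point instead of relying on the language.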

**ISSUE:** do we want to define non-decimal
representations in bases 2, 8, 16, and 64?

| Floating Point Number (Float, FPN) | | | |
|---|---|---|---|
| Floating point numbers are approximations for real numbers. Floating point numbers occur whenever quantities of the real world are measured or estimated or as the result of calculations that include other floating point numbers. | | | |
| component name | type/domain | optionality | description |
| value | Real Number | required | The value without the notion of precision or with an arbitrary precision. We do not specify a data type for true real numbers of infinite precision. |
| precision | Integer Number | required | The precision of the floating point number in terms of the number of significant decimal digits. |

The precision of a floating point number is defined here as the number
of significant decimal digits. According to Robert S. Lederley [Use of
computers in biology and medicine, New York, 1965,
p. 519ff]: "A number composed of *n* significant figures is
said to be *correct to n significant figures* if its
value is correct to within 1/2 unit in the least significant
position. For example, if 9072 is correct to four significant figures,
then it is understood that the number lies between 9072.5 and 9071.5
(that is 9072 ± 0.5) [...]"

Obviously this method of stating the uncertainty of a number is dependent on the number's decimal representation. For binary representations we could, in principle, specify the precision more granularly. However, the statement that a value lies within a certain range is problematic anyway, because it raises the question of which level of confidence we assume. We will define a generic data type for probability distributions that allows exact statements of uncertainty.

Mike Henderson brought up the terms *precision*
vs. *accuracy*, where precision means the exactness of the
numeric representation of a value, and where accuracy refers to the
smallness of error in the measurement or estimation process. While
those concepts can be distinguished, they are related inasmuch as we
do not want to specify a higher precision for a number than we can
justify by the accuracy of the process generating the
number. Conversely, we do not want to specify a number with less
precision than justifiable by the accuracy.

In fact, there is considerable confusion around the meaning of terms such as precision, accuracy, error, etc. Since I myself have difficulties in seeing clear distinctions between those concepts, I reviewed some of the literature, starting from the NIST's Guidelines for the expression of uncertainty in measurement, which in turn are based on the ISO's International Vocabulary of Basic and General Terms in Metrology (VIM). In addition, the European standard ENV 12435 Medical informatics - expression of the results of measurements in health sciences, in its normative Annex D, summarizes the NIST's position.

To summarize: NIST's Guidelines, and ISO's VIM regard the
term *accuracy* as a "qualitative concept". Other related terms
are *repeatability*, *reproducibility*, *error*
(*random* and *systematic*), etc. All those slightly
different but related and overlapping concepts have been subsumed
under the broader concept of *uncertainty* in a 1981
publication by the International Committee for Weights and Measures
(CIPM) in accordance with ISO and IEC. The uncertainty of measurement
is given as a probability distribution around the true measurement
value (measurand). Given such a probability distribution, a value
range can be specified within which the true value is found with some
*level of confidence*.

These concepts, based on statistical methods, are well known in the medical profession. However, these methods are quite complex, and exact probability distributions are often unknown. Therefore, we want to keep those separate from a basic data type of floating point numbers. However, floating point numbers are approximations to real numbers and we want to account for this approximative nature by keeping a basic notion of precision in terms of significant digits right in the floating point data type.

In many situations, significant digits are a sufficient estimate of the uncertainty, but even more important, we must account for significant digits at interfaces, especially when converting between different representations. For instance, we do not want a value 4.0 to become 3.999999999999999999 in such a conversion, as happens sometimes when converting decimal representations to IEEE binary representations.
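The interface problem can be seen in a few lines. This is only a sketch using Python's standard `decimal` module to stand in for a decimal representation and the built-in `float` to stand in for an IEEE binary double; the variable names are ours.

```python
from decimal import Decimal

# A decimal type keeps the digits exactly as written, trailing zero included.
as_written = Decimal("1.20")
kept_digits = as_written.as_tuple().digits   # (1, 2, 0)

# A binary double carries no record of how many digits were significant.
as_double = float("1.20")
printed_back = repr(as_double)               # '1.2'

# Most decimal fractions have no exact binary form; this is the value
# actually stored for 0.1, which differs slightly from one tenth.
stored_tenth = Decimal(0.1)
```

The trailing zero of 1.20 is lost once the value passes through the binary double, which is why the precision component travels with the value.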

No arbitrary limit is imposed on the range or precision of floating point numbers. Thus, theoretically, the capacity of any binary representation is exceeded, whether 32 bit, 64 bit, or 128 bit size. Domain committees should not limit the ranges and precision of floating point numbers only to make sure the numbers fit into current database technology. Designers of Implementable Technology Specifications (ITS) should be aware of the possible capacity limits of their target technology.

In cases where limits on the value range are suggested semantically by the application domain, the committees should specify those limits. For example, probabilities should be expressed in floating point numbers between 0 and 1.

Although we do not yet have a formalism to express constraints, we should not hesitate to document those constraints informally. We will eventually define (or deploy) a constraint expression language.

We allow floating point numbers to be represented by character string literals containing signs, decimal digits, a decimal point and exponents. An ITS for XML will most likely use the string literal to represent floating point numbers. Other ITSs, such as for CORBA, might choose to represent floating point numbers by variable length bit strings or by choices of either a native (IEEE) floating point format or a special long floating point format.

Decimal floating point numbers can be represented in a standard
way, so that only significant digits appear. This standard
representation always starts with an optional minus sign and the
decimal point, followed by all significant digits of the mantissa
followed by the exponent. Thus 123000 is represented as
"`.123e6`" to mean .123 × 10^{6}; 0.000123 is
represented as "`.123e-3`" to mean .123 ×
10^{-3}; and -12.3 is represented as "`-.123e2`"
to mean -.123 × 10^{2}.
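The standard representation can be sketched as follows. This is a minimal sketch, assuming Python's `decimal` module is used to keep the digits exactly as written; the function name `to_standard_form` is hypothetical. Note that the sketch treats every digit of the input as significant, so 123000 with three significant digits has to be passed in as `123e3`.

```python
from decimal import Decimal


def to_standard_form(literal: str) -> str:
    """Rewrite a decimal literal as [-].<significant digits>e<exponent>."""
    value = Decimal(literal)
    if not value.is_finite():
        raise ValueError(f"not a finite number: {literal!r}")
    sign, digits, exponent = value.as_tuple()
    mantissa = "".join(str(digit) for digit in digits)
    # Moving the decimal point in front of all digits raises the
    # exponent by the number of digits.
    shifted_exponent = len(digits) + exponent
    return f"{'-' if sign else ''}.{mantissa}e{shifted_exponent}"


examples = {
    literal: to_standard_form(literal)
    for literal in ("123e3", "0.000123", "-12.3")
}
```

With these inputs the results are `.123e6`, `.123e-3` and `-.123e2`, matching the three examples in the text.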

The reason why we define decimal literals for data types is to make
the data human readable. To render the value 12.3 as
"`.123e2`" is not considered intuitive. The European
standard ENV 12435 recommends that the exponent should be
adjusted so as to yield a mantissa between 0.1 and 1000. Those
representations tend to be easier to memorize.

The external representation is of the form:

sign ::= `+` | `-`

digit ::= `0` | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9`

digits ::= digit digits | digit

decimal ::= digits `.` digits | `.` digits

mantissa ::= sign decimal | decimal

exponent ::= sign digits | digits

float ::= mantissa `e` exponent
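For checking literals against this grammar, a regular expression is probably enough. The sketch below is one strict reading of the productions above: a mantissa with a decimal point, the letter `e`, and an exponent are all required. Under that reading a literal such as `2e3` (no decimal point) is rejected; whether the group intends that is not settled here. The names `FLOAT_LITERAL` and `is_float_literal` are ours.

```python
import re

FLOAT_LITERAL = re.compile(
    r"""
    [+-]?               # mantissa ::= sign decimal | decimal
    [0-9]*\.[0-9]+      # decimal  ::= digits . digits | . digits
    e                   # float    ::= mantissa e exponent
    [+-]?[0-9]+         # exponent ::= sign digits | digits
    """,
    re.VERBOSE,
)


def is_float_literal(text: str) -> bool:
    """Return True if the whole text matches the float production."""
    return FLOAT_LITERAL.fullmatch(text) is not None


accepted = [s for s in (".123e6", "-.123e2", "12.3e0", "12.3", "2e3")
            if is_float_literal(s)]
```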

The number of significant digits is determined according to Lederley (ibid.) and ENV 12435:

1. All non-zero digits are significant.
2. Leading zeros are not significant, regardless of the decimal point's position.
3. All trailing zeros are significant, regardless of the decimal point's position.

Note that rule number 3 deviates from Lederley and
ENV 12435. Judgement about the significance of trailing zeroes is
often deferred to common sense. However, in a computer communication
standard common sense is not a viable criterion (common sense is not
available on computers). Therefore we consider all trailing zeroes
significant. For example, 2000.0 would have five significant digits and
1.20 would have three. If the zeroes are only used to fix the decimal
point (such as in 2000) but are not significant, we require the use of
exponents in the representation: "`2e3`" to mean "2 ×
10^{3}".
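The three rules are mechanical enough to be written down directly. This is a sketch that assumes the input is already a well-formed decimal literal (it does no validation), and the function name `significant_digits` is ours. The value zero is not covered by the rules above and comes out as 0 significant digits here.

```python
def significant_digits(literal: str) -> int:
    """Count significant digits of a decimal literal under the three rules."""
    # The exponent only fixes the decimal point; it carries no digits.
    mantissa = literal.lower().split("e", 1)[0]
    # The sign and the position of the decimal point do not matter.
    digits = mantissa.lstrip("+-").replace(".", "")
    # Leading zeros are not significant; everything after them is,
    # including all trailing zeros.
    return len(digits.lstrip("0"))


counts = {s: significant_digits(s)
          for s in ("2000.0", "1.20", "2000", "2e3", "0.000123")}
```

This gives five digits for 2000.0, three for 1.20, four for 2000, one for `2e3`, and three for 0.000123.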

Note that this proposed data type has been significantly changed and is now called Ratio.

| Rational Number (Ratio, RN) | | | |
|---|---|---|---|
| A rational is a number that comes about through division of an integer numerator by an integer denominator. Rationals occur in laboratory medicine as "titers", i.e., the maximal dilutions at which an analyte can still be detected. | | | |
| component name | type/domain | optionality | description |
| numerator | Integer Number | default is 1 | The numerator. |
| denominator | Integer Number | default is 1 | The denominator. |

HL7 v2.3 defined the data type "structured numeric" (SN) for various purposes. Among those purposes was to cater to the need to express rational numbers that often occur as titers in laboratory medicine. A titer is the maximal dilution at which an analyte can still be detected. Typical values of titers are: "1:32", "1:64", "1:128", etc. Powers of 1/2 or 1/10 are common. Sometimes titer results are written inconsistently as a ratio or only as the denominator. Nevertheless, these are rational numbers. Although rational numbers are mathematically exact, titers are the result of a measurement process that can never be exact.

Thus, in theory, a titer of 1:128 could be reported as
0.0078125. However, no one would understand that result. One could
recover the original denominator by taking the inverse, 10000000/78125,
which is 128, but to do that, the receiver would have to know that the given
number is to be represented as a ratio of 1/*n*.
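The arithmetic can be checked with a short sketch. It uses Python's standard `fractions` and `decimal` modules; the variable names are ours, and the 1:30 titer is a made-up value chosen only because 1/30 has no finite decimal expansion.

```python
from decimal import Decimal
from fractions import Fraction

# A titer of 1:128 written as a decimal number.
titer = Fraction(1, 128)
as_decimal = Decimal(titer.numerator) / Decimal(titer.denominator)

# The receiver can recover 1/128 only because this decimal is exact.
recovered = Fraction(as_decimal)

# A hypothetical titer of 1:30 is cut off at the working precision,
# so the original ratio cannot be recovered from the decimal.
cut_off = Decimal(1) / Decimal(30)
not_recovered = Fraction(cut_off)
```

Here `as_decimal` is 0.0078125 and `recovered` is again 1/128, while `not_recovered` differs from 1/30. Keeping numerator and denominator as separate components avoids the problem.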

Currently we do not know of any other use of rational numbers
except for titers. We do know that blood pressure values, commonly
reported as 120/80 mm Hg are **not** ratios, but rather a
composite of systolic and diastolic blood pressure.

**Because we did not define rational numbers for their
exactness but for their common representation, we might want to
consider defining a data type "Ratio (of two floating point numbers)"
instead.**

The next conference call is next Monday, November 30, 1998.

Agenda items are:

- reconsider the rational data type (should we define "ratio of floating point numbers" instead?)
- Measurements (physical quantities)
- Currency (economic quantities)
- Time
- Ordinals

Participants are invited to read and think about the three slides on those matters.

regards

-Gunther Schadow