The HL7 version 3 data type task group held its sixth conference call on Monday, November 23, 1998, from 11:00 AM to 12:30 PM EST.
Attendees were:
Although the minutes of the last conference call are not yet written, the good news is that we are basically done with the coded information stuff. It will all be subject to revision when we wrap it up into a proposal document for January 1999. We can further discuss in and after January, of course.
Today we attacked:
We agreed that we would break up the former NM data type into two data types, for integer and real numbers. We then discovered that rational numbers are needed for reporting some clinical results, such as titers.
Integer Number (Integer, IN)

Integer numbers are precise numbers that originate in counting actions or operations on other integers. No arbitrary limit is imposed on the range of integer numbers.

PRIMITIVE TYPE
No arbitrary limit is imposed on the range of integer numbers. Thus, theoretically, the capacity of any binary representation can be exceeded, whether 16 bit, 32 bit, 64 bit, or 128 bit in size. Domain committees should not limit the ranges of integers only to make sure the numbers fit into current database technology. In finance and accounting those limits are frequently exceeded (e.g., consider the U.S. national budget expressed in Italian Lira or Japanese Yen.) Designers of Implementable Technology Specifications (ITS) should be aware of the possible capacity limits of their target technology.
In cases where limits on the value range are suggested semantically by the application domain, the committees should specify those limits. For example, the number of prior patient visits is a non-negative integer including 0.
Although we do not yet have a formalism to express constraints, we should not hesitate to document those constraints informally. We will eventually define (or deploy) a constraint expression language.
We allow integer numbers to be represented by character string literals containing signs and decimal digits. Implementable Technology Specifications (ITS) such as for XML will most likely use the string literal to represent integers. Other ITSs, such as for CORBA, might choose to represent integers by variable length bit strings or by choices of either a native integer format or a special long integer format.
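As a minimal sketch of the string-literal approach, the following Python fragment parses a signed decimal literal into an arbitrary-precision integer; the literal syntax (optional sign, decimal digits) is assumed from the description above, and the function name is hypothetical:

```python
import re

# Assumed literal syntax: an optional sign followed by decimal digits.
INTEGER_LITERAL = re.compile(r'^[+-]?[0-9]+$')

def parse_integer(literal: str) -> int:
    """Parse an IN string literal into an arbitrary-precision integer."""
    if not INTEGER_LITERAL.match(literal):
        raise ValueError(f"not a valid integer literal: {literal!r}")
    return int(literal)  # Python ints are unbounded: no 32/64-bit ceiling

# A value far beyond the 64-bit range round-trips without loss:
big = parse_integer("123456789012345678901234567890")
assert big > 2**64
assert str(big) == "123456789012345678901234567890"
```

This illustrates why the value range is a property of the ITS, not of the abstract data type: the literal form itself imposes no capacity limit.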
ISSUE: do we want to define non-decimal representations in bases 2, 8, 16, and 64?
Floating Point Number (Float, FPN)

Floating point numbers are approximations for real numbers. Floating point numbers occur whenever quantities of the real world are measured or estimated, or as the result of calculations that include other floating point numbers.

component name | type/domain    | optionality | description
value          | Real Number    | required    | The value, without the notion of precision or with an arbitrary precision. We do not specify a data type for true real numbers of infinite precision.
precision      | Integer Number | required    | The precision of the floating point number in terms of the number of significant decimal digits.
The precision of a floating point number is defined here as the number of significant decimal digits. According to Robert S. Ledley [Use of Computers in Biology and Medicine, New York, 1965, p. 519ff]: "A number composed of n significant figures is said to be correct to n significant figures if its value is correct to within 1/2 unit in the least significant position. For example, if 9072 is correct to four significant figures, then it is understood that the number lies between 9071.5 and 9072.5 (that is, 9072 ± 0.5) [...]"
Obviously this method of stating the uncertainty of a number is dependent on the number's decimal representation. For binary representations we could, in principle, specify the precision more granularly. However, the statement that a value lies within a certain range is problematic anyway, because it begs the question about which level of confidence we assume. We will define a generic data type for probability distributions that allows exact statements of uncertainty.
Mike Henderson brought up the terms precision vs. accuracy, where precision means the exactness of the numeric representation of a value, and accuracy refers to the smallness of error in the measurement or estimation process. While those concepts can be distinguished, they are related inasmuch as we do not want to specify a higher precision for a number than we can justify by the accuracy of the process generating the number. Conversely, we do not want to specify a number with less precision than the accuracy would justify.
In fact, there is considerable confusion around the meaning of terms such as precision, accuracy, and error. Since I myself have difficulty seeing clear distinctions between those concepts, I reviewed some of the literature, starting from NIST's Guidelines for the Expression of Uncertainty in Measurement, which in turn is based on ISO's International Vocabulary of Basic and General Terms in Metrology (VIM). In addition, the European standard ENV 12435, Medical informatics - Expression of the results of measurements in health sciences, in its normative Annex D, summarizes the NIST's position.
To summarize: NIST's Guidelines and ISO's VIM regard the term accuracy as a "qualitative concept". Other related terms are repeatability, reproducibility, error (random and systematic), etc. All those slightly different but related and overlapping concepts have been subsumed under the broader concept of uncertainty in a 1981 publication by the International Committee for Weights and Measures (CIPM) in accordance with ISO and IEC. The uncertainty of measurement is given as a probability distribution around the true value of the measured quantity (the measurand). Given such a probability distribution, a value range can be specified within which the true value is found with some level of confidence.
These concepts, based on statistical methods, are well known in the medical profession. However, these methods are quite complex, and exact probability distributions are often unknown. Therefore, we want to keep those separate from a basic data type of floating point numbers. However, floating point numbers are approximations to real numbers, and we want to account for this approximate nature by keeping a basic notion of precision, in terms of significant digits, right in the floating point data type.
In many situations, significant digits are a sufficient estimate of the uncertainty, but even more important, we must account for significant digits at interfaces, especially when converting between different representations. For instance, we do not want a value 4.0 to become 3.999999999999999999 in such a conversion, as happens sometimes when converting decimal representations to IEEE binary representations.
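The conversion hazard can be sketched with Python's decimal module, which carries significant digits explicitly the way the proposed FPN type would, whereas a binary (IEEE-style) float does not:

```python
from decimal import Decimal

# A binary float cannot represent most decimal fractions exactly, so
# routing values through it introduces artifacts:
assert 0.1 + 0.2 != 0.3                               # binary rounding error
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")  # decimal is exact here

# Decimal also preserves the number of significant digits: "4.0" and
# "4.00" are equal in value but carry different precision information,
# which is exactly what the FPN 'precision' component is meant to keep.
assert Decimal("4.0") == Decimal("4.00")
assert Decimal("4.0").as_tuple().exponent == -1   # one digit after the point
assert Decimal("4.00").as_tuple().exponent == -2  # two digits after the point
```

The point is not that implementations must use decimal arithmetic, only that an interface which drops the digit count cannot reconstruct it afterwards.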
No arbitrary limit is imposed on the range or precision of floating point numbers. Thus, theoretically, the capacity of any binary representation can be exceeded, whether 32 bit, 64 bit, or 128 bit in size. Domain committees should not limit the range and precision of floating point numbers only to make sure the numbers fit into current database technology. Designers of Implementable Technology Specifications (ITS) should be aware of the possible capacity limits of their target technology.
In cases where limits on the value range are suggested semantically by the application domain, the committees should specify those limits. For example, probabilities should be expressed in floating point numbers between 0 and 1.
Although we do not yet have a formalism to express constraints, we should not hesitate to document those constraints informally. We will eventually define (or deploy) a constraint expression language.
We allow floating point numbers to be represented by character string literals containing signs, decimal digits, a decimal point and exponents. An ITS for XML will most likely use the string literal to represent floating point numbers. Other ITSs, such as for CORBA, might choose to represent floating point numbers by variable length bit strings or by choices of either a native (IEEE) floating point format or a special long floating point format.
Decimal floating point numbers can be represented in a standard way, so that only significant digits appear. This standard representation always starts with an optional minus sign and the decimal point, followed by all significant digits of the mantissa, followed by the exponent. Thus 123000 is represented as ".123e6" to mean .123 × 10^6; 0.000123 is represented as ".123e-3" to mean .123 × 10^-3; and -12.3 is represented as "-.123e2" to mean -.123 × 10^2.
The reason why we define decimal literals for data types is to make the data human readable. To render the value 12.3 as ".123e2" is not considered intuitive. The European standard ENV 12435 recommends that the exponent be adjusted so as to yield a mantissa between 0.1 and 1000. Those representations tend to be easier to memorize.
The external representation is of the form:

    sign     ::= "+" | "-"
    digit    ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    digits   ::= digit digits | digit
    decimal  ::= digits "." digits | "." digits
    mantissa ::= sign decimal | decimal
    exponent ::= sign digits | digits
    float    ::= mantissa "e" exponent
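The grammar above can be transcribed directly into a regular expression; the following Python sketch (a validator of my own, not part of the proposal) checks the example literals from the text:

```python
import re

# mantissa: optional sign, then "digits.digits" or ".digits"
# exponent: "e" followed by an optionally signed digit string
FLOAT_LITERAL = re.compile(r'^[+-]?([0-9]+\.[0-9]+|\.[0-9]+)e[+-]?[0-9]+$')

assert FLOAT_LITERAL.match(".123e6")      # 123000
assert FLOAT_LITERAL.match("-.123e2")     # -12.3
assert FLOAT_LITERAL.match(".123e-3")     # 0.000123
assert not FLOAT_LITERAL.match("123000")  # per the grammar, the
                                          # exponent part is mandatory
```

Note that the grammar requires an exponent on every float literal; a plain digit string such as "123000" parses as an integer, not a float.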
The number of significant digits is determined according to Ledley (ibid.) and ENV 12435:
Note that rule number 3 diverges from Ledley and ENV 12435. Judgement about the significance of trailing zeroes is often deferred to common sense. However, in a computer communication standard, common sense is not a viable criterion (common sense is not available on computers.) Therefore we consider all trailing zeroes significant. For example, 2000.0 would have five significant digits and 1.20 would have three. If the zeroes are only used to fix the decimal point (such as in 2000) but are not significant, we require the use of exponents in the representation: "2e3" to mean "2 × 10^3".
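Under the rule just stated (leading zeroes insignificant, ALL trailing zeroes significant), counting significant digits becomes mechanical. A small hypothetical helper, written for this discussion rather than taken from the proposal:

```python
def significant_digits(literal: str) -> int:
    """Count significant digits of a decimal literal, treating all
    trailing zeroes as significant (per rule 3's divergence from
    Ledley and ENV 12435)."""
    mantissa = literal.lower().split("e")[0]              # drop any exponent
    digits = mantissa.replace("+", "").replace("-", "").replace(".", "")
    return len(digits.lstrip("0"))                        # leading zeroes only

assert significant_digits("2000.0") == 5
assert significant_digits("1.20") == 3
assert significant_digits("2e3") == 1        # zeroes moved into the exponent
assert significant_digits("0.000123") == 3   # leading zeroes don't count
```

This also shows why the exponent form matters: "2000" would count as four significant digits under this rule, so a sender meaning only one must write "2e3".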
Note that this proposed data type has been significantly changed and is now called Ratio.
HL7 v2.3 defined the data type "structured numeric" (SN) for various purposes. Among those purposes was to cater to the need to express rational numbers that often occur as titers in laboratory medicine. A titer is the maximal dissolution at which an analyte can still be detected. Typical values of titers are "1:32", "1:64", "1:128", etc. Powers of 1/2 or 1/10 are common. Sometimes titer results are written inconsistently, as a ratio or only as the denominator. Nevertheless, these are rational numbers. Although mathematically rational numbers are exact, titers are the result of a measurement process, which can never be exact.
Thus, in theory, a titer of 1:128 could be reported as 0.0078125. However, no one would understand that result. One could recover the original ratio using the inverse of 10000000/78125, which is 128, but to do that, the receiver would have to know that the given number is to be represented as a ratio of 1/n.
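The round trip described above can be sketched with Python's fractions module; the arithmetic here just replays the 1:128 example from the text:

```python
from fractions import Fraction

# Recovering the titer 1:128 from its decimal value 0.0078125,
# assuming the receiver already knows the result is a ratio 1/n:
value = Fraction("0.0078125")
assert value == Fraction(78125, 10000000)  # the raw decimal as a fraction
assert value == Fraction(1, 128)           # normalized to lowest terms
assert 1 / value == 128                    # the inverse yields the titer step

# Transmitting numerator and denominator separately, as a Ratio data
# type would, avoids this guessing game entirely:
numerator, denominator = 1, 128
assert f"{numerator}:{denominator}" == "1:128"
```

The example makes the argument concrete: the decimal value is recoverable only with out-of-band knowledge of the 1/n convention, which is why carrying the ratio itself is preferable.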
Currently we do not know of any other use of rational numbers except for titers. We do know that blood pressure values, commonly reported as 120/80 mm Hg, are not ratios, but rather a composite of systolic and diastolic blood pressure.
Because we did not define rational numbers for their exactness but for their common representation, we might want to consider defining a data type "Ratio (of two floating point numbers)" instead.
Next conference call is next Monday, November 30, 1998.
Agenda items are:
Participants are invited to read and think about the three slides on those matters.
regards
-Gunther Schadow