S. Introduction to ASN.1, ER7, and ASN.7

ER7 began as an experiment to see whether the HL7 encoding rules could be extended to accommodate the richness and expressiveness of the message specification language ``ASN.1''. It turns out that they can, with only a slight extension to HL7's encoding rules. ER7 allows compound values (nested fields), list-valued fields (``repeated fields''), and variant fields (choice). ER7 has a provision for tagging fields with identifiers, types and even field lengths, but these extensions are not necessary to support ASN.1.

The Version 3 Task Force recommended that ASN.1 be used as the message specification language. This document attempts to define a useful subset of ASN.1. A principal operational requirement on this effort is that there should be an obvious correspondence between an ASN.1 message specification and its encoding in ER7.

A possible shortcoming in ASN.1 was found: ASN.1 has no way to make the distinction between a Message, a Segment and a Field. To ASN.1, they are all compound values called SEQUENCEs. To make this correspondence as clear as possible, several modifications to ASN.1 are recommended. Not able to leave well enough alone, I call this modified message specification language ASN.7.

SS. A Quick Example

To give a flavor of ASN.7 and ER7 before we dive into the details, consider the following ASN.7 definitions:
:+
-- ASN.7 sample
DEFINITIONS ::= BEGIN
PatientId ::= String;
PersonName ::= RECORD { first String, last String };
PID ::= SEGMENT {
    id PatientId,
    name PersonName,
    roomNr INTEGER,
    aliases LIST OF String,
    tels LIST OF RECORD { kind String, number String }
}
END;
:-
This specification says: For this installation, patient ids are \t{Strings}, and the patient's name is a two-component compound value. The \t{PID} segment contains *SIX* fields:
1. The segment tag (which must have the value 'PID'). (This field is implied by the SEGMENT construct.)
2. An \t{id}, which is a string.
3.
A \t{name}, which has two components.
4. A \t{roomNr}, which is an integer.
5. A field named \t{aliases} whose value is a list of strings.
6. A field named \t{tels} whose value is a list of compound values. Each element of the list has two values.

An ER7 encoding of a value of type PID is
:+
PID|123-45-6789|{John|Doe}|18|[John Q. Public|Mr. X]|[{home|(212)123-4567}|{pager|(701)111-1234}]
:-
Notice that the basic structure of a segment is the same as in HL7. First, the segment identifier is transmitted, then the fields follow, separated by vertical bars. Composite values are wrapped in curly braces, and the subcomponents are separated by vertical bars. A list is wrapped in square brackets, and the list elements are separated by vertical bars.

SS. Incompatibility with ASN.1

The actual ASN.1 keyword for defining record-like things is SEQUENCE. The problem is that in HL7 we need at least three SEQUENCE-like things: MESSAGEs, SEGMENTs, and RECORDs. In HL7, they have different representations. ASN.1 has no notion of segment, and all record-like values are represented uniformly.

It is essential that HL7 specifications explicitly distinguish messages, segments and compound values. Therefore, I recommend that we use \t{RECORD} in our discussions, and in our formal definition! We can transliterate from \t{RECORD} to \t{--RECORD-- SEQUENCE} in machine-processable files. The double dashes (\t{--RECORD--}) define a comment, and so ASN.1 processors will ignore the RECORD, and will just see the SEQUENCE. I feel the dashed form is too ugly for writing message definitions. A trivial C program can automatically transform \t{RECORD} into \t{--RECORD-- SEQUENCE}.

A similar non-semantic change is needed to accommodate SEGMENT and MESSAGE. A more essential change seems required to support CHOICEs. (See below. The problem is that we need to define a tag for each branch, and ASN.1 assigns internal branch ids itself.)
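The keyword transliteration described above for RECORD, SEGMENT and MESSAGE could be sketched roughly as follows. This is only an illustration in Python rather than the C program the text mentions; the function name \t{transliterate} is hypothetical, and the sketch makes no attempt to skip keywords that happen to appear inside comments or string literals.

```python
import re

# Keywords that ASN.7 adds on top of ASN.1's SEQUENCE (per the text above).
# The word boundaries keep identifiers such as "RECORDS" untouched.
_ASN7_KEYWORDS = re.compile(r"\b(RECORD|SEGMENT|MESSAGE)\b")


def transliterate(asn7_source: str) -> str:
    """Rewrite each ASN.7 keyword as an ASN.1 comment followed by SEQUENCE.

    Hypothetical helper: 'RECORD' becomes '--RECORD-- SEQUENCE', and
    likewise for SEGMENT and MESSAGE. Nothing else is changed.
    """
    return _ASN7_KEYWORDS.sub(
        lambda match: f"--{match.group(1)}-- SEQUENCE", asn7_source
    )


if __name__ == "__main__":
    print(transliterate("PersonName ::= RECORD { first String, last String };"))
```

Note that the sketch is not idempotent: running it twice would rewrite the keyword inside the comment it just produced, so it should be applied only to untranslated ASN.7 text.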
And personally, I would use \t{LIST OF} instead of the more pedantic \t{SEQUENCE OF}. If you are going to use \t{RECORD}, then you'll have to use a transliterator, and we might as well let it translate \t{LIST OF}!

A pure ASN.1 definition would be:
:+
-- Pure ASN.1 sample
DEFINITIONS ::= BEGIN
PatientId ::= String;
PersonName ::= --RECORD-- SEQUENCE { first String, last String };
PID ::= --SEGMENT-- SEQUENCE {
    id PatientId,
    name PersonName,
    roomNr INTEGER,
    aliases SEQUENCE OF String,
    tels SEQUENCE OF --RECORD-- SEQUENCE { kind String, number String },
}
END;
:-

S. The ER7 Encoding Rules

In this section, we will consider the following aspects of ER7 and their corresponding ASN.1 constructs:
1. Compound Values
2. Primitives and SubComponent Delimiters
3. Lists
4. Choices
5. Messages
6. Tables
7. Optional
8. Annotations (Value tagging) (Not included here.)

SS. Compound Values

ER7 field values may be arbitrarily nested. Where HL7 uses a fixed set of subfield delimiters (caret and ampersand), providing for only limited nesting of values, ER7 wraps the entire field with curly braces (\verb@'{}'@) and uniformly uses the vertical bar (\verb@'|'@) to separate fields and subfields at all levels. This permits the message designer to apply data abstraction without worrying about running out of subfield delimiters.

Thus compound values are written \verb@'|{John|Doe}|'@ instead of \verb@'|John^Doe|'@, and, instead of
:+
A|B|a^b^c1&c2^d|D|E|
:-
ER7 would have
:+
A|B|{a|b|{c1|c2}|d}|D|E|
:-
The ASN.1 construct for defining compound values is the \t{SEQUENCE}. SEQUENCEs are just like Pascal RECORDs. In ASN.7, we call them \t{RECORD}s:
:+
Address ::= RECORD {
    street_address String,
    other String,
    city String,
    state String,
    zip String,
    country String,
};
:-
We can use \t{RECORD} to describe specific coded values.
:+
CodedPatientId ::= RECORD { id String, name String, dob DateTime };
:-
ASN.1 allows you to nest structured values within structured values.
Arbitrary nesting encourages you to define and reuse abstractions of complex values such as PersonName. Nested values are especially useful in combination with LIST or CHOICE constructs.

\i{Analysis}: The nested curly braces do not introduce any complexity to a parser. At each point (for example, after parsing \verb@A|B|@ and being about to start on the value of the third field) the parser knows the type of value to expect. Under HL7, it knows to look for something with \verb@'^'@'s in it. Under ER7 it knows to expect a \verb@'{'@.

ER7 does lengthen the encoding of compound values. Wrapping the value in braces introduces two characters. It looks possible to change the encoding rules as follows.
o. Compound values whose nesting does not exceed 2 can use \verb@'^'@ and \verb|'&'| as component delimiters.
o. Parsers will be required to accept curly-brace delimited fields even if sub-component delimited ones were permissible.

P. Encoding Efficiency

It is possible to combine brace and caret encoding. We could decide to allow either encoding, provided it was unambiguous in context. If a parser is expecting a compound value, and it does not see a left curly brace, then it assumes the value must be encoded with carets. It becomes the parser's responsibility to realize that subcomponents of caret-encoded fields must use ampersand. I have implemented such a parser, and it is not hard.

Notice that these definitions
:+
Name ::= RECORD { first String, last String };
changed_name ::= RECORD { oldName Name, newName Name };
:-
encode legally as
:+
|{{John|Doe}|{Frank|Carubba}}|
|John&Doe^Frank&Carubba|
|{John^Doe|Frank^Carubba}|
|{John^Doe|{Frank|Carubba}}|
:-
but not as
:+
|{John|Doe}^Frank&Carubba|
:-
If an outer field is caret encoded, the first field must be caret encoded, but the other fields can be curly encoded.

SS. OPTIONAL and REQUIRED

A field marked OPTIONAL does not have to be transmitted.
It is up to HL7 to determine whether fields are OPTIONAL or REQUIRED by default. ER7 handles \t{nil} and \t{null} values exactly as HL7 does. Thus, \t{nil} is
:+
|""|
:-
and \t{null} is
:+
||
:-
\t{REQUIRED} is

Aside: I understand \t{nil}; I'm queasy about \t{null}. I understand \t{not applicable}, \t{don't know}, and \t{same as last time}. It's not obvious to me that we want a single encoding value (namely \verb@'||'@) to mean all three \t{null}s.

SS. Primitive Values

Suppose ISO decides that \t{PersonName} will be a string with \verb@'^'@ component delimiters. ER7 could manipulate \t{PersonName} in ISO format. It would simply treat it as a string, ignoring the fact that it has internal structure. Similarly, if a \t{TN} is defined to have the format 'NN(xxx)xxx-xxxx', ER7 will just pretend it is a string, and won't be bothered by the internal structure of the string.

ASN.1 has no facilities for declaring that a type is a specially formatted string. In ASN.1, you must say
:+
TM ::= [APPLICATION 32449] VisibleString
-- Time. Always in the format HHMM[SS][+/-ZZZZ].
TS ::= [APPLICATION 32447] VisibleString
-- Time stamp. Always in the format YYYYMMDDHHMM[SS][+/-ZZZZ].
:-

P. A More Radical Extension

For values like TN, which may be defined by ISO to include \verb@'^'@ separated subcomponents, we might choose between two techniques to explain the structure, rather than leaving the explanation as a comment, as was done in the \t{TM} and \t{TS} examples above. We might define another specialization of SEQUENCE, call it ``\t{COMPOUND\_VALUE}'', which is like a RECORD, but whose fields are \i{always} separated by carets. Or, more simply, include in our HL7 definition a semantically equivalent ASN.1 type definition,
such as:
:+
TN ::= String ; -- Use TN from Iso12345.678
TN_REC ::= RECORD {
    countryPhoneCode NM OPTIONAL,
    areaCode NM OPTIONAL,
    phoneExchange NM,
    phoneBase NM,
    phoneExtension NM OPTIONAL,
    beeper NM OPTIONAL,
    commentTelephone TX OPTIONAL
};
-- [NN][(999)]999-9999[X99999][B99999][C any text].
-- X is an extension, B a beeper code, and C comments such
-- as "After 6:00".
:-
Notice that TNs and TN\_RECs encode differently.
:+
|[NN][(999)]333-4444[X99999][B99999]|
|{01|800|999|333|4444}|
:-

SS. Lists

List values use square brackets to surround the entire list, and the vertical bar to separate elements.
:+
|[value1|value2|value3]|
:-
A list of structured values will look like:
:+
|[{v11|v12}|{v21|v22}]|
:-
In ASN.1, you can declare fields to be lists with:
:+
..
aliases LIST OF PersonName,
..
:-
ASN.1 also allows you to define ``in place'' lists of anonymous compound values, as in the \t{tels} field.
:+
FancierPID ::= RECORD {
    name PersonName,
    tels LIST OF RECORD { kind String, tel TelephoneNumber }
};
:-
In this case, the \t{tels} field contains a list of two-component values. Each element will have a \t{kind} field describing what kind of phone number it is, and then the actual value.
:+
PID|{John|Smith}|[{home|(123)456-7890}|{beeper|(800)321-4323}]
:-

P. Byte Saving

Just as caret encoding can be used, so can we use tilde encoding. In the case of lists, there are no special problems. We could just go ahead and encode all lists with tilde repetition separators! So, why bother with '[' in the first place? (You don't have to escape tildes in strings?... pretty weak!)

With both curly braces and square brackets, the encoding makes a distinction between a record with one value and just a single value, and between a one-element list and just a single value. Such precision is nice for discussion, but not needed in encoding.

SS. TABLES

The HL7 TABLE is well supported by ASN.1 enumerated types. Each table element is defined by three things:
1. An identifier
2.
The value which will actually be transmitted to encode this identifier (optional), and
3. A description of the value (a comment).

In ASN.1, you can say
:+
Sex ::= ENUMERATED { male(m), female(f) };
:-
The type \t{Sex} has two values, which are encoded by 'm' and 'f'. (Simple!)

SS. Choices

Pascal variant records or C union data types can be expressed in ER7. The idea is that a variant record is defined by a set of named branches, and each branch has a different type. Consider the type \t{dx}, which has a field ``details'' which can hold an integer, a pair of strings, or a \t{PersonName}.
:+
PersonName ::= RECORD { first String, last String };
dx ::= SEGMENT {
    name String,
    details CHOICE {
        foo Int,
        bar RECORD{ a String, b String },
        baz PersonName
    },
    size Int,
}
:-
The idea of encoding CHOICEs is that you first send an indicator as to which branch is valid, followed by the values of that branch. The obvious indicator to use is the identifier of the branch.

There are two approaches to encoding CHOICE values. You can be a purist, and say that the CHOICE is a two-component structure: the first component is the discriminator, and the second is the encoding of the branch type.
:+
dx|John Doe|{foo|3}|7
dx|Mary|{bar|{xyz|wus}}|8
dx|Steve|{baz|{Bill|Clinton}}|9
:-
Or, you can try to imitate segments, and concatenate the discriminator with the value of the branch.
:+
dx|John Doe|{foo|3}|7
dx|Mary|{bar|xyz|wus}|8
dx|Steve|{baz|Bill|Clinton}|9
:-
This second approach may save some encoding bytes, but interferes with parsers which want compound values to always begin with a curly brace. For example, with encoding style two, a \t{PersonName} usually looks like \verb@{Bill|Clinton}@, \t{except} inside choices, when it doesn't have the leading curly brace!
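To make the difference between the two CHOICE styles concrete, here is a minimal sketch of a recursive-descent reader for brace-and-bracket ER7 values. It is not the author's parser: the names \t{parse\_segment} and \t{parse\_value} are hypothetical, records are returned as Python tuples and lists as Python lists purely for illustration, and escaping, annotations, caret encoding and empty lists are all ignored.

```python
# Maps each opening delimiter to its closer and to the Python container
# used to hold the parsed components (an illustrative choice, not ER7's).
_COMPOUND = {"{": ("}", tuple), "[": ("]", list)}


def parse_value(text, pos=0):
    """Parse one ER7 value starting at text[pos]; return (value, next_pos)."""
    if pos < len(text) and text[pos] in _COMPOUND:
        closer, container = _COMPOUND[text[pos]]
        pos += 1
        items = []
        while True:
            item, pos = parse_value(text, pos)
            items.append(item)
            if pos >= len(text):
                raise ValueError("unterminated compound value")
            if text[pos] == "|":        # another component follows
                pos += 1
            elif text[pos] == closer:   # end of this record or list
                return container(items), pos + 1
            else:
                raise ValueError(f"unexpected {text[pos]!r} at offset {pos}")
    # Primitive value: everything up to the next delimiter.
    start = pos
    while pos < len(text) and text[pos] not in "|}]":
        pos += 1
    return text[start:pos], pos


def parse_segment(line):
    """Split a segment into its top-level fields (the tag is field 0)."""
    fields, pos = [], 0
    while True:
        value, pos = parse_value(line, pos)
        fields.append(value)
        if pos == len(line):
            return fields
        if line[pos] != "|":
            raise ValueError(f"unexpected {line[pos]!r} at offset {pos}")
        pos += 1


if __name__ == "__main__":
    # Style one: the CHOICE is always a (tag, branch value) pair.
    print(parse_segment("dx|Steve|{baz|{Bill|Clinton}}|9"))
    # Style two: the branch's components are spliced in after the tag.
    print(parse_segment("dx|Steve|{baz|Bill|Clinton}|9"))
```

Under style one the third field always comes back as a two-element tuple, so the branch value can be handed to whatever code normally decodes a \t{PersonName}. Under style two the reader gets a flat tuple and has to know the branch type before it can regroup the components, which is the interference with parsers described above.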
NOTE: If we allow caret-separated components for shallowly nested structures, then
:+
dx|John Doe|foo^3|7
dx|Mary|bar^xyz&wus|8
dx|Steve|baz^Bill&Clinton|9
:-
would be legal, and so might
:+
dx|Steve|baz^Bill^Clinton|9
:-
For encoding efficiency, we may not want to transmit the entire branch identifier as the discrimination indicator. Rather, we might want to let the message specification define a shorter name to use. The example below shows this.

Suppose you had an order for a prescription which needed an authorization if the requesting physician did not have privileges at this hospital.
:+
Doctor ::= String;
DeaNr ::= String;
DateTime ::= String;
PrescriptionOrder ::= RECORD {
    dr Doctor,
    authorization CHOICE {
        privileged RECORD { dea DeaNr },
        unprivileged('u') RECORD { authorizingDr Doctor, authorizingTime DateTime }
    },
    drugName String
};
:-
Now, suppose Dr. Casey is privileged, and Dr. Bob is not. Dr. Casey could request Tylenol with
:+
{Dr. Casey|{privileged|87-65-43}|Tylenol}
:-
You can see that the second value of the \t{PrescriptionOrder} is represented by the compound value \t{\{TAGNAME|VALUE1|VALUE2\}}. The \t{tagname} is the identifier used in ASN.1 for this branch, and the \t{value} is just a single value of type \t{DeaNr}.

Dr. Bob would request Tylenol with
:+
{Dr.Bob|{u|{Dr. Casey|3pm}}|Tylenol}
:-
In this case the tag is encoded with the abbreviation ('u') which is given in the ASN.7 definition.

Let's define a complex Order segment to show off the power of ASN.1. Each ORD segment applies to a list of patients. For each patient, there is a list of items which have been ordered for them.
:+
ORD ::= SEGMENT {
    orders LIST OF RECORD { who CodedPatientId, items LIST OF PrescriptionOrder }
};
:-
If the two prescriptions were ordered for John Doe, the segment might be:
:+
ORD|[{123|John Doe|10Apr45}|[{Dr.Bob|{u|{Dr. Casey|3pm}}|Tylenol}| {Dr.
Casey|{privileged|87-65-43}|Tylenol}]]
:-
As you can see, segments can easily and comprehensively express complex relationships, and therefore their encodings can become large. Such complex relationships are expressed in HL7 2.2 by repetitions (and regular expressions) of segments.

Notice also that a CHOICE value is encoded in a similar way to SEGMENTs. Each begins with a tag, and is followed by a bar-separated set of values.

Notice that ML7 doesn't explain which branch of the choice is \i{supposed} to be present. We leave that up to the semantics. The rules which determine whether a doctor is in fact privileged may be complicated. The receiving system may have to validate that a message containing the privileged branch does name a privileged doctor! Presumably, in a data model, you could express more of the semantics.

SS. MESSAGES

An ER7 \t{MESSAGE} is truly identical to an HL7 message. It contains SEGMENTs, with iteration and optional segments allowed. For example, take the quite simple A01 message.
:+
ADT_A01 ::= MESSAGE {
    msg MSH,
    evn EVN,
    nk1 NK1,
    pv1 PV1,
    dg1 DG1 OPTIONAL
};
:-
A MESSAGE definition is record-like, but all the ``fields'' must be SEGMENTs, or be RECORDs, LISTs or CHOICEs of SEGMENTs.

The ability to name segments comes in handy. Consider Swap Patients (\t{ADT.A17}). Using the existing HL7 specification machinery, you write
:+
MSH EVN { PID PV1 }
:-
and HL7 cannot express the fact that these come in pairs. ML7 would say:
:+
ADT_A17 ::= MESSAGE {
    msg MSH,
    evn EVN,
    pt1 RECORD { pid PID, pv1 PV1 },
    pt2 RECORD { pid PID, pv1 PV1 },
};
:-
MESSAGEs support iteration.
:+
ORF ::= MESSAGE {
    msh MSH,
    msa MSA,
    result_sets LIST OF {
        original_query QRD,
        filter QRF OPTIONAL,
        pid PID OPTIONAL,
        ntes OPTIONAL LIST OF NTE,
        results LIST OF {
            orc OPTIONAL ORC,
            obr OBR,
            ntes OPTIONAL LIST OF NTE,
            observations LIST OF {
                obx OBX,
                ntes OPTIONAL LIST OF NTE,
            },
        }
    }
}
:-
CHOICEs within MESSAGEs are handled a bit differently.
The tag specified by the choice is ignored, and you can tell which branch is being sent by looking at the type of the first segment of the branch. Thus, each branch must begin with a different segment type.

We could use the following notation (the following ``hack'', if you would), wherein if we give a value to the tag (\t{FEE} in the example below), then a segment with only the SEGMENT ID in it will be transmitted. Such a segment is only used to disambiguate the parse. The definition
:+
ZOO ::= MESSAGE {
    msh MSH,
    v CHOICE {
        bar BAR,
        baz RECORD{baz BAZ, gam GAM},
        fee(FEE) RECORD{bar BAR, extra EXTRA}
    },
    xyz XYZ
}
:-
describes this 'baz' message
:+
MSH|....|ZOO|...
BAZ|....
GAM|...
XYZ|...
:-
and this 'fee' message (which has a degenerate FEE segment before a BAR which is part of the 'fee' branch):
:+
MSH|...|ZOO|
FEE
BAR|....
EXTRA|...
XYZ|....
:-

S. Appendix

SS. Divergence from ASN.1

As mentioned above, the proposed ASN.7 diverges from ASN.1, because we need specialized records.
Translate to \t{SEQUENCE}: RECORD, MESSAGE, SEGMENT.
Translate to \t{SEQUENCE OF}: LIST OF.
Translate to COMMENT: REQUIRED, (n) [the abbreviation in CHOICE branches].

SS. Fixed Delimiters

I have presented ER7 as using fixed delimiters, but this is not essential. It seems to me that even when they are redefinable, anyone encoding a string for transmission will have to look through the data string to see if there are any delimiters that must be escaped. So, if you are escaping delimiters anyway, why not use fixed delimiters? Then your parser can be made faster! But this is a small point on which I will not insist.

P. Possible Extension to TABLE : OPEN vs CLOSED

We might want to consider adding the notion of OPEN and CLOSED enumerations. If a table is \t{CLOSED}, then new elements cannot be added by implementors. Open tables can be extended with the \t{EXTEND} clause.
\t{EXTEND} allows other tables to be ``incorporated by reference.'' The following example shows how the HL7 standard defines an OPEN table, which is extended by the local site.
:+
// The HL7 Standard defines this.
HL7-Religion ::= OPEN ENUMERATED {
    Atheist(A) ,
    // ...
    Christian_Scientist(S)
};
// The UniversityMedicalCenter extends the table
// as follows:
Religion ::= EXTEND HL7-Religion { Hindu(H) };
:-
Analysis: This is clearly the first step towards inheritance, and it definitely requires semantic processing of ASN.7 to yield ASN.1. Therefore, while I like the idea, it's probably not worth the trouble.

hide:+
S. Observations on Implementation Considerations for ER7

This section is mainly questions to implementors.
o. I don't really know what advantage you get from redefinable delimiters. You have to scan the string anyway to quote possible occurrences of the delimiter character. By making it fixed:
   o. Applications can use finite state machines a bit more easily.
   o. Tools won't have to worry about the delimiters being changed.
   o. It would be possible to store fully escaped values in your database in the format needed to transmit them over the wire.
o. To parse a field, the implementation treats the first character of a field (the one right after a vertical bar) as magic.
   o. If it is a hash, then the field has an annotation.
   o. If it is a curly brace, then the field is structured.
   o. If it is a DoubleQuote, and the second character is a DoubleQuote, then the field represents NIL.
o. If a field is a structured one, then it will expect to see the left curly brace.
o. String values must be scanned for delimiters and special characters before they go out on the wire.
   o. Bar, RightCurly and RightSquareBracket must be quoted if they appear anywhere in the string.
   o. The first character must be quoted if it is a Hash, LeftCurly, LeftSquareBracket or a Dquote.
hide:-
\newpage
S. HL7 Example
hide:+
SS.
Example

In the example below, you will see examples of the following features:
o. \verb|PRIMITIVE_VALUE| -- Defines a new ``built-in'' type.
o. \verb|TABLE| -- HL7 tables become ML7 \t{TABLES}, which are fancy Pascal \verb|ENUM|'s. TABLE values are defined by \verb| = ``Descriptive String'' ;| The \t{~} can be omitted.
o. \verb|COMPOUND_VALUE| -- Modeled after HL7 fields with \verb|^| subfield separators.
o. \verb|SEGMENT| -- A Pascal RECORD, whose fields are PRIMITIVE\_VALUEs or COMPOUND\_VALUEs. It is identified by a SEGMENT name field at runtime.
o. \verb|MESSAGE| -- A record whose fields are SEGMENTs.
\begin{verbatim}
PRIMITIVE_VALUE ZipCode = String;
PRIMITIVE_VALUE DateTime = String;

COMPOUND_VALUE PersonName {
    given : String;
    middle : String;
    family : String;
    prefix : String;
    suffix : String;
    degree : String;
};

COMPOUND_VALUE Address {
    street : String;
    town : String;
    county : String;
    state : String;
    zip : ZipCode;
};

COMPOUND_VALUE TelephoneNumber {
    countryCode : Integer;
    areaCode : Integer;
    phoneNr : Integer;
    extension : Integer;
    beeper : Integer;
    comment : String;
};

TABLE CheckDigitScheme = {
    M10 ``Mod 10 Check Digit Scheme'' ,
    M11 ``Mod 11 Check Digit Scheme''
};

COMPOUND_VALUE CK {
    val : String;
    check_digit : String;
    check_digit_scheme : CheckDigitScheme;
};

TABLE Sex = {
    // Example of Redundant comments!
    Male = M ``Male'' ,
    Female = F ``Female'' ,
    Other = O ``Other'' ,
    Unknown = U ``Unknown''
};

TABLE EthnicGroup {
    Native_American = A ,
    Black = B ,
    Caucasian = C ,
    Hispanic = H ,
    Oriental = R
};

TABLE Religion {
    Atheist = A ,
    // ...
    Christian_Scientist = S
}
\end{verbatim}
hide:-
\newpage
\begin{verbatim}
PID ::= SEGMENT {
    set_id SI,
    ext_pid CK,
    int_pid CK,
    alt_pid String,
    name PersonName,
    mother_maiden_name String,
    dob DateTime,
    sex Sex,
    alias PersonName,
    ethnic_group EthnicGroup,
    address Address,
    country_code ID,
    home_phone TelephoneNumber,
    business_phone TelephoneNumber,
    language String,
    marital_status MaritalStatus,
    religion Religion,
    acct_nr CK,
    ssn String,
    drivers_license String,
};

AdtTypeCode ::= ENUMERATED {
    A01 -- ``Admit a Patient''
    // ...
    P03 -- ``Post Detail Financial Transaction''
};

AdtReason ::= ENUMERATED {
    patient_request(01),
    physician_request(02),
    census_management(03)
}

EVN ::= SEGMENT {
    type_code AdtTypeCode,
    time DateTime,
    planned_time DateTime,
    reason AdtReason,
}

A01 ::= MESSAGE -- ``Admit a Patient''
{
    msh MSH,
    evn EVN,
    pid PID,
    nk1 NK1,
    pv1 PV1,
    dg1 OPTIONAL DG1,
};

ADR ::= MESSAGE -- ``ADT Response''
{
    msh MSH,
    msa MSA,
    qrd QRD,
    results OPTIONAL LIST OF {
        evn OPTIONAL EVN,
        pid PID,
        pv1 PV1,
    },
    dsc OPTIONAL DSC,
};
\end{verbatim}
-----------------------------------------------------------------------
Mark Tucker                                     email: mct@philabs.philips.com
Philips Laboratories,                           tel: (914) 945-6564
345 Scarborough Road, Briarcliff Manor, NY 10510  fax: (914) 945-6552