This is Info file ProtoGen.info, produced by Makeinfo-1.64 from the input file ProtoGen.texi.

   This text describes the implementation of HL7 that is being done at the Universitätsklinikum Steglitz in Berlin. It is meant as a report about the work in general as well as a manual for the software that is about to be developed.

   Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

   Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

   Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation.

   Copyright (C) 1994, 1995, 1996 Gunther Schadow


File: ProtoGen.info, Node: Top, Next: Introduction, Prev: (dir), Up: (dir)

ProtoGen/HL7 -- An Implementation of Health Level 7
****************************************************

This paper describes the attempt to implement HL7 at the Universitätsklinikum Steglitz in Berlin as part of a larger hospital communication project. The aim of that project is to integrate the heterogeneous computer systems which exist in the various departments of the clinic. This project requires a standardized communication protocol.

   The standardization of medical communication protocols is an ongoing effort that is being taken on by many different standards groups. Naturally, the various standards groups have different scopes, depending on the special field in which each standard has its origin. Since late 1990, the US standards groups have been associated in the Healthcare Informatics Standards Planning Panel (HISPP), which is coordinated by the American National Standards Institute (ANSI). Its goal is to promote the convergence of the different standards, a process which is still at its beginning. However, as early as 1987 there was one standards group, Health Level 7 (HL7), whose objective was to cover a wide range of data interchange in health care, including ADT, finance, and a variety of ancillary fields such as laboratory medicine or radiology. In Europe, the Technical Board of the European Standardization Committee (CEN) established the Technical Committee for Medical Informatics (TC 251) in March 1990; one of its Working Groups (WG.3) is dedicated to health care communications.

   Since the implementation of HL7 presented here is embedded into a project that seeks the integration of different standards, it is conceptualized from a point of view that may differ from that of other implementors who concentrate on HL7 alone. The difference will become evident in the methods used to build the implementation itself as well as in the final linkage between the communication protocol and the application software. Where an implementation centered on HL7 alone might stay completely within the concepts and terminology of HL7, this project strives for greater generality in order to provide concepts and methods that can serve as bridges between the diverging standards being released in the same field.

   An implementation of a communication standard is essentially a data structure, a builder and a parser.
The data structure holds the data objects and reflects their relations to each other as assumed by the standard. The builder produces a valid representation of the data structure that can be transported via electronic data interchange media. Finally, the parser transforms this representation back into the data structure.

   This implementation tries to adopt the models and methods of today's informatics technology. Its design is object oriented and is implemented in C++. The code of the implementation is produced by a compiler whose input is a database which specifies the standard. The database is in turn compiled semi-automatically by scanning the text of the HL7 standard as it is released by the HL7 Working Group.

   This method of generating the implementation directly from the textual description has the following advantages: correctness and customizability of both the parser/builder methods and the definition of the standard itself.

   Since neither the specifying database nor the program code which implements the standard is written manually, but both are produced by a compiler, the output is assured to be correct unless the compiler itself is incorrect. An incorrect compiler, however, will result in erroneous code, but the errors tend to occur systematically rather than accidentally and are thus discovered more easily. Thus the method of ProtoGen/HL7 can be used to generate an HL7 reference implementation. On the other hand, it does reveal problems and errors in the standard definition itself, since a compiler will not guess the intended meaning of an ambiguous or obviously incorrect passage as a human programmer would probably do.

   The input database for the ProtoGen/HL7 compiler is customizable and allows the user to add new data elements quite easily. For example, a special Z-segment that is used by a site can be added to the standard just by adding a single line to the database. The implementation for this special segment is generated fully automatically, as is the implementation of the whole standard; a sketch of such an entry is given below. On the other hand, the methods of the parser/builder may be customized in order to provide for different encoding rules, or to speed up the implementation or tune it for size. These customizations are, however, not needed very often and can currently only be done with a deeper understanding of the compiler itself, since they require the compiler program to be modified. Nevertheless, there are various applications that can be worth the effort, ranging from modifications to HL7 coding or processing rules up to the implementation of a completely different standard. Since the UN/EDIFACT encoding rules are quite similar to those of HL7, an EDIFACT module could be added to ProtoGen with little to moderate effort.

   Finally, the object oriented model of the HL7 standard that this implementation assumes helps to stay compatible with the new developments in the HL7 Working Group (version 3.0) in particular, as well as with the health care data interchange standards of the future in general, such as MEDIX.
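   To make the customization step concrete, the following is a minimal sketch of what such a site-defined entry might look like, in the Prolog database form described in the body of this paper. The segment id `zpn', the functor names, and the argument order are assumptions chosen for illustration only; the authoritative facts are whatever the `bin/tb2pl' script actually generates.

     %% hypothetical site-defined Z-segment (illustration only):
     %% one fact registers the segment, one fact per field describes
     %% its contents; functor names and arities are assumed here
     segment(zpn, "site defined patient notes").
     field(zpn, 1, 60, st, optional, norepeat, _, _, "note text").
     field(zpn, 2,  8, dt, optional, norepeat, _, _, "note date").

   From such entries the compiler would generate the corresponding C++ segment class in the same way it does for the segments defined by the standard itself.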
* Menu:

* Introduction::
* From Text to the Database::
* The HL7 database::
* Generating C++ code::
* Integration into the system::
* Bibliography::


File: ProtoGen.info, Node: Introduction, Next: From Text to the Database, Prev: Top, Up: Top

Introduction
************

A communication protocol for medical applications tends to be rather complex, since medical communication covers a variety of diverse issues, such as ADT (admission, discharge, and transfer), accounting, reporting and documentation of findings, querying and ordering, etc. In order to cope with this kind of complexity it is necessary to find methods of deriving the implementation automatically from the specification of the standard. Besides the problem of complexity, modern software engineering has pointed out the need to automate the production and management of software. These concerns are reflected in the modern technologies by which communication protocols are implemented today, and which are formulated in the OSI standards.

   In this introduction we take a first look at HL7 and at this implementation of it, asking how we can cope efficiently with the complexity mentioned above.

* Menu:

* A view on HL7::
* A view on this implementation::


File: ProtoGen.info, Node: A view on HL7, Next: A view on this implementation, Prev: Introduction, Up: Introduction

A view on HL7
=============

HL7 is a communication protocol for medical applications that tries to conform to the OSI concepts of networking. In particular, HL7 even attaches its name to the OSI reference model (Health Level 7); it should therefore be expected that HL7 complies easily with the concepts and methods suggested by OSI. Indeed, the distinction between abstract syntax and encoding rules is reflected in the HL7 document: for each message, an abstract message definition is given, consisting of three parts which are logically related to the elements that HL7 considers to constitute transactions: (1) messages, (2) segments, and (3) fields, `data elements', or types. The transactions (messages) are conceptually embedded into the environment in which they occur by means of `trigger events'. On the other hand, encoding rules are defined in order to have an interim standard until the OSI suite of protocols is sufficiently supported. Finally, several lower layer protocols (LLP) are specified, which belong to the session and transport layers and which in turn include or use resources associated with still lower layers. These various protocols all meet different needs, which arise from the use of different transport media and their properties, such as reliability.

   However, in the course of this implementation attempt it turned out that the HL7 specification does not make the distinctions above as strictly as it claims to. This poses problems for an implementor who tries to go with OSI. We will point out these problems in this paper at the places where they become evident (*note On the abstractness of abstract syntax in HL7::.)

   The idiosyncratic notation used in the HL7 document is another problem that hampers the use of OSI standard methods. Probably many of the problems of HL7 could have been avoided if a standard notation such as ASN.1 had been adopted in time.
This problem may partly stem from a kind of misinterpretation of ASN.1: in the HL7 document it is repeatedly stated that the scope of ASN.1 is the basic encoding rules; however, ASN.1 is an abstract syntax notation, which is more appropriately compared with the `abstract message notation' used by HL7. The encoding rules stay completely hidden in ASN.1 (in contrast to the HL7 message definitions). It seems rather likely that ASN.1 was primarily avoided because of the difficulty "to explain to application programmers, who are not schooled in recursive languages in general, or ASN.1 in particular". This is certainly an anachronism, since tools like lex and yacc have been widespread and extensively used for many years. However, to be fair, we have to note that HL7 was first documented back in 1987, when the ASN.1 standard had only just been released.


File: ProtoGen.info, Node: A view on this implementation, Prev: A view on HL7, Up: Introduction

A view on this implementation
=============================

Today there exist some tools which are dedicated to the OSI concept, most notably the ISODE, which is an almost complete development environment for OSI applications. Thus it seemed at first glance appropriate to transform the HL7 notation into ASN.1 and feed the latter code to the various tools of the ISODE, including the structure generator (pepsy) and the remote operation stub generator (rosy). However, there is one single point which inhibits this rather convenient way to go: ISODE supports only the OSI basic encoding rules (BER). Furthermore, there is currently no other ASN.1 compiler that supports the HL7 encoding rules. Even though the HL7 encoding rules were meant as an `interim standard' for use until OSI standards became available, they are now a major point that makes it hard for the HL7 world to migrate towards OSI. Most existing HL7 applications use the HL7 encoding rules, and for compatibility, new HL7 applications will keep doing so. Because the HL7 encoding rules have several disadvantages (KUPERMAN (1991), p. 179) which will prevent them from surviving for many years, adding support for the HL7 encoding rules to existing ASN.1 compilers (ISODE or SNACC), though possible, does not seem to be worth the effort.

   This implementation has therefore developed (and is still developing) its own means by which the program objects are automatically derived from the documenting text of the standard. Since the author of this paper happened to have the text of the HL7 specification in ASCII files, there was a reasonable chance of going this way from the very beginning.

   Before we start going along that way step by step, we will first take a look at it from above. Figure 1 shows the process of generating the implementation. Let us start from the end and answer the question of where we want to go: we want to end up with a set of definitions expressed in a common programming language, plus an object library which provides the actions corresponding to the definitions (the implementation in the narrower sense). These shall be usable by applications that are to be built with, or extended by, the HL7 interface. If this implementation is ever to pay off its effort, there is a need for compatibility in terms of the platforms on which the programs are to run. The programming language that was chosen is C++.
The important advantages of C, namely its widespread availability on small computers like PCs as well as on workstations and minicomputers, together with its naturally embedded operating system interfaces, especially on UNIX platforms, provide a base for compatible as well as efficient code. These advantages are shared by C++, since C++ is derived from C. Moreover, C++ provides new and powerful techniques of software design which we want to make use of here.

   The steps towards the C++ code are taken by various means and methods. First we extract meaningful information from the HL7 document; this is done with AWK. Since most of this information is presented either in tables or in a simple formalism capable of expressing the message syntax, it is appropriate to process this information further by a method which is capable of handling both relations and grammars. Prolog, on the one hand, expresses information in terms of relations; on the other hand, it is powerful (and often used) for handling grammatical data. Moreover, analysis and generation of corresponding data can be achieved by the same code. The tasks that are taken over by Prolog are the check for consistency of the information gathered from the document, and the final generation of the C++ code. The C++ code is then compiled and packed into an object archive (a library). The C++ definitions in the header (`*.h') files provide the interface for application programs that will make use of HL7. It should be possible to make this available to C programs (and to programs in other programming languages as well).

   Even though we try to provide an automatic tool for producing code from the standard's document that is reasonably general and tolerates future changes of HL7, we do not intend to build a general purpose compiler for the notation in which HL7 is defined. Remember that this notation is mainly unstructured text which nevertheless contains machine-extractable structured information, which we try to capture. Furthermore, the way this structured information is presented is unique to HL7 and not backed by any standard, so there would be little use for a general purpose compiler for that notation. We rather keep the work as specific to HL7 as possible, in order not to waste effort on unused generality. This implies that a considerable amount of the code has to be written manually. Here again the advantages of C++ (over C) help to minimize the software management overhead.


File: ProtoGen.info, Node: From Text to the Database, Next: The HL7 database, Prev: Introduction, Up: Top

From Text to the Database
*************************

This chapter describes how we actually scan the HL7 document to capture useful information. First we have to take a look at the HL7 document itself, its format and what structured information we may find therein. Then we describe the interim format in which all information that can be expressed in tables is stored. Finally we describe the methods used to generate Prolog predicates from these tables. However, as noted above, tables are not the only style in which information is presented in the HL7 document; the message syntax is expressed in a simple formal language that can be translated purely lexically into Prolog expressions.

   To accomplish the tasks of this step we use AWK, which is a common tool for text file processing, available on virtually any UNIX system. So is sed, which we sometimes use to post-process our output in order to get rid of nasty junk characters.
Where AWK or sed is not available, the C sources of the GNU versions of these (strongly recommendable) tools can be freely copied from any major FTP server near you.

* Menu:

* The HL7 document::
* The Intermediate Table Format::
* Extracting tables::
* Extracting segment definitions::
* Generating Prolog predicates from tables::
* Extracting message definitions::


File: ProtoGen.info, Node: The HL7 document, Next: The Intermediate Table Format, Prev: From Text to the Database, Up: From Text to the Database

The HL7 document
================

This section may be quite specific to the kind of text files that were available at the time the work on this project started. Namely, there were 11 Microsoft Word 5.0 files, one for each chapter (files `kap[1-7].txt') or appendix (files `anh_[a-d].txt'). The chapter files can all be processed the same way, while each appendix requires its own handling. Appendix C can be treated like a chapter file. Since appendix B describes the lower level protocols and is completely different in both format and purpose, we will ignore it here until we come to the point where we implement the lower layer protocols. Appendix D is an automatically (?) generated index, which we don't need at all. So before we start discussing the treatment of the chapters, only appendix A needs to be looked at more closely.

   Appendix A is a summary full of tables, which presents a most welcome source of various information for us. This appendix was first treated manually. It was easy to convert all the tables into the interim table format described in the next section. Actually, the latter format was developed while working manually on appendix A. Since appendix A was broken up into well formatted tables even before any AWK script was written to do the work of extracting tables, there still exists no automatic extraction script that could be applied to appendix A. Once our general method has proved to be useful and appropriate for the next version of HL7 as well, this gap will be filled.

   Fortunately, a Microsoft Word 5.0 file is essentially an ASCII text file with only a few special characters written directly into the text. The bulk of the information on the print format is appended in binary form to each text file. Even though this information is completely obscure to us, it can safely be ignored. What remains accessible to us, and is sufficient, is the ASCII text. Thus we don't have to bother with all the layout information like indentation, fonts, character styles, etc.

   Now, what information do we have to expect, and how is it presented in the ASCII text of our files? Each chapter consists of sections which describe messages, segments or fields. The sections are recognized by their specific layout (e.g. tables) and by keywords, which have proved to be very helpful for us here. It can even be stated that the strict usage of keywords (e.g. `FIELD NOTES:') in the text is what made this whole work possible at all. We will describe the extraction of the items in the appropriate sections below.

   As you can easily see from the above filenames, these were not the original files, which would surely have English acronyms. Indeed, these files had been worked on by the German section of the HL7 consortium. This (visible) work consists merely of annotations, mainly German translations of the terminology; sometimes remarks have been added.
However, this would have been of little interest to us if these efforts had not messed up the format of the files. This in turn has been the cause of severe problems, some of which could not be solved satisfactorily by a general mechanism, with the consequence that files which were produced defective had to be edited manually. There would have been little sense in wasting time trying to solve a problem which will hopefully disappear with the next version of HL7 or even with the next set of text files we may have available.

   It became quite obvious that a WYSIWYG text processor like Microsoft Word for Windows -- though easy to use -- is prone to cause severe problems, especially if more than one author is working on the same file. A great deal of discipline and a common method (a standard) is required regarding the use of the resources for formatting the text (e.g. print formats vs. arbitrary formatting of marked blocks). It is not easy to always keep up this kind of discipline if one can achieve one's immediate ends rather simply. Moreover, the format of a text in a WYSIWYG editor tends to become obscure even though the writer feels in complete control of it; this gives rise to surprises. After all, it seems that text processing methods which appear rather unfriendly or outdated at first view (like *roff and TeX) do pay off the extra effort in the long run, particularly for long texts and when used in a work group.


File: ProtoGen.info, Node: The Intermediate Table Format, Next: Extracting tables, Prev: The HL7 document, Up: From Text to the Database

The Intermediate Table Format
=============================

This section describes how we store most of the information drawn from the HL7 text files. Since there are different items to extract (i.e. segments and tables) which are all essentially tables but differ in their embedding and appearance in the text, it seems appropriate to have an easy-to-generate interim table format into which the different extraction procedures write their information, rather than generating the final representation directly. From these tables we can get our final representation easily by applying a common AWK script to all of them, regardless of what they were derived from. This has the advantage that we are now free to translate them into a programming language other than Prolog, or to import them into a database management system, etc. Note that when we talk about `tables' in this section, we mean this interim format; do not confuse it with the tables found in the HL7 text.

   The following is a set of rules telling us how these tables are built:

  1. The first line of the table is its name.

  2. The table is ended by at least one empty line.

  3. Each row of the table makes up exactly one line; the length of the line is, however, not limited to a specific number of characters. The line is terminated by the system's native line terminator, i.e. the one that AWK's printf escape sequence `\n' expands into.

  4. Columns are colon separated. A colon appears *between* two columns, not before the first and not after the last one. Where there are two consecutive colons, or a colon at the beginning or at the end of the row, the corresponding field is treated as empty.

  5. The second line of the table gives the titles of the columns.

  6. The third line of the table specifies the data type to expect in each column. This can be one of the following keywords:

     `sym'
          a symbol, which will become an identifier in the target language (Prolog).
     `num'
          a number.

     `str'
          a string, i.e. a sequence of characters, which will appear enclosed in string delimiters (`"').

  7. The fourth line of the table is the first row of data of the table; this and any immediately following line will be treated as table data.

   However, there are more complex tables, which contain subtables, all of which have the same format (e.g. number and types of columns). These complex tables are generated, for example, from the table of `TABLE VALUES'(1) which appears in appendix A. The complex tables are basically the same as described above; notably, rules 1-6 of the definition above still apply. Here are the additional rules which apply to the complex tables:

  7. The fourth line must start with `-' and defines the titles of the columns of every subtable.

  8. The fifth line must also start with `-' and declares the data types of the columns of every subtable.

  9. Any other line that does not start with a `-' starts a new subtable.

 10. Any other line that starts with a `-' is a row of the subtable started most recently.

   The meaning of the rows is slightly different from, and extends, that of the simple table. The idea is that we generate two relations from the complex table. One is the main table (i.e. the table that results if we delete every line that starts with a `-'), while the other is a relation that is constructed from the main table and the subtables. Let R = {t1, ..., tc} be a relation of cardinality c, where each tuple is ti = (ri1, ..., rin) for i running from 1 to c. R corresponds to the main table. For each ti there is a relation Si(si1, ..., sim) which corresponds to a subtable. If ri1 is a key of R, then T(ri1, si1, ..., sim) is a relation which is equivalent to Si, and which corresponds to the second relation we produce from the complex table.

   To give examples rather than exhaust the reader with definitions, first we present a simple table:

     DATA TYPE
     DATA TYPE:DESCRIPTION:LENGTH
     sym:str:num
     AD:ADDRESS:
     DT:DATE:8
     ...
     PN:PERSON NAME:48
     TX:TEXT:

   Here is an example of a complex table:

     TABLE
     TABLE#:DESCRIPTION
     num:str
     -VALUE:DESCRIPTION:VALUE#
     -sym:str:num
     0001:SEX
     -F:Female:000345
     -M:Male:000344
     0002:MARITAL STATUS
     -D:Divorced:000350
     -M:Married:000348
     -S:Single:000349

   From the latter table we will produce two relations, as if they had been defined as follows, first the main table and then the constructed table:

     TABLE
     TABLE#:DESCRIPTION
     num:str
     0001:SEX
     0002:MARITAL STATUS

     VALUE
     TABLE#:VALUE:DESCRIPTION:VALUE#
     num:sym:str:num
     0001:F:Female:000345
     0001:M:Male:000344
     0002:D:Divorced:000350
     0002:M:Married:000348
     0002:S:Single:000349

   Note from the examples that the title of the derived table is the title of the first column of the subtable.

   The filename conventions for these interim tables are not uniform. A simple table ends in `.tb'; however, a table which was derived from a segment definition is named with a trailing `.stb'.

   ---------- Footnotes ----------

   (1) yet another meaning of `table'


File: ProtoGen.info, Node: Extracting tables, Next: Extracting segment definitions, Prev: The Intermediate Table Format, Up: From Text to the Database

Extracting tables
=================

Most tables which are scattered throughout the chapters are compiled into appendix A, so they do not have to be rescanned here. Actually, there seems to be no table in the chapters which is not found again in appendix A, but we cannot be sure of this. What we do is simply scan for the headings of the tables, merely in order to catch the number of each table.
Afterwards, we can check whether there is any more information to get which we have not already taken from appendix A. For this kind of extraction we do not even need AWK. We let sed(1) run once over each chapter file with the following command, which appears as the only command in the `bin/chptbl' shell script:

     sed -n -e "s/^TABLE \([0-9][0-9]*\).*/chapterTable(\1)./p" *.txt

   This causes any line like the following example

     TABLE 0002 MARITAL STATUS

   to be output as

     chapterTable(0002).

   The latter is a Prolog predicate, which is then used to check for tables not yet known from appendix A.


File: ProtoGen.info, Node: Extracting segment definitions, Next: Generating Prolog predicates from tables, Prev: Extracting tables, Up: From Text to the Database

Extracting segment definitions
==============================

Segment definitions are tables as well; they describe one field per row. In the header of the table there is the sequence of keywords `SEQ', `LEN', etc. The tables are again surrounded by empty lines. Following each table there is a section which starts with the keyword `FIELD NOTES:', followed on the same line by the id of the segment. Even though we take the segment id from the subsection headline which precedes the table, we reassure ourselves of the correctness of our assumption with the help of the `FIELD NOTES:' construct, which turned out to be very reliable. Please note that we cannot define, at the outset, a completely secure method by which we find every piece of information without fail; rather, we refine our method as far as seems reasonable by trial and error, trying to catch most misinterpretations by taking advantage of the redundancies found in any human-readable text.

   The extraction of segment definitions is done by `bin/exseg', which in turn runs AWK with `bin/exseg.awk' on its first parameter (which typically is a file name), pipes the output through several sed processes and prints its output to the standard output stream (which is typically redirected into a file). We build the interim table with the name `SEGMENT', followed by the column specification lines, which are the same for every segment. Then we convert the body of the table into the canonical format, replacing VTs (vertical tab characters) by colons. We are in some trouble here, because some VTs have been mutated into consecutive spaces, so we cannot safely discriminate the columns. The only reasonable way to cope with this kind of problem seemed to be to manually edit the defective output. It is hoped that the next version will be free of these problems, so that there is no use wasting time extending generality to cover faulty conditions.

   There is reason to extract the field notes as well. We will find that there are fields declared as being of type ID which actually are a kind of composite type. This applies to fields that deal with patient location. Actually, we do not use the field notes a lot, but they might become useful if we were to store the whole information in a database or a hypertext file. Field notes are extracted with `bin/exfld', which is itself an AWK script.


File: ProtoGen.info, Node: Generating Prolog predicates from tables, Next: Extracting message definitions, Prev: Extracting segment definitions, Up: From Text to the Database

Generating Prolog predicates from tables
========================================

From the interim tables we can finally build Prolog predicates. Since Prolog predicates are essentially relations, and relations can be regarded as tables, this conversion is rather straightforward.
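   To make the target form concrete before stating the conversion rules, the following is a rough sketch of the facts that the simple DATA TYPE table shown in the previous section would turn into. The functor name `data_type' is an assumption made for this illustration; the authoritative output is whatever `bin/tb2pl' actually writes, including its descriptive comment header.

     %% sketch only: data_type(DataType, Description, Length),
     %% derived from the DATA TYPE interim table above; the functor
     %% name is assumed, empty columns become the anonymous variable
     data_type(ad, "address", _).
     data_type(dt, "date", 8).
     data_type(pn, "person name", 48).
     data_type(tx, "text", _).

   The remarks below explain the conventions (lower casing, anonymous variables, comment headers) that such output follows.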
It is curious that all formal information, identifiers as well as descriptive strings, is printed in all uppercase in the HL7 document. We, however, turn everything into lower case, for the following reasons:

   * text in all-uppercase is hard to read.

   * text in all-uppercase is hard to type.

   * Prolog wants symbols to start with a lower case letter.

   The functor symbol is derived from the name of the table by deleting any special characters and replacing consecutive white space by a single underscore (`_') character. Even though we could just as well surround the whole title of the table with `'' characters, thus marking the sequence as a symbol, we make the conversion for the convenience of the work to come. Since we will have to refer to these symbols and do not want to bother with special characters and the number of consecutive spaces, we rather convert the names into a canonical form.

   It happens quite often that one or more attributes of a tuple remain undefined, i.e. nothing is found between two colons. In this case we set the value to undefined, which we can do in Prolog by using the anonymous variable `_'. We could as well use a specific symbol which would assume the meaning of nil, like `null', `nil', `''' or even `[]'. However, the anonymous variable will be bound to anything during unification, thus anything will be allowed at this attribute and the predicate will still succeed.

   Finally, we begin each group of predicates with a few comment lines of descriptive information about functor name and arity, column types and column headings.

   This job is done by the `bin/tb2pl' shell script, which calls `bin/tb2pl.awk' and collects the temporary output of `bin/tb2pl.awk'. These temporary files are created during the processing of subtables.


File: ProtoGen.info, Node: Extracting message definitions, Prev: Generating Prolog predicates from tables, Up: From Text to the Database

Extracting message definitions
==============================

Messages are syntactically defined in the HL7 document using the formal language described below in this section. We can recognize message definitions by their unique layout, which comprises three columns, normally separated by ASCII VT characters, except for the cases where the format was damaged or originally inconsistent. The first column contains the code in the formal language, the second column contains remarks, and the third column contains the chapter. The first row contains the message id and, in its last column, the word "Chapter" or "Appendix", which we use as a keyword. Finally, these tables are separated from the rest of the text by one empty line, both at the beginning and at the end. The following is an example of a message definition as we find it in the file `kap2.txt'; all literal VTs have been replaced by ` - '.

     WRP - Widget Report - Chapter
     MSH - Message Header - II
     MSA - Message Acknowledgement - II
     { WDN - Widget Description - XX
     WPN - Widget Portion - XX
     { [WPD] } - Widget Portion Detail - XX
     }

   The syntax of the formal language is as follows:

     <message definition> ::= <segment list>
     <segment list>       ::= <segment> | <segment> <segment list>
     <segment>            ::= <segment id> | `[' <segment list> `]' | `{' <segment list> `}'
     <segment id>         ::= <char> <char> <char>
     <char>               ::= <letter> | <digit>
     <letter>             ::= `A' ... `Z'
     <digit>              ::= `0' ... `9'

   At this point we can, however, ignore the syntax;(1) we rather make the following textual changes:

  1. remove any white space

  2. change all upper case to lower case

  3. any opening bracket (`[') is replaced by `opt('

  4. any opening curly brace (`{') is replaced by `rep('

  5. any closing bracket (`]') or curly brace (`}') is replaced by a closing parenthesis (`)')
  6. append a comma (`, ') unless

       * there was no preceding token at all, or

       * the preceding token was an opening parenthesis

  7. remove any newline character (i.e. print everything on a single line)

   The first id that was read, i.e. the one that happens to be on the header line of the table, becomes the message id. The rest of the first line, up to the keyword `Chapter' or `Appendix', is recognized as the description. For the body of the table, however, we use only the first column, which we handle as described above. Things would have been easier if there were not some message definition tables that break these rules. Some tables are formatted without the VT between the columns, which made it very hard to get rid of the other columns while keeping the integrity of the first column.

   The definition of each message is then stored in a single Prolog predicate message/4:

     message(wrp,'',"widget report",[msh, msa, rep(wdn, wpn, rep(opt(wpd)))]).

   The second argument of message/4 is the event type code, which further qualifies a message type. However, an event type code is not always specified (which we have to discuss later). This code -- if given at all -- can be found in the most recent heading of a subsection. One of the keywords `Event Code' or `Trigger Event' precedes the three-character id of the event type code, which is then written as the second argument of the message predicate. We should rather have scanned the most recent subsection heading for the description of the message, since this may uniquely describe one message referenced by a pair of message id and event type code. For now we have redundant descriptions that poorly specify the messages, which certainly has to be fixed soon.

   ---------- Footnotes ----------

   (1) though for other reasons than not to disappoint application programmers who are not schooled in recursive languages (sorry, couldn't resist)


File: ProtoGen.info, Node: The HL7 database, Next: Generating C++ code, Prev: From Text to the Database, Up: Top

The HL7 database
****************

Having finished the previous work, we have arrived at a point where we can take a rest of sorts, reflecting on what information we have gained and how it is structured. What we did was to extract sections of the files and translate them into a different syntactical presentation. However, what we did *not* do up to this point is consider what is meant by all the tables and grammars and how they are mutually related. This will be done in this chapter, where we try to clarify the properties of the data model and the concepts of transactions.

   First we take a look at the relationships of the HL7 conceptual entities that explicitly appear in the specification. We will optimize this model by removing redundancies and -- perhaps most importantly -- gain an overview of what HL7 defines. Then we check the database that we compiled in the last step for errors and inconsistencies, and we prove an important assumption made in the preceding section. Finally we focus on the contents of the relations, asking what in particular is defined and how it is defined.
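   As a rough illustration of what such a consistency check looks like in Prolog, the sketch below tests the one-to-one correspondence between `Field' and `Data Element' tuples that is examined in the following sections. The functor names, arities and argument positions are placeholders chosen for this illustration; the actual predicates are those of the ProtoGen Prolog sources.

     %% sketch only: fact names and argument positions are assumed
     field_without_element(Seg, Num) :-
         field(Seg, Num, _, _, _, _, _, DatEl, _),
         \+ data_element(DatEl, _, _, _, _, _, _, _, _).

     element_without_field(DatEl) :-
         data_element(DatEl, _, _, _, _, _, _, _, _),
         \+ field(_, _, _, _, _, _, _, DatEl, _).

   The database is consistent in this respect if both of these queries fail, i.e. if neither a field without a matching data element nor a data element without a matching field can be found.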
* Menu:

* Dependencies::           Dependencies
* Errors::                 Errors
* Consistency check::      Consistency check
* The data item numbers::  The data item numbers
* On the abstractness of abstract syntax in HL7::  On the abstractness of abstract syntax in HL7
* On trigger events::      On trigger events
* On the null value::      On the null value


File: ProtoGen.info, Node: Dependencies, Next: Errors, Prev: The HL7 database, Up: The HL7 database

Dependencies
============

In the following table we give a synopsis of all the relations that we have obtained so far. We have compressed the names of the items here, trying to be specific and descriptive as well as short, in order not to lose the overview. Primary keys of the relations are marked by a preceding asterisk (`*'), while other candidate keys are marked with a plus (`+'). If a primary key is made up of more than one attribute, the attributes are set in parentheses with a single preceding asterisk. The name of the relation is followed by an `[A]', which means we have drawn this relation from appendix A, or by a `[C]', telling us that it was drawn from the chapters (including appendix C(1)).

     Functional Area [A]   *FunArId, +Chptr, +FunArDscrptn
     Message Type [A](2)   *MsgId, +MsgDscrptn, FunArId
     Message [C]           *(MsgId, EvntTpCd), +MsgDscrptn, +MsgDef
     Segment [A]           *SgmntId, +SgmntDscrptn, FunArId
     Segment [C]           *SgmntId, +SgmntDscrptn
     Data Element [A]      *DatElNum, +DatElDescrptn, SgmntID, FunArId, MaxLn, DtTypId, Opt, Rep, TblNum
     Field [C]             *(SgmntId, FldNum), MaxLn, DtTypId, Opt, Rep, TblNum, +DatElNum, +FldDscrptn
     Data Type [A]         *DtTypId, +DtTypDscrptn, Ln
     Table [A]             *TblNum, +TblDscrptn, TblClss
     Table [A]             *TblNum, +TblDscrptn
     Table Value [A]       *(TblNum, ValId), ValDscrptn, +ValNum
     Field Notes [C]       *(SgmntId, FldNum), +FldDscrptn, FldNteTxt

   The rows of the above table are sorted to ease the reader's orientation. One thing thereby becomes immediately obvious: there is sometimes more than one relation with the same name. The two relations titled `Segment', for example, are nevertheless not the same relation, because they do not have the same arity.(3) This notwithstanding, it is still obvious that these relations have some domains in common. We can simplify our set of relations by rewriting it such that any two relations which correspond in this way are replaced by a third one, defined over every domain which is part of either the first or the second relation.

   However, as was just said, different names of two tables do not guarantee that they do not correspond in the same way. Consider `Data Element' and `Field'. Both are defined over the same domains, though in a different order, except for `FunArId' (i.e. the column titled `owner' in appendix A, which lists the functional area to which the data element belongs) and `FldNum'. The question now is whether the tuples of both relations can be mapped one to one. We will see below (*note Consistency check::.) that they can.

   The name `Field' was given in `exseg.awk', since the word `field' is used throughout the HL7 specification to designate the parts of which the segments are built (more than 500 occurrences). However, `data element' is sometimes (42 occurrences) used as well. Why are there two names for the same thing? One answer might be that `data element' is used where we refer to an atom of data regardless of the context in which it occurs, while `field' is used for such an atom in the context of a certain place in a certain segment. Thus a data element is the *contents* of a field. Indeed, the relation `Data Element' doesn't have a domain which could designate a certain place in a segment.
However, why is there an attribute for repeatability and optionality then? We would not expect an object to be optional per se, whereas a certain field in a segment may well be empty sometimes. Repeatability, from this point of view, is likewise not a property of a data element, though it depends on how we think of a repetition: does the field repeat, or does its contents repeat within the field? If the first were true, then a segment would not necessarily have a fixed number of fields;(4) if the second is true, then there must be something in between a field and an atom of data, and we could say that an `occurrence' is not identical to a data element. This resembles LISP's point of view: LISP would regard a field of a segment as one half of a pair, which can be a list (i.e. another pair) or an atomic data item. If we have a look at the encoding rules,(5) we notice that repetition is realized with a special delimiter; this confirms our view of repetition as happening on a level in between a data element and a field, which we might call the level of `occurrence'.(6)

   In order not to digress too much, we decide not to consider data element and field as different things, provided we can prove the one-to-one relation between them. We perform a rewrite on both of them which is similar to the one we made for `Segment' or `Table'. We will carry out this proof when we check the consistency of the database we have acquired. For now, assume that the proof will succeed.

   Figure 2 shows a sort of entity relationship model of the database before we removed multiple occurrences. Each relation of the database is graphed as an entity (a name in a box) which has a relation (a line linking it) to another entity. Note the different notions of `relation'; to avoid confusion we will speak of a `link' when we mean relationships or dependencies between relations. At each contact between a line and a box there is a number, `1' or `n'. The graph can be "read" by following each line with the words "<number> <entity> is linked to <number> <entity>", where each <number> is the one written at the box of the respective <entity>.

   Let's have a look whether there is more to refine. Where there is a one-to-one link, as between `Table/2' and `Table/3', we can merge the two relations into one; that is what we have already planned to do. However, there is more: there is a pair of parallel one-to-many links, one going from `Functional Area' via `Segment/3' and `Field' to `Data Element', and the other going directly from `Functional Area' to `Data Element'. We notice from the table above that this parallel link is caused only by the `FunArId' domain. Thus we can consider removing that domain from the relation at the many-end of the link in order to remove this indirect redundancy, provided it is not part of a key there, which it is not. Note that whether we may commit this simplification depends on the link between `Field' and `Data Element' being one-to-one. If it were a many-to-one link, i.e. if one data element could appear in several fields, we must not do this. Our simplified database looks as sketched in figure 3.
The table below shows it in detail:

     Functional Area   *FunArId, +Chptr, +FunArDscrptn
     Message Type      *MsgId, +MsgDscrptn, FunArId
     Message           *(MsgId, EvntTpCd), +MsgDscrptn, +MsgDef
     Segment           *SgmntId, +SgmntDscrptn, FunArId
     Field             *(SgmntId, FldNum), MaxLn, DtTypId, Opt, Rep, TblNum, +DatElNum, +FldDscrptn
     Data Type         *DtTypId, +DtTypDscrptn, Ln
     Table             *TblNum, +TblDscrptn, TblClss
     Table Value       *(TblNum, ValId), ValDscrptn, +ValNum
     Field Notes       *(SgmntId, FldNum), +FldDscrptn, FldNteTxt

   ---------- Footnotes ----------

   (1) see above (*note The HL7 document::.)

   (2) The redundant appearance of the chapter and of the description of the functional area has been erased from all tables of appendix A in which this redundancy appears.

   (3) This holds only as long as we do not have two relations with the same arity but different domains; we tie our concept of equality of relations to that of Prolog, which identifies predicates by name and arity.

   (4) notwithstanding the fact that trailing missing fields are regarded as `not present'

   (5) which should be irrelevant when we discuss conceptual issues. In fact, as we will see below (*note On the abstractness of abstract syntax in HL7::.), there is only a weak distinction between abstract concepts and representation issues in HL7.

   (6) which is somewhat tautological


File: ProtoGen.info, Node: Errors, Next: Consistency check, Prev: Dependencies, Up: The HL7 database

Errors
======

Now we will finally feed our database into Prolog, just to see whether what we generated was syntactically correct. We will not bother the reader with the results of our own typos here; these had been corrected beforehand. Rather, this section reveals the first severe errors in the HL7 specification itself. Here is what Prolog complains about:

     [WARNING: (/usr/share/doc/HL-7/kap4.msg:5) Syntax error: Operator expected]
     [WARNING: (/usr/share/doc/HL-7/kap7.msg:11) Syntax error: Operator expected]

   Further investigation points us directly into the documents: at the definition of the order message we find what is extracted below. Note the matching of brackets and braces: the opening bracket before PID and the one below ORC are never closed properly. So where do we have to assume the closing brackets to be placed? The author of this document was not able to correct this until he could consult version 2.2 (ballot 1) for the answer. In the table below, corrected brackets are set between asterisks.

     ORM                       ORDER MESSAGE               Chapter
     MSH                       Message Header              II
     [ { NTE} ]                Notes and Comments          II
     [ PID                     Patient Identification      III
       [{NTE}]                 Notes and Comments          II
       [ PV1 ] *]*             Patient Visit               III
     { ORC                     Common Order                IV
       [ Any Order Segment     E.g., ORO, OBR, RX1         IV
         [ { NTE } ]
         [ { OBX }             Results Segment             VII
           [ { NTE} ]]         Notes and Comments          II
       *]*
       [ BLG ]                 Billing segment             IV
     }

   The second error is due to the same kind of grouping problem; this time, however, there is one closing brace too many. It is again hard to guess the correct grouping. Apart from a minor mismatch in the PID group, the correct grouping can be concluded from version 2.2; the corrections are again set between asterisks, with the original text shown after `was:', even though there is at least one mismatch in v2.2 too.
     ORF              Observational Report         Chapter
     MSH              Message Header               II
     MSA              Message Acknowledgement      II
     { QRD            Query Definition             V
       [ QRF ]        Query Filter                 V
       [ PID ]        PATIENT ID                   III
       [{NTE*}]*}                                  was: [{NTE]}}
       { [ ORC ]      Order common
         OBR          Observation request          VII
         {[NTE]}      Notes and comments           II
         {[OBX]       Result                       VII
          {[NTE]}     Notes and comments           II
         }
       }
     * *                                           was: }
     [DSC]            Continuation Pointer         V

   While we cannot be sure who caused this error, whether it was the original editor or someone in the German section of HL7, this is further evidence of the need to edit this document with different methods. Emacs, for example, is an editor which shows parenthesis matching and warns about mismatched parentheses while they are typed. This is more convenient than WYSIWYG, since while typing a document, control of correctness and consistency should take precedence over immediate control of layout.(1)

   ---------- Footnotes ----------

   (1) not to repeat here what was stated about control of layout above (*note The HL7 document::.)
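   For the record, once the bracket corrections shown in the ORM table above are applied, that definition translates into the message/4 form introduced earlier roughly as shown below. This is a sketch rather than the literal output of the extraction scripts: the event type code (second argument) and the atom standing for `any order segment' are left open, since the extracted text does not fix them.

     %% sketch of the corrected ORM definition in message/4 form; the
     %% event code and the `any_order_segment' placeholder are assumed
     message(orm, _, "order message",
             [msh,
              opt(rep(nte)),
              opt(pid, opt(rep(nte)), opt(pv1)),
              rep(orc,
                  opt(any_order_segment, opt(rep(nte)),
                      opt(rep(obx), opt(rep(nte)))),
                  opt(blg))]).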