Friday, July 1, 2011

Accessing Positional XML Elements in XSLT

One of the most frustrating things about XHTML and some other document XML formats is that they don't deal with document structure very well.  An H1 tag simply creates a level 1 heading, and so on for H2 through H6.  These tags only "introduce" a new section; they don't really create the appropriate section structure in the XML document.  This creates a nightmare for organizations trying to manage structured documentation, because it is rather difficult to deal with section structures declaratively in languages like XSLT.  Even the Apple PLIST format is frustrating (prompting this tweet) because the key and value (or dictionary) are arranged positionally rather than through containment.

The same sort of problem shows up when processing HL7 Version 2 messages (say, to convert them to HL7 CDA) by way of the often proprietary XML translations of HL7 Version 2 that different interface engines support.  HL7 does have a standard Version 2 XML format (see Version 2.x Schemas on this page), but it is not widely supported in products.  So, if you want to process a Version 2 ORU message, you will often find XML that contains one or more OBR tags followed by several OBX tags without proper containment.

The general structure of the ORU includes the following definition:

{ [ORC]        Order common
  OBR          Observations Report ID
  { [NTE] }    Notes and comments
  { [OBX]      Observation/Result
    { [NTE] }  Notes and comments
  }
}

But most commonly, when this is translated into XML, the OBX follows the preceding OBR instead of being contained within it, as would be expected. So you wind up with this:

OBR
OBX
OBX

Instead of this:

OBR
  OBX
  OBX

Just like in XHTML where you wind up with this:

H1
P
P
H2
P
P

Instead of this:

H1
 P
 P
 H2
  P
  P

Processing either of these in XSLT can be very challenging, because you often want to relate the processing of each OBX (or P) to its OBR (or H#) in the hierarchy.

So how do you process this sort of XML using XSLT?  And how can you make the processing efficient?

There are several different tricks you can use.

The first trick is sometimes a little dicey, but it can be pretty efficient: use a two-pass transform in which the first pass creates the appropriate structure and the second pass does the real work.  It makes use of the xsl:text element with the disable-output-escaping attribute set to yes.  The basic details of this trick are:

  1. For the first heading, you generate some sort of section XML tag, inside the xsl:text element
  2. For every heading thereafter, you close the previous section tag and open a new one using the same mechanism.
  3. At the end of processing, you close the last open section tag. 
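The three steps above might look something like the following first-pass stylesheet for the OBR/OBX case.  This is only a sketch: the element names ORU, OBR and OBX and the wrapper element called section are assumptions made for illustration, since the actual names depend on the XML your interface engine produces.  The result only becomes a nested tree once the output has been serialized and parsed again.

```xml
<?xml version="1.0"?>
<!-- First pass only. Assumes a flat input such as
     <ORU><OBR/><OBX/><OBX/><OBR/><OBX/></ORU>
     and a hypothetical wrapper element named section. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml"/>

  <xsl:template match="/ORU">
    <ORU>
      <xsl:apply-templates select="*"/>
      <!-- Step 3: close the last open section, if one was ever opened. -->
      <xsl:if test="OBR">
        <xsl:text disable-output-escaping="yes">&lt;/section&gt;</xsl:text>
      </xsl:if>
    </ORU>
  </xsl:template>

  <xsl:template match="OBR">
    <!-- Step 2: every OBR after the first closes the previous section. -->
    <xsl:if test="preceding-sibling::OBR">
      <xsl:text disable-output-escaping="yes">&lt;/section&gt;</xsl:text>
    </xsl:if>
    <!-- Steps 1 and 2: open a new section as raw text, then copy the OBR. -->
    <xsl:text disable-output-escaping="yes">&lt;section&gt;</xsl:text>
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Everything else (OBX, NTE and so on) is copied where it stands. -->
  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

With a processor that honors disable-output-escaping, each OBR and the OBX elements that follow it should come out wrapped in their own section element in the serialized output.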

This works just fine for the OBR/OBX example, because there is only one level of nesting.  It doesn't work very well for the H1/H2/P example because of the multiple nesting levels.

Two-pass processing is OK, but I like to keep it inside one XSLT.  There is a way to do that using the EXSLT node-set() function:

  1. Create a variable containing the XML generated during the first pass.
  2. Convert it to a node-set using the EXSLT node-set() function.
  3. Apply templates to a selection from that node-set.

The skeleton below shows how you would do this:

<xsl:variable name='pass1xml'> ... Generate the first pass XML ... </xsl:variable>
<xsl:apply-templates select='exslt:node-set($pass1xml)'/>

This will work with SOME XSLT processors, but not others.  The way that disable-output-escaping often works is by inserting a processing instruction into the XML output that writes the appropriate text when the output is sent to a file or stream.  The XSLT processor understands this internally as a special processing instruction, and the XML output is not usually reparsed.  So, normally you need a two-stage pipeline instead of being able to handle it all in one stage.  [Note: This technique is still very useful for creating a two-stage processing pipeline inside one XSLT when the disable-output-escaping feature is not used, and almost all XSLT processors support the node-set() function.]
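For the case described in the note above, where disable-output-escaping is not used, the skeleton could be filled out along these lines.  The pass1 and pass2 mode names, the summary element, and what each pass actually does are placeholders made up for this sketch, and ORU-style OBR and OBX child elements are assumed; only the exslt:node-set() call and its http://exslt.org/common namespace come from EXSLT itself.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exslt="http://exslt.org/common"
    exclude-result-prefixes="exslt">

  <xsl:template match="/">
    <!-- Pass 1: build the intermediate XML in a variable. -->
    <xsl:variable name="pass1xml">
      <xsl:apply-templates select="*" mode="pass1"/>
    </xsl:variable>
    <!-- Pass 2: convert the variable to a node-set and process it. -->
    <xsl:apply-templates select="exslt:node-set($pass1xml)/*" mode="pass2"/>
  </xsl:template>

  <!-- Hypothetical pass 1: copy the root element, keeping only its
       OBR and OBX children (with their full content). -->
  <xsl:template match="/*" mode="pass1">
    <xsl:copy>
      <xsl:copy-of select="OBR | OBX"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical pass 2: report what survived pass 1. -->
  <xsl:template match="*" mode="pass2">
    <summary obr="{count(OBR)}" obx="{count(OBX)}"/>
  </xsl:template>

</xsl:stylesheet>
```

Using separate modes for the two passes keeps the first-pass templates from firing on the intermediate tree, which is the main thing to watch for when both stages live in one stylesheet.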

Another way to deal with this problem is by using the sibling axes in XSLT.  This is more complex, and requires a good deal more explanation of how it works.  I'll save that discussion for next week.

4 comments:

  1. Hi Keith,

    Just wanted to mention how much I have enjoyed your blog, as well as your blogroll; in particular, coming from an XPath background, I appreciate articles like this immensely because they show how general XPath and XSLT understanding comes into play when working with HL7 and its kin.

    One thing worth mentioning is that for anyone who wants a better understanding of XPath, the XSL-List at MulberryTech is an amazing place to discover both the arcane and the mundane, with regular appearances from Dr. Kay and various members of the EXSLT crowd (email xsl-list-digest-subscribe@lists.mulberrytech.com).

  2. Piers, thanks for the feedback. I'd have to agree that MulberryTech provides some great XML resources. I've known Debbie and Tommy for years from back in my (pure) XML days.

  3. If I had a dollar for every time I have printed out MulberryTech's XSLT/XPath cheatsheet (double-sided, of course), and pinned it to the cubicle wall of a needy coworker...

  4. Thanks for sharing your info. I really appreciate your efforts and I will be waiting for your further write ups thanks once again.
