Using XML with Cincom's VisualWorks Smalltalk 5i.3

by Thomas Gagné


In recent versions of VisualWorks Smalltalk (at least as early as 5i.2) Cincom included class libraries for working with XML and XSL. Unfortunately, no documentation or class comments accompanied the libraries. In this short paper I'll share everything I've figured out about how to use them by way of describing how I'm using them. There may be better ways, but this is the best I've come up with. And what the heck, it's working so I'm not going to complain (too much).

The XML parcel is loaded automatically when you unwrap VWNC, because VW uses it as its default format for filing Smalltalk code in and out. I don't know why they bothered, because the code is unreadable inside XML tags, and attempting to read it provides insight into neither Smalltalk nor XML.

Before attempting to do anything with it, there's a patch file you need to file in (courtesy of Randy Ynchausti). He knows more about XML than I do, and it's impossible to add attribute nodes without it. As of this writing, I don't know if these patches will be included in the final release of 5i.3, but I think they ought to be (so does Randy).

Building a DOM from XML

If you're like me, the first thing you may be trying to do is build a Document Object Model (DOM) tree from some kind of XML input. Assuming you've got the XML in a String, the following code will build an XML Document:

XML.XMLParser processDocumentString: theXMLString 
	beforeScanDo: [ :p | p validate: false].

Though the code above looks easy to use, there are some hidden features you should know about. First, theXMLString cannot contain any null bytes. Depending on where your XML comes from, it may have a null byte at the end (like mine did). Many languages implement strings as an array of bytes (usually printable ones) ending with a null (a character with integer value 0). In my case, the XML was coming from a remote client written in C, using middleware to send the message to my server. Since the middleware doesn't presume to know anything about the message it received, the message is received into a ByteString, null byte and all. To remove it I used:

XML.XMLParser processDocumentString: (aByteString copyWithout: 0) asString
    beforeScanDo: [ :p | p validate: false].
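Whether comparing against the integer 0 actually strips anything depends on what the middleware hands you. The sketch below shows both cases, using only standard collection messages. The names aByteArray, aString, and cleanString are hypothetical, and only one of the two assignments applies to any given message:

```smalltalk
"Case 1: the message arrives as a ByteArray. Its elements are
 integers, so the null byte is the integer 0."
cleanString := (aByteArray copyWithout: 0) asString.

"Case 2: the message is already a String. Its elements are
 Characters, so compare against the null character instead."
cleanString := aString copyWithout: (Character value: 0).

"Either way, parse the cleaned string with validation turned off."
doc := XML.XMLParser processDocumentString: cleanString
    beforeScanDo: [ :p | p validate: false].
```

If you're unsure which case you have, inspect the incoming object's class in a workspace before choosing.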

Starting out, I didn't know much about the value of DTDs (Document Type Definitions) either, so I wasn't using them (more on why you should later). What you need to know is that XML comes in two flavors, well-formed and valid (three if you count broken as a flavor).

Well-formed XML is simply XML that follows the basic rules, like having only one top-level element (the document's root), no overlapping tags, and a few other constraints. Valid XML means not only is the XML well-formed, but it also complies with some kind of rule base about which elements are allowed to follow which other ones, whether or not attributes are permitted, what their values and defaults should be, etc.

There's no way to get around well-formedness. Most XML tools complain vociferously about missing or unclosed tags. What you may not have lying around, though, is a DTD describing how the XML should be assembled. If you need to skip validation for any reason, you must include the beforeScanDo: keyword and its block:

beforeScanDo: [ :p | p validate: false].

Now that you have your XML document, you probably want to access its contents (why else would you want one, right?). Let's take the following (brief) XML as an example:

<porder porder_num="10351">
  <porder_head>
    <order_date>01/04/2000</order_date>
  </porder_head>
  <porder_line>
    <part>widget</part>
    <quantity>1.0000</quantity>
  </porder_line>
  <porder_line>
    <part>doodad</part>
    <quantity>2.0000</quantity>
  </porder_line>
</porder>

The first thing you probably want to know is how to access the different tags, and more specifically, how to access the contents of those tags. First, by way of providing a roadmap to the elements I'll show you the Smalltalk code for getting different pieces of the document, assuming the variable you've assigned the document to is named doc. I'll also create instance variables for the various elements as I go along:

Element you want                      Code to get it
porder element                        doc root
porder_head                           porderHead := doc root elementNamed: 'porder_head'
order_date (as a String)              orderDate := (porderHead elementNamed: 'order_date') characterData
order_date (as a Date)                orderDate := (Date readFrom: (porderHead elementNamed: 'order_date') characterData readStream)
a collection with both porder_lines   porderLines := doc root elementsNamed: 'porder_line'
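
Putting those accessors together, here is a minimal sketch of walking every porder_line in the sample document. It uses only the messages listed above. The Transcript output is purely for illustration, and doc is assumed to hold the parsed purchase order:

```smalltalk
"Sketch: show the part and quantity of every porder_line.
 Assumes doc is the XML.Document parsed from the sample above."
(doc root elementsNamed: 'porder_line') do: [ :line |
    | part quantity |
    part := (line elementNamed: 'part') characterData.
    quantity := (line elementNamed: 'quantity') characterData.
    Transcript show: part; show: ' x '; show: quantity; cr ].
```

Both values come back as Strings, so the quantity still needs converting before you can do arithmetic with it.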

I've deliberately left out porder's attribute because accessing attributes is different from accessing other nodes. You can get an OrderedCollection of attributes using:

attributes := doc root attributes.

but the ordered collection isn't really useful. To access any single attribute you'd need to look for it in the collection:

porderNum := (attributes detect: [ :each | each key type = 'porder_num' ]) value.

But that's not a whole lot of fun, especially if there are a lot of them to get, or if there's any possibility the attribute may not exist. Then you have to do the whole detect:ifNone: thing, and boy, does that make the code readable! What I did instead was create a method in my objects' abstract superclass:

dictionaryForAttributes: aCollection
	^Dictionary withAll: (aCollection collect: [ :each | Association key: each key type value: each value ])

Now what you have is an incrementally more useful method for getting attributes:

attributes := self dictionaryForAttributes: doc root attributes.
porderNum := attributes at: 'porder_num'.

At first this appears like more code, and for a single attribute it probably is. But if an element includes more than one attribute the payoff is fairly decent. Of course, you still need to handle the absence of an attribute in the dictionary but I think it reads a little better using a Dictionary than an OrderedCollection:

porderNum := attributes at: 'porder_num' ifAbsent: [].

I'm not as bold as some, to add my own methods to the XML classes, but one I *really* wish were present is the ability to check whether a child element exists. XML.Element>>elementNamed: throws a conniption (actually, an exception) if the element doesn't exist. If the DTD allows an element to appear zero or more times, it remains necessary for Smalltalk code to test for its presence before attempting to access it. Though code could be written:

(elements := aNode elementsNamed: 'someName') isEmpty not
    ifTrue: [ someVariable := elements first characterData ].

it would be preferable to have a simpler test less dependent on long Smalltalk expressions:

(aNode includesElement: 'someName') ifTrue: [..].

It might be useful to have an #elementNamed:ifAbsent: method, but I'm unsure what I would want to put into the ifAbsent: block since I'd like to continue processing with as few tests as possible. The most likely result of an ifAbsent: block is nil, which means I'm still testing. Maybe the better approach would be to use:

(aNode elementsNamed: 'someName') do: [ :each | self doSomethingWith: each ].

This works for both zero-or-one (someName?) and zero-or-many (someName*) DTD constraint specifications.
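
For those who would rather not touch the XML classes, the presence test can live in your own abstract superclass instead. The two methods below are a sketch along those lines. Both selectors are hypothetical (they are not part of the XML library), and they rely only on elementsNamed: and characterData as used above:

```smalltalk
node: aNode hasElementNamed: aName
    "Answer true if aNode has at least one child element named aName.
     Hypothetical helper, not part of the XML library."
    ^(aNode elementsNamed: aName) isEmpty not

textOf: aName in: aNode ifAbsent: aBlock
    "Answer the character data of the first child element named aName,
     or the value of aBlock if there is no such child.
     Hypothetical helper, not part of the XML library."
    | elements |
    elements := aNode elementsNamed: aName.
    ^elements isEmpty
        ifTrue: [aBlock value]
        ifFalse: [elements first characterData]
```

A call such as self textOf: 'someName' in: aNode ifAbsent: [''] would then supply a default without a separate presence test.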

Building XML

There's little reason to build an XML document if it's not going to be processed by something down the road. Most XML tools require that XML documents have a document root. A root is a tag inside which all other tags exist, or put another way, a single parent node from which all other nodes descend. In my case, a co-worker was attempting to use Sablot's sabcmd to transform the XML from my server into HTML. So start your document with the root ready to go:

replyDoc := XML.Document new.
replyDoc addNode: (XML.Element tag: 'response').

Before doing anything more complex, we can play with our new XML document. Assuming you're going to want to send the XML text to someone or write it to a file, you may first want to capture it in a string. Even if you don't want to capture it in a string first, our example is going to:

replyStream := String new writeStream.
replyDoc printOn: replyStream.

If we examined the contents of our replyStream (replyStream contents) we'd see:

<response/>

Which is what an empty tag looks like. But remember how when we got the string our parser complained about the null byte? Well, you're sending it back to that same application so you might want to include one. If the system you're sending it back to is UNIX based, you may also want to replace carriage returns with line feeds:

replyStream := String new writeStream.
replyDoc printOn: replyStream.
replyStream cr; nextPut: (Character value: 0).
replyString := replyStream contents copyReplaceAll: (Array with: Character cr) with: (Array with: Character lf).

Let's add some text to our XML document now. Let's say we want it to look like:

<response>Hello, world!</response>

Building this actually requires two nodes be added to a new XML document. The first node (or element) is named response. The second node adds text to the first:

replyDoc := XML.Document new.
replyDoc addNode: (XML.Element tag: 'response'). "our root node"
replyDoc root addNode: (XML.Text text: 'Hello, world!').

Another way of writing it, and the way I've adopted in my code is to create the whole node before adding it. This is not just to reduce the appearance of assignments, but it suggests a template for cascading #addNode: messages to an element, which, if you're building any kind of nontrivial XML, you'll be doing a lot of:

replyDoc := XML.Document new.
replyDoc addNode: (
    (XML.Element tag: 'response')
        addNode: (XML.Text text: 'Hello, world!')
).

Unless you're absolutely sure you'll never accidentally add text nodes that have an ampersand (&) in them, you'll need to escape it to get past XML parsers. The way I got around this was to escape them whenever I added text nodes. To make it easier, I (again) created a method in my objects' abstract superclass:

asXMLElement: tag value: aValue
    | n |

    n := XML.Element tag: tag.
    aValue isNil ifFalse: [n addNode: (XML.Text text: (aValue asString copyReplaceAll:'&' with: '&amp;'))].
    ^n

&amp; is the XML entity reference for an (escaped) ampersand. XML parsers don't like seeing a bare ampersand unless it introduces an entity, which mine did not. Admittedly, this will cause problems if you need entities in your element text. When that's the case, simply don't use the method above, or create a method (named accordingly) that skips the &amp; substitution.
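
As an illustration of how the helper combines with cascaded #addNode: messages, the sketch below builds one porder_line element and adds it to the reply. It assumes replyDoc already has its root element and that the code runs inside a class inheriting asXMLElement:value:. The temporary name line and the sample values are made up for the example:

```smalltalk
"Sketch: build a porder_line with two children and add it to the root.
 Assumes self understands asXMLElement:value: as defined above."
| line |
line := XML.Element tag: 'porder_line'.
line
    addNode: (self asXMLElement: 'part' value: 'nuts & bolts');
    addNode: (self asXMLElement: 'quantity' value: '1.0000').
replyDoc root addNode: line.
```

Holding the element in a temporary keeps the example independent of whatever #addNode: happens to answer, and the ampersand in the part name is escaped by the helper on the way in.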

Calls to self asXMLElement: 'sometagname' value: anInstanceVariable are littered throughout my code. It works well for most things except dates. Date types don't answer the message asString, and even if they did, the default date format is too verbose for my application. My XML client needs dates expressed in the traditional (US) format of mm/dd/yyyy. As you might expect, I created another of those convenience methods in the abstract superclass:

dateString: aDate
	^aDate isNil ifFalse: [ aDate printFormat: #(2 1 3 $/ 1 1) ] ifTrue: ['']

I forget what all those formatting characters mean. All I know is the end result is what I was looking for.
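
For what it's worth, the format array appears to follow the long-standing Smalltalk-80 convention for printing dates. The annotated version below restates the same method with each slot described. The slot meanings are an assumption based on that convention rather than on anything verified against 5i.3, so check the comment on Date>>printFormat: in your own image:

```smalltalk
dateString: aDate
    "Answer aDate printed as month/day/year, or an empty string for nil.
     Assumed meaning of the six format slots, in order:
       2   position of the day (second)
       1   position of the month (first)
       3   position of the year (third)
       $/  separator character
       1   month style: 1 = number, 2 = abbreviated name, 3 = full name
       1   year style: 1 = full year, 2 = last two digits"
    ^aDate isNil
        ifTrue: ['']
        ifFalse: [aDate printFormat: #(2 1 3 $/ 1 1)]
```

Whether single-digit months and days come out zero-padded is worth checking in a workspace before relying on a strict mm/dd/yyyy layout.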

Adding attributes to documents is, thankfully, easier than accessing them, thanks to the patch file mentioned earlier. If we want to add an attribute to our document above, we can do so with a single statement:

replyDoc root addAttribute: (XML.Attribute name: 'isExample' value: 'yes').

Now, our XML looks like:

<response isExample="yes">Hello, world!</response>

Using DTDs

What I didn't appreciate in my first XML project (this one) was how much error checking I was doing just to verify the format of incoming XML. During testing I'd go looking for attributes or elements that should have been there but for various reasons were not. Because I was coding fast and furious I overlooked some and ignored others. Testing quickly ferreted out my carelessness and my application started throwing exceptions faster than election officials throw chads.

The cure, at least for formatting, is having a DTD, or Document Type Definition describing the XML format. You can read more about the syntax of DTDs at websites like www.w3schools.com.

There's not a lot programmers are able to do with DTDs in VisualWorks, except require incoming XML to include DOCTYPE statements. What programmers do need to do is handle the exceptions the XML parser throws when it finds errors.

I'm not an expert at writing Smalltalk exception handling code, and I haven't decided on what those exceptions should look like to the client who sent the poorly formatted XML in the first place. The code below does a decent job of catching the errors and putting the description of the error into an XML response. It's also a fairly decent example of XML document building as discussed earlier.

replyDoc := XML.Document new.
replyDoc addNode: (XML.Element tag: 'response').

[
    doc := XML.XMLParser processDocumentString: (anIsdMessage message copyWithout: 0) asString
] on: Exception do: [ :ex |
    replyDoc root 
        addAttribute: (XML.Attribute name: 'type' value: 'Exception');
        addNode: ((XML.Element tag: 'description')
            addNode: (XML.Text text: ex signal description));
        addNode: ((XML.Element tag: 'message')
            addNode: (XML.Text text: ex messageText));
        yourself
].

I know the yourself isn't necessary at the end of the cascade, but I'm in the habit of adding them at the end of all my cascades. I remember having a really good reason for this a few months ago but I've forgotten what it was.

I said before there's not a lot programmers can do with DTDs, but there are some things I wish VW's XML library would do:

Gemstone/S and VW's XML

Because learning XML just isn't fun enough, I decided to port my application from Sybase to both Gemstone/S and Objectivity/DB (another paper, maybe). While porting the code to Gemstone/S (sans XML) I was reminded how you can't access classes on Gemstone/S (like the XML classes) that don't exist in the Gemstone class hierarchy. Brokat will shortly be introducing a port of VW's XML library to work inside the Gemstone/S server. This is a welcome addition. Personally, I have some apprehensions about this that have more to do with my lack of Smalltalk experience than with the libraries.

XSL Processing

I spent a week the other night trying to figure out how to get VW's XSL libraries to do anything. I no longer need it now, but I did discover some things others with an immediate need may want to be aware of.

Attributions

Cincom, for supporting Smalltalk and the Smalltalk community by making a non-commercial version available. I'm especially pleased with the Linux version. I understand Peter Hatch had a lot to do with that.

Thanks also to Randy Ynchausti, Bijan Parsia, Reinout Heeck, and Joseph Bacanskas for answering many questions on VW XML.


Mr. Gagné has been programming in various languages and operating systems since 1983, is currently employed by eFinNet, Corp., and is the maintainer of the open-source middleware project isect.