Adventures in C#: Chapter Numbers inside the Chapters
Ron Jeffries
02/14/2003
We would like the chapters to have the chapter number in them. The chapter number is in the file name, but we don't know how to get it into the output chapter. We have a scheme in mind. Maybe we can make it work. For sure we are going to learn something.

The Idea

OK, we want a tag in our XML that will copy in the chapter number. That sounds easy enough ...

You would think there would be a way in XSLT to read in the name of the file being converted and use it. If we could get it in, we could certainly take it apart and pull out the chapter number. There's enough string manipulation in XSLT to do that readily. But if there's a way to get the input file name inside the XSLT program, I haven't found it.

So here's the plan. In the chapter about making the table of contents, we created an XML file with each chapter described like this:

<chapter>
  <sectionnumber>0</sectionnumber>
  <chapternumber>00.0</chapternumber>
  <title file="jeffries_0_000_Introduction.htm">Introduction</title>
  <precis>The received wisdom in software development is that we should know everything before we start. Most real projects can't meet that criterion. It's perfectly fine to learn as you go. Let's explore ways of doing that.</precis>
</chapter>

There's the chapter number in there, and the section number if we need it. And if we need more information, we can certainly change our indexing program to add it into the XML. How can we get that information into the chapter?

One possibility would be to write a program that opens each chapter and inserts its chapter number into it, using the file name. That would be easy enough to do in C#, though it would have to deal with removing the old chapter number every time things get reordered. Should be no big thing.
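
As a rough sketch of the file-name half of that idea: the article has C# in mind, so Python is used here only as a stand-in. The file-name layout (author, section, chapter digits, title) is an assumption inferred from the single example above, and the function name is hypothetical. How the digits "000" turn into the displayed "00.0" isn't shown here, so the sketch leaves them alone.

```python
import re

# Assumed layout, inferred from one example: <author>_<section>_<chapter digits>_<Title>.htm
FILE_NAME_PATTERN = re.compile(
    r"^(?P<author>[^_]+)_(?P<section>\d+)_(?P<chapter>\d+)_(?P<title>.+)\.htm$"
)

def parse_chapter_file_name(file_name):
    """Split a chapter file name into its parts, or return None if it does not fit."""
    match = FILE_NAME_PATTERN.match(file_name)
    if match is None:
        return None
    return match.groupdict()

parts = parse_chapter_file_name("jeffries_0_000_Introduction.htm")
print(parts["section"], parts["chapter"], parts["title"])
```

A program built on this would still have to find and replace the old chapter number inside each chapter, which is the part that makes the lookup approach look more attractive.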

But there may be a better way. XSLT has the ability to use one file as a lookup file for another. So the idea is to change the XSLT that generates the individual chapters to use the table of contents XML file to look up the chapter numbers and insert them. Once that is done, the existing process will cause each chapter to be updated properly every time the file names change. The file name contains the chapter number; the index gets the number inserted; the index is used to get the number into the formatted chapters. It sounds straightforward.
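
To make the lookup idea concrete outside XSLT, here is a minimal Python sketch of what "use the table of contents to find the chapter number" amounts to. The chapter element is taken from the example above with the precis shortened; the toc root element and the function name are assumptions made for illustration, not the real file layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of contents: only the <chapter> element comes from the
# article; the <toc> root is an assumption.
TOC_XML = """
<toc>
  <chapter>
    <sectionnumber>0</sectionnumber>
    <chapternumber>00.0</chapternumber>
    <title file="jeffries_0_000_Introduction.htm">Introduction</title>
    <precis>It's perfectly fine to learn as you go.</precis>
  </chapter>
</toc>
"""

def chapter_numbers_by_file(toc_xml):
    """Map each chapter's file name to its chapter number."""
    root = ET.fromstring(toc_xml.strip())
    numbers = {}
    for chapter in root.iter("chapter"):
        title = chapter.find("title")
        number = chapter.findtext("chapternumber")
        if title is not None and number is not None:
            numbers[title.get("file")] = number
    return numbers

numbers = chapter_numbers_by_file(TOC_XML)
print(numbers["jeffries_0_000_Introduction.htm"])
```

The XSLT version has to do the same job, with the extra wrinkle that the stylesheet needs some way to know which chapter it is currently formatting.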

The Issues

First of all, we'll need to explain enough XSLT so that you can follow along, but little enough so that you don't fall asleep.

Second, I have not as yet figured out a way to test my XSLT files other than manually. I change them a little, regenerate some articles, and look at them to see if they are OK. That has worked about as well as manual testing always does. It's a little slow and boring, and once in a while I mess up and put a badly-formatted article out on the web site. But usually I survive and the pain hasn't been enough to get me to build tests.

But you're not going to settle for that, are you? You're going to demand that I live by my own rules and test my code. That means I'll have to figure out how to do it. I'm sure I'll be a better person for it, and if I succeed, you'll see more evidence that anything can be tested automatically, and how useful it is.

By the end of this chapter or chapter series -- whichever it turns out to be -- you'll be ready to make your own decision.

A Little About XSLT

You've seen a little about what XML looks like -- basically there are tags, enclosed in squiggles, like <title>...</title>. The tags can be most anything, and their meaning is up to the people defining the XML. In this book, <title> delimits chapter and section titles, depending on the context.

XSLT is a language for describing transformations from XML to other formats, including HTML, XML, and just about anything else textual. It's very complicated and very powerful. The basic idea, as I perceive it, is pretty simple: An XSLT program can be thought of as processing one input file into one output file. XSLT statements match patterns, and when they match, whatever is inside them is put out into the output. Here's an example of a statement, from my XProgramming.com site's XSLT:

<xsl:template match="header">
  <table border="0" width="100%">
    <tr>
      <td align="center">
        <xsl:apply-templates select="title"/><br/>
        <xsl:apply-templates select="author"/>
        <xsl:apply-templates select="date"/>
      </td>
    </tr>
  </table>
  <xsl:call-template name="print-precis"/>
</xsl:template>

Let's explore what that means. The "xsl:template" means that this is a pattern matching template. When it matches, whatever is inside will be executed and any output copied to the output. The "match=" attribute says that this template will match whenever a tag comes along that looks like <header>...</header>. This template matches that whole tag, and everything inside it, and processes it.

Much of the stuff inside this template is pure HTML. The <table> tag is HTML, intended to be sent to the output, as are the <tr> and <td>, and all the attributes like border and width and align, inside them. As we'll see in a moment, XSLT would try to process those tags, except for a trick. XSLT tries to process everything.

In particular, it will process the xsl:apply-templates tags. Consider this one:

<xsl:apply-templates select="title"/><br/>

That tells XSLT that in processing this "header" tag, it should first process any "title" tags inside, then any "author", then any "date" tags. What this will do is ensure that even if the title, author, and date are in some other order inside the header tags, they will be processed in that order. Let's look at the one for title:

<xsl:template match="header/title">
  <span class="title">
    <xsl:apply-templates/>
  </span>
</xsl:template>

Using the same reasoning, that says "If we encounter a title tag inside a header tag, do what's in here." What's in here are the "span" tag, an HTML tag that applies a formatting style, and the inner "apply-templates". So this pattern means: "When matching header/title, put out the beginning of a span, then apply all applicable templates, then put out the end of the span." What are "all applicable templates"? Well, in principle, they could be anything. Almost every template in XSLT will include another apply-templates statement, unless we want it not to process whatever is inside the tag it matches. And that brings us to why things like "span" and "table" get processed. Most XSLT programs include something like this template:

<xsl:template match="*|@*|comment()|processing-instruction()|text()">
  <xsl:copy>
    <xsl:apply-templates select="*|@*|comment()|processing-instruction()|text()"/>
  </xsl:copy>
</xsl:template>

What that cryptic mess means, basically, is "just copy the rest". So anything that isn't understood, any tags or any text, is just copied to the output. And that's why, given the templates above and the related ones, this:

  <header index="yes">
    <topic name="acs"/>
    <date updated="no">20030214.2</date>
    <title>Adventures in C#: Chapter Numbers inside the Chapters</title>
    <author>Ron Jeffries</author>
    <precis>We would like the chapters to have the chapter number in them. The chapter number is in the file name, but we don't know how to get it into the output chapter. We have a scheme in mind. Maybe we can make it work. For sure we are going to learn something.</precis>
  </header>

gets converted to this:

  <table width="100%" border="0">
    <tr>
      <td align="center"><span class="title">Adventures in C#: Chapter Numbers inside the Chapters</span>
        <br>
        <span class="author">Ron Jeffries</span>
        <br>
        <span class="date">02/14/2003</span>
      </td>
    </tr>
  </table>
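
If the matching rules feel slippery, it may help to see the same shape in ordinary code. The sketch below is a deliberately tiny Python model, not how an XSLT processor works inside: it knows only two rules, the header/title rule and the "just copy the rest" rule, and all the function names are made up for illustration. The real header template, which builds the table, is left out.

```python
import xml.etree.ElementTree as ET

def copy_rule(node, parent_tag):
    """Default rule: copy the element and its attributes, then process what is inside."""
    out = ET.Element(node.tag, node.attrib)
    out.text = node.text
    for child in node:
        result = apply_templates(child, node.tag)
        result.tail = child.tail
        out.append(result)
    return out

def header_title_rule(node, parent_tag):
    """Rule for header/title: wrap the processed contents in <span class="title">."""
    span = copy_rule(node, parent_tag)
    span.tag = "span"
    span.attrib.clear()
    span.set("class", "title")
    return span

def apply_templates(node, parent_tag=None):
    """Use the specific rule when it matches; otherwise fall back to the copy rule."""
    if node.tag == "title" and parent_tag == "header":
        return header_title_rule(node, parent_tag)
    return copy_rule(node, parent_tag)

source = ET.fromstring(
    "<header><title>Dummy</title><author>Ron Jeffries</author></header>"
)
print(ET.tostring(apply_templates(source), encoding="unicode"))
```

The title comes out wrapped in a span, and the author tag, which no specific rule claims, is copied through untouched. That is the whole trick behind the copy template.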

That's enough about the XSLT language for right now. We'll look at some more details as we need them. XSLT is very powerful, rather nifty in its own way, but very odd. There are a couple of books in the Bibliography that can help you learn more if you need to.

We've already seen a little about how to use XSLT. Basically you give an input string or file to an XSLT processor, along with the XSLT file to use in the translation, and out comes the translated file. In our case, that's a web page or a book chapter in HTML.

Let's Get to Work

We want to be able to put the chapter number into the chapter. The simplest way to do that, I'm thinking, will be to have a new XML tag that, when it appears, causes XSLT to look up the chapter number in the table of contents and put it into the output. There is a way to do that, and believe me, it is rather odd. I found an example in one of my XSLT books (XSLT Quickly, ISBN 1-930110-11-1) and I'm experimenting with it. I haven't found a really good way to experiment, so if what we're about to do seems odd, well, it is. We'll try to figure out a better way as we go on.

Remember that if we put an XSL processing line in an XML file, and if the XSL file produces HTML, we can view the result in Internet Explorer. Now, the truth is, even if the output is a plain XML file, we can view it in IE, and that may give us a bit of help in coming up with a better way than what I've got right now.

I started with a very simple XML file, in the format of an article. It looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="magazine.xsl"?>
<page>
  
  <header index="no">
    <topic name="xxx"/>
    <date updated="no">20020923</date>
    <title>Dummy</title>
    <author>Ron Jeffries</author>
    <precis>Just Testing</precis>
  </header>
  <contents/>
  
  <lookup colorCode="c1"/>
  <lookup colorCode="c2"/>
</page>

The new stuff is the <lookup colorCode="c1"/>, and it is copied almost directly out of the example in XSLT Quickly. The reason for that is that I tried to translate the example to do the chapter number as I went, and it didn't go very well. I couldn't figure out what I was doing wrong, so I decided to copy the example as exactly as I could.

To process the XSLT, I decided to use my existing code for producing articles. So I'm editing the XSLT that produces this article, as I put in this feature. That is a very bad idea, and it has bitten me a number of times. At this moment, I'm not sure how to set up a simpler XSLT file, and I'm too stupid to stop and figure it out. So I'll continue to try to lift myself by my own belt for just a little longer.

There are two other parts to working on this problem. There's the new XSLT that I added to my file, and it looks like this:

<xsl:variable name="colorLookupDoc" select="document('aalookup.xml')"/>
<xsl:key name="colorNumKey" match="color" use="@cid"/>

<xsl:template match="colors">
  COLORS <xsl:apply-templates/> ENDCOLORS
</xsl:template>

<xsl:template match="color">
  color(<xsl:value-of select="@cid"/>) <xsl:apply-templates/> endcolor
</xsl:template>

<xsl:template match="lookup">
  <xsl:apply-templates select="$colorLookupDoc"/>
  <xsl:variable name="shirtColor" select="@colorCode"/>
  code(<xsl:value-of select="$shirtColor"/>)
  <xsl:for-each select="$colorLookupDoc">
    each(
    <xsl:value-of select="key('colorNumKey', $shirtColor)"/>
    )endeach
  </xsl:for-each>
</xsl:template>

This is more complex than it will turn out to be (I hope), and it got that way with a bunch of experimentation that went on last night. I'll spare you that. Believe me, you'll thank me when this is all over. Here's what we have so far, from the top.

The xsl:variable statement at the top reads the entire document "aalookup.xml" into the processor, into a variable named colorLookupDoc. (XSLT processors always read everything in and then process. That's what allows them to process things in any order you want.) The lookup file looks like this:

<?xml version="1.0"?> 
<colors>
  <color cid="c1">yellow</color>
  <color cid="c2">black</color>
</colors>

The idea of the file, of course, is that the name of the color whose color id (cid) is "c1" is "yellow". I can sort of see that I can slowly convert that to the chapter setup in our table of contents XML file, or modify the table of contents file, as needed.

The xsl:key statement says that there will be a key named colorNumKey, which will match color tags, using the attribute cid. "@" stands for "attribute". Cute, huh? Those two lines are the core of the lookup idea. They read the lookup table into a variable, and define the key to use in looking it up.
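
One way to think about those two lines, in plain Python rather than XSLT: the variable holds a parsed document, and the key is something like a dictionary built over it. This is only an analogy under that assumption, and build_key is a hypothetical helper, not anything the XSLT processor exposes.

```python
import xml.etree.ElementTree as ET

LOOKUP_XML = """<?xml version="1.0"?>
<colors>
  <color cid="c1">yellow</color>
  <color cid="c2">black</color>
</colors>"""

def build_key(document, match, use):
    """Index every element named `match` by the value of its `use` attribute."""
    index = {}
    for element in document.iter(match):
        index.setdefault(element.get(use), []).append(element)
    return index

# Roughly: xsl:variable reads the file, xsl:key says how to index it.
color_lookup_doc = ET.fromstring(LOOKUP_XML)
color_num_key = build_key(color_lookup_doc, match="color", use="cid")

print(color_num_key["c1"][0].text)
print(color_num_key["c2"][0].text)
```

Each entry is a list because nothing stops two color tags from sharing a cid; a key lookup hands back every node that matches.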

The next two match statements, on "colors" and "color" are there for debugging. As I worked to put this together, I wasn't sure what was being processed, and when. So those two matches put some output into the HTML, to give us a clue to what is going on.

I find myself doing this a lot when I work with XSLT. XSLT patterns match inside other patterns, and there are modes and specialized matches to create things like lists of headings, or to make things format differently in the table of contents than they do in the article. It gets confusing, so often I just put some text in to display what's going on. We'll see in a moment what happens.

There's a lot more XSLT in the magazine file I'm using, but we don't need to look at it right now. Let's open the page in IE and see what comes out. I won't show you the part that looks like an article. Here's what's important:

COLORS color(c1) yellow endcolor color(c2) black endcolor ENDCOLORS code(c1) each( yellow )endeach COLORS color(c1) yellow endcolor color(c2) black endcolor ENDCOLORS code(c2) each( black )endeach

Let's see what we can learn from that trace. There are two lookup tags in the input, one looking up c1, and one looking up c2. Here's the code again:

<xsl:template match="lookup">
  <xsl:apply-templates select="$colorLookupDoc"/>
  <xsl:variable name="shirtColor" select="@colorCode"/>
  code(<xsl:value-of select="$shirtColor"/>)
  <xsl:for-each select="$colorLookupDoc">
    each(
    <xsl:value-of select="key('colorNumKey', $shirtColor)"/>
    )endeach
  </xsl:for-each>
</xsl:template>

When that code runs, it will:

  1. Apply the templates for the color lookup document. We'll come back to that.
  2. Set the variable shirtColor to the value of the attribute @colorCode in the lookup statement. You'll see why we have to do that in a moment.
  3. Display "code(", the shirtColor, and ")" for debugging.
  4. Do an xsl:for-each statement, selecting $colorLookupDoc.
  5. Display "each(" each time through the for-each loop.
  6. Output the value of the key lookup.
  7. Display ")endeach" and end the for-each.

Checking the output, we see what happens. Each time the lookup runs, the apply-templates for $colorLookupDoc is done, and the two templates for colors and color are running. That's what all that stuff from COLORS to ENDCOLORS is. That tells us that the templates are being applied in each lookup. That could be inefficient if we need the chapter number more than once, but we won't worry about it now. After that is done, we check the code to see if we have found it correctly in the attribute. The output says c1 in the first case and c2 in the second, so that is working.

The thing that was confusing me was the "for-each". I expected that to loop over the items in $colorLookupDoc, one for each color. As we see in the output, I was wrong: the loop only runs once in each case. Then the value-of statement looks up the key named 'colorNumKey', using the variable $shirtColor, and it's getting the right string.

So what is going on with the for-each? There's no loop there. Well, it turns out, as I read between the lines of my XSLT books, that what that for-each statement is really doing is setting the context of the statement inside the loop. $colorLookupDoc is just one structure in XML, a colors tag that happens to contain a couple of color tags. This sets the value-of select with the key to be looking at the lookup table, looking up the local variable $shirtColor.

The local variable is necessary, even though it just means "@colorCode", the attribute of the lookup tag that we're processing. The reason is that inside the new context set by the for-each, there is no access to @colorCode any more, so we have to use a temporary name.
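
Here is a small Python model of that context business, offered as an analogy rather than a description of any real processor. In XSLT 1.0 the key() function only finds nodes in the document that holds the current context node, and the hypothetical key function below imitates that by taking the context document as an explicit argument.

```python
import xml.etree.ElementTree as ET

KEYS = {"colorNumKey": ("color", "cid")}   # mirrors the xsl:key declaration

def key(context_document, name, value):
    """Find matching nodes, but only inside the document the context is in."""
    match, use = KEYS[name]
    return [node for node in context_document.iter(match) if node.get(use) == value]

article = ET.fromstring(
    '<page><lookup colorCode="c1"/><lookup colorCode="c2"/></page>'
)
lookup_doc = ET.fromstring(
    '<colors><color cid="c1">yellow</color><color cid="c2">black</color></colors>'
)

def process_lookup(lookup_element):
    # Save the attribute first: once the context moves, it is out of reach.
    shirt_color = lookup_element.get("colorCode")
    # Context is still the article, which has no <color> tags, so nothing is found.
    in_article = key(article, "colorNumKey", shirt_color)
    # Context switched to the lookup document, which is what the for-each does.
    in_lookup = key(lookup_doc, "colorNumKey", shirt_color)
    return in_article, [node.text for node in in_lookup]

for element in article.iter("lookup"):
    print(process_lookup(element))
```

Looking up from the article finds nothing, and looking up from the lookup document finds the color. Seen this way, the for-each is less a loop than a way of saying "stand over there while you do this".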

So most of this rigmarole is about changing the context for the value-of statement to be looking at the lookup table instead of the current document. Yucch. However, it's working. Let's trim out the cruft and get relatively clean XSLT.

<xsl:variable name="colorLookupDoc" select="document('aalookup.xml')"/>
<xsl:key name="colorNumKey" match="color" use="@cid"/>

<xsl:template match="colors">
</xsl:template>

<xsl:template match="color">
</xsl:template>

<xsl:template match="lookup">
  <xsl:apply-templates select="$colorLookupDoc"/>
  <xsl:variable name="shirtColor" select="@colorCode"/>
  <xsl:for-each select="$colorLookupDoc">
    <xsl:value-of select="key('colorNumKey', $shirtColor)"/>
  </xsl:for-each>
</xsl:template>

Sure enough, the output is now just "yellowblack", which is what we'd expect. Note that we needed to include empty template matches for colors and color. Otherwise their contents would have displayed in the course of processing. That would be bad.

Now What?

All the above is by way of experiment. It was a little risky, using my magazine XSL file in the experiment, but I got away with it. (I only had to recover an old copy of the file once.) Now we have learned how to write XSL code that will look up shirt colors in a color file, and just need to translate that into looking up chapter numbers in a table of contents file. Should be a piece of cake. But there are some decisions to make. It's very probable that we can just slowly modify the current XSLT code to be what we need: rename colors to page, color to chapter, modify the key definition, change the file name, change the lookup tag name to chapternumber, and so on. At the end of all this, it should be working.

But if we do that, I'll be the only one (well, and you, if you read this) who knows how it happens, and how it works. And I'm very likely to forget, and you won't be around when I need you. This should really be done with a series of tests, to document what has been learned here. You're probably not writing a book to document what you're programming, and probably you don't want to. But you could write tests, and probably need to. In the next chapter, we'll take a look at how to do that in a relatively painless way. At least I hope it's painless.
