Rehdon asked me about giving @xml:id attributes to things, so I whipped up this quick XSLT stylesheet. Some people prefer to use generate-id() to get an opaque, unique ID without semantic baggage (the value is chosen by the XSLT processor rather than being truly random, and it may differ from one run to the next). Where IDs are exposed to the public, I prefer ones that make sense and are human-readable.
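For comparison, the generate-id() approach might look like the sketch below. It is not part of the stylesheet in this post, just a minimal illustration that assumes you want an ID on every <p>, in any namespace, that lacks one. The generated values are guaranteed to be syntactically valid names, but not to be distinct from IDs already present in the source document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- identity template: copy everything through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- assumption for this sketch: only <p> elements without an @xml:id are touched -->
  <xsl:template match="*[local-name() = 'p'][not(@xml:id)]">
    <xsl:copy>
      <!-- generate-id() returns a processor-specific string that is unique
           among the nodes of this document for this transformation -->
      <xsl:attribute name="xml:id"><xsl:value-of select="generate-id()"/></xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The trade-off is the one described above: the IDs carry no meaning and depend on the processor, which is fine for internal linking but less pleasant when they end up in public URLs.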
Warning: there is a distinct flaw in the lack of testing I’ve done before applying the @xml:id. If something other than a <p> element already has xml:id="p5", the stylesheet will still add 'p5' as an @xml:id to the fifth paragraph. The output is still well-formed XML, but it violates the requirement that @xml:id values be unique within the document (a validity problem rather than a well-formedness one). The stylesheet also matches on local-name() alone, so it numbers paragraphs in other namespaces as well. (This may be a bug or a feature depending on your outlook.) It numbers from tei:text, so if you don’t have that in your document you should change the from attribute in that variable.
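One way the collision described in the warning could be avoided is sketched here, as a variant of the stylesheet given further down. It is an untested-in-production sketch, not the version posted here: the key name existing-ids is made up, the match is narrowed to TEI-namespace elements (an assumption that sidesteps the other-namespace issue), and it only checks against IDs already present in the input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei"
    version="1.0">
  <xsl:param name="e" select="'p'"/>
  <xsl:param name="pre"/>
  <!-- hypothetical key: index every element that already carries an @xml:id -->
  <xsl:key name="existing-ids" match="*[@xml:id]" use="@xml:id"/>
  <!-- identity template -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- assumption: only elements in the TEI namespace are candidates -->
  <xsl:template match="tei:*">
    <xsl:copy>
      <xsl:if test="local-name() = $e and not(@xml:id)">
        <!-- '0001' zero-pads the number to at least four digits -->
        <xsl:variable name="num"><xsl:number level="any" from="tei:text" format="0001"/></xsl:variable>
        <xsl:variable name="candidate" select="concat($pre, local-name(), $num)"/>
        <xsl:choose>
          <!-- the candidate is already used somewhere in the input: warn and add nothing -->
          <xsl:when test="key('existing-ids', $candidate)">
            <xsl:message>Not adding xml:id '<xsl:value-of select="$candidate"/>': already used in the input.</xsl:message>
          </xsl:when>
          <xsl:otherwise>
            <xsl:attribute name="xml:id"><xsl:value-of select="$candidate"/></xsl:attribute>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:if>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Skipping the element and emitting a message is only one possible policy; you might prefer to halt the transformation instead. Note also that, because numbering restarts at each tei:text, a document containing several text elements would probably still need a different prefix per text to stay unique.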
The XSLT stylesheet takes a parameter ‘e’, to which you can pass the local-name of the element in question. It assumes ‘p’ otherwise, but you could use it to number div, head, w, or really any element just by passing it e=w (or whatever).
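A small made-up example may make the parameters concrete. The listing shows a trimmed, hypothetical input followed by the result one would expect for e=w and pre=text1_, assuming the four-digit zero padding described in the update. With xsltproc, string parameters can be passed using --stringparam, for example: xsltproc --stringparam e w --stringparam pre text1_ add-ids.xsl input.xml (both file names are invented for illustration).

```xml
<!-- Hypothetical input, trimmed to the essentials (no teiHeader) -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p><w>Hello</w> <w xml:id="greeting">world</w></p>
      <p><w>again</w></p>
    </body>
  </text>
</TEI>

<!-- Expected result: the <w> that already has an @xml:id is left alone
     but is still counted, so the third <w> is numbered 0003, not 0002 -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p><w xml:id="text1_w0001">Hello</w> <w xml:id="greeting">world</w></p>
      <p><w xml:id="text1_w0003">again</w></p>
    </body>
  </text>
</TEI>
```

The gap in the sequence follows from xsl:number counting every <w> after the start of tei:text, whether or not the stylesheet ends up giving it a new ID.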
Update: Rehdon asked about a configurable optional prefix to the ID and a 4-digit zero-padded number for it. So I changed the script to do that.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei"
    version="1.0">
  <!-- Parameter to pass to the stylesheet, assumes 'p' if nothing given -->
  <xsl:param name="e" select="'p'"/>
  <!-- If it exists, a prefix string: include a separator, like 'text1_' to get 'text1_p0005' -->
  <xsl:param name="pre"/>
  <!-- typical copy-all template; node() already covers comments and processing instructions -->
  <xsl:template match="@*|node()" priority="-1">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- higher priority one to match elements -->
  <xsl:template match="*">
    <xsl:copy>
      <!-- If the local-name is the element we've passed it, and there is not an @xml:id attribute -->
      <xsl:if test="local-name() = $e and not(@xml:id)">
        <!-- make a variable numbering current nodes at any level from tei:text;
             the format token '0001' zero-pads the number to at least four digits -->
        <xsl:variable name="num"><xsl:number level="any" from="tei:text" format="0001"/></xsl:variable>
        <!-- Then create an @xml:id attribute with the prefix, the name and the number concatenated -->
        <xsl:attribute name="xml:id"><xsl:value-of select="concat($pre, local-name(), $num)"/></xsl:attribute>
      </xsl:if>
      <!-- apply any other templates (i.e. copy other stuff) -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Hope that is useful. I’ll try to remember to add it to the TEI wiki as well.
Great job as usual, James; it should rightly be posted on the TEI wiki. Of course I already have two enhancement requests ;) :
- customizable prefix, e.g. text1_p300 instead of simply p300
- zero-padding lower numbers, e.g. w00001 instead of w1
Keep up the good work! :)
R.
Updated the stylesheet to do this. Pass a ‘pre’ parameter of ‘text1_’ to get text1_p0300.
-JamesC