Sometimes we need to sort an XML document by several fields, each in its own order. For instance, take this XML document:
<?xml version="1.0" encoding="UTF-8"?>
<publications>
<publication>
<author>A. Male</author>
<year>1999</year>
</publication>
<publication>
<author>F. Feng</author>
<author>X. Cao</author>
<year>2011</year>
</publication>
<publication>
<author>J. Allinson</author>
<year>2012</year>
</publication>
<publication>
<author>F. Feng</author>
<author>J. Allinson</author>
<year>1999</year>
</publication>
<publication>
<author>S Lee</author>
<year>2007</year>
</publication>
<publication>
<author>F. Feng</author>
<author>N. Thomas</author>
<year>1999</year>
</publication>
</publications>
We want to sort by year in descending order first, then by first author in ascending order, then by second author (if one exists), also in ascending order. Here is the XSLT (tested in Saxon 9):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/publications">
<publications>
<xsl:call-template name="publication"/>
</publications>
</xsl:template>
<!-- Called with /publications as the context node; emits its publication children in sorted order -->
<xsl:template name="publication">
<xsl:for-each select="publication">
<!-- First, sort by pub year -->
<xsl:sort select="year" order="descending"/>
<!-- Second, sort by first author -->
<xsl:sort select="author[1]" order="ascending"/>
<!-- Third, sort by second author (if exists) -->
<xsl:sort select="author[2]" order="ascending"/>
<!-- Copy each publication unchanged, without wrapping the sorted list in an extra publication element -->
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
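The same three-level sort can also be written with xsl:apply-templates instead of a named template and xsl:for-each. The following is only a sketch of that variant, not the version tested above. It also assumes that year always holds a number, so data-type="number" is used to compare years numerically rather than as text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/publications">
    <publications>
      <xsl:apply-templates select="publication">
        <!-- Primary key: year, compared as a number, newest first -->
        <xsl:sort select="year" data-type="number" order="descending"/>
        <!-- Secondary key: first author, ascending (the default order) -->
        <xsl:sort select="author[1]"/>
        <!-- Tertiary key: second author; the key is empty when there is none -->
        <xsl:sort select="author[2]"/>
      </xsl:apply-templates>
    </publications>
  </xsl:template>

  <!-- Copy each publication unchanged -->
  <xsl:template match="publication">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document the resulting order should be: J. Allinson (2012); F. Feng and X. Cao (2011); S Lee (2007); then the three 1999 entries, A. Male, F. Feng and J. Allinson, F. Feng and N. Thomas.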
Wednesday, 13 June 2012
Thursday, 10 May 2012
Get latest year in XSLT
I just came across a problem: getting the latest year from a set of publications. The implementation is shown below:
XML code:
<?xml version="1.0" encoding="UTF-8"?>
<publications>
<publication>
<author>Frank</author>
<year>1999</year>
<title>test1</title>
</publication>
<publication>
<author>Frank</author>
<year>2012</year>
<title>test2</title>
</publication>
<publication>
<author>Frank</author>
<year>1980</year>
<title>test3</title>
</publication>
<publication>
<author>Frank</author>
<year>2007</year>
<title>test4</title>
</publication>
</publications>
XSLT code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>
<xsl:template match="/">
<!-- get latest year -->
<xsl:for-each select="/publications/publication/year">
<xsl:variable name="currentYear" select="normalize-space(text())"/>
<xsl:if test="self::node()[count(../..//year[. > $currentYear])=0]">
<latestYear>
<xsl:value-of select="$currentYear"/>
</latestYear>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
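Another common XSLT 1.0 approach is to sort the years in descending order and keep only the first one. The sketch below is an alternative to the stylesheet above, not a replacement tested by the author, and it assumes every year is numeric. Unlike the count-based test, it emits a single latestYear element even when several publications share the latest year.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:for-each select="/publications/publication/year">
      <!-- Largest year first -->
      <xsl:sort select="." data-type="number" order="descending"/>
      <!-- position() refers to the sorted order, so 1 is the latest year -->
      <xsl:if test="position() = 1">
        <latestYear>
          <xsl:value-of select="normalize-space(.)"/>
        </latestYear>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the sample publications above, this produces a latestYear element containing 2012.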
Thursday, 22 March 2012
Non-ASCII single quote problem and solution
While doing an XSLT transformation, I ran into a strange problem: the plain single quote ' stored in the data dictionary came out as the typographic (curly) single quotes ‘ and ’. I still don't know the reason, as the output XML is in UTF-8. But here is the solution, which maps both curly quotes back to the plain single quote:
<xsl:variable name="singlequote" select='"&apos;"'/>
<xsl:value-of select="translate(translate($ddDoc/IRIS_Data_Dict/instrument/typeOfInstruments//type/@def[../@value=$currentType],'‘',$singlequote),'’',$singlequote)"/>
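Because translate() maps the n-th character of its second argument to the n-th character of its third, both curly quotes can be replaced in a single call. The sketch below shows that idea in a self-contained stylesheet. The curlyquotes variable and the choice to rewrite every text node are assumptions made for illustration; the original code applies the replacement to one attribute value only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- A string holding one plain single quote -->
  <xsl:variable name="singlequote" select='"&apos;"'/>
  <!-- Hypothetical helper: the two curly quotes U+2018 and U+2019 -->
  <xsl:variable name="curlyquotes" select="'&#x2018;&#x2019;'"/>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace both curly quotes with a plain single quote in every text node -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="translate(., $curlyquotes, concat($singlequote, $singlequote))"/>
  </xsl:template>
</xsl:stylesheet>
```

The explicit priority on the text() template avoids a conflict with the identity template, which matches text nodes at the same default priority.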
Tuesday, 20 March 2012
Access variables in CDATA from XSLT
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:value-of select="$AnyVariable"/>
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
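If the value ends up as the text content of a known element, the standard cdata-section-elements attribute of xsl:output is an alternative that does not rely on disable-output-escaping. The following is a minimal sketch under that assumption; the payload element, the value parameter and its sample content are hypothetical names chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Ask the serializer to write the text content of payload as a CDATA section -->
  <xsl:output method="xml" cdata-section-elements="payload"/>

  <!-- Hypothetical variable whose value contains a character that would otherwise be escaped -->
  <xsl:param name="value" select="'a &lt; b'"/>

  <xsl:template match="/">
    <payload>
      <xsl:value-of select="$value"/>
    </payload>
  </xsl:template>
</xsl:stylesheet>
```

With this, the serializer wraps the text inside payload in a CDATA section by itself, so the stylesheet never has to emit the CDATA delimiters as text.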
Thursday, 8 March 2012
Handle whitespace in XSLT
The text below is copied from http://bit.ly/y4zmrF (Thanks, Stack Overflow!)
There are three reasons for getting unwanted whitespace in the result of an XSLT transformation:
- whitespace that comes from between nodes in the source document
- whitespace that comes from within nodes in the source document
- whitespace that comes from the stylesheet
I'm going to talk about all three because it can be hard to tell where whitespace comes from so you might need to use several strategies.
To address the whitespace that is between nodes in your source document, you should use <xsl:strip-space> to strip out any whitespace that appears between two nodes, and then use <xsl:preserve-space> to preserve the significant whitespace that might appear within mixed content. For example, if your source document looks like:
<ul>
<li>This is an <strong>important</strong> <em>point</em></li>
</ul>
then you will want to ignore the whitespace between the <ul> and the <li> and between the </li> and the </ul>, which is not significant, but preserve the whitespace between the <strong> and <em> elements, which is significant (otherwise you'd get "This is an importantpoint"). To do this use
<xsl:strip-space elements="*" />
<xsl:preserve-space elements="li" />
The elements attribute on <xsl:preserve-space> should basically list all the elements in your document that have mixed content.
Aside: using <xsl:strip-space> also reduces the size of the source tree in memory, and makes your stylesheet more efficient, so it's worth doing even if you don't have whitespace problems of this sort.
To address the whitespace that appears within nodes in your source document, you should use normalize-space(). For example, if you have:
<dt>
a definition</dt>
and you can be sure that the <dt> element won't hold any elements that you want to do something with, then you can do:
<xsl:template match="dt">
...
<xsl:value-of select="normalize-space(.)" />
...
</xsl:template>
The leading and trailing whitespace will be stripped from the value of the <dt> element and you will just get the string "a definition".
Whitespace coming from the stylesheet, which is perhaps what you're experiencing, arises when you have text within a template like this:
<xsl:template match="name">
Name:
<xsl:value-of select="." />
</xsl:template>
XSLT stylesheets are parsed in the same way as the source documents that they process, so the above XSLT is interpreted as a tree that holds an <xsl:template> element with a match attribute whose first child is a text node and whose second child is an <xsl:value-of> element with a select attribute. The text node has leading and trailing whitespace (including line breaks); since it's literal text in the stylesheet, it gets literally copied over into the result, with all the leading and trailing whitespace.
But some whitespace in XSLT stylesheets gets stripped automatically, namely the whitespace-only text between nodes. You don't get a line break in your result just because there's a line break between the <xsl:value-of> and the close of the <xsl:template>.
To get only the text you want in the result, use the <xsl:text> element like this:
<xsl:template match="name">
<xsl:text>Name: </xsl:text>
<xsl:value-of select="." />
</xsl:template>
The XSLT processor will ignore the line breaks and indentation that appear between nodes, and only output the text within the <xsl:text> element.
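The three strategies can be combined in one stylesheet. The sketch below is only an illustration: it assumes a source document that contains both the <ul>/<li> list and the <dt> element from the examples above, and the "Term: " label and the text output method are choices made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- 1. Whitespace between source nodes: strip it everywhere... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except inside mixed-content elements such as li -->
  <xsl:preserve-space elements="li"/>

  <!-- Mixed content: the space between strong and em survives -->
  <xsl:template match="li">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- 2. Whitespace within a source node: trim and collapse it -->
  <!-- 3. Whitespace from the stylesheet: keep literal text inside xsl:text -->
  <xsl:template match="dt">
    <xsl:text>Term: </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The indentation and line breaks between the instructions in each template are whitespace-only text nodes in the stylesheet, so they are stripped and never reach the output; only the content of the xsl:text elements and the selected values do.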