Word frequency counter in XSLT
I am trying to make a word frequency counter in XSLT. I want it to use stop words. I am starting with English, but I am having trouble getting the stop words to work. This code runs against any source XML file:
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <xsl:variable name="stopwords" select="'a about an are as at be by for from how I in is it of on or that the this to was what when where who will with'"/>
    <wordcount>
      <xsl:for-each-group group-by="." select="
          for $w in //text()/tokenize(., '\W+')[not(.=$stopwords)] return $w">
        <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
      </xsl:for-each-group>
    </wordcount>
  </xsl:template>
</xsl:stylesheet>
I think not(.=$stopwords) is where my problem lies, but I don't know what to do about it.
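Also, I would appreciate hints on how to load the stop words from an external file.

You are comparing the current word with the whole stop-word string at once, whereas you should check whether the current word is contained in the stop-word list:
not(contains(concat(' ', $stopwords, ' '), concat(' ', ., ' ')))
The spaces are concatenated on both sides to avoid partial matches, for example to prevent "abo" from matching "about".

Your $stopwords variable is currently a single string; you want it to be a sequence of strings. You can do this in any of the following ways: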
- Change its declaration to
<xsl:variable name="stopwords" select="('a', 'about', 'an', 'are', 'as', 'at', 'be', 'by', 'for', 'from', 'how', 'I', 'in', 'is', 'it', 'of', 'on', 'or', 'that', 'the', 'this', 'to', 'was', 'what', 'when', 'where', 'who', 'will', 'with')"/>
- Change its declaration to
<xsl:variable name="stopwords" select="tokenize('a about an are as at be by for from how I in is it of on or that the this to was what when where who will with', '\s+')"/>
- Read it from an external XML document named (for example) stoplist.xml, of the form
<stop-list>
  <p>This is a sample stop list [further description ...]</p>
  <w>a</w>
  <w>about</w>
  ...
</stop-list>
and then load it, for example with
<xsl:variable name="stopwords" select="document('stoplist.xml')//w/string()"/>
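For reference, here is a minimal sketch of the question's stylesheet with the tokenize() declaration swapped in, so that not(. = $stopwords) compares each word against every item of the sequence. The extra [. ne ''] predicate is my own addition, not part of the original code: it drops the empty tokens that tokenize() returns when a text node starts or ends with non-word characters. Everything else follows the original.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- Stop words as a sequence of strings rather than one long string -->
    <xsl:variable name="stopwords" select="tokenize('a about an are as at be by for from how I in is it of on or that the this to was what when where who will with', '\s+')"/>
    <wordcount>
      <!-- Split every text node into words, drop empty tokens (an assumption
           added here), then drop any word equal to one of the stop words -->
      <xsl:for-each-group group-by="." select="
          //text()/tokenize(., '\W+')[. ne ''][not(. = $stopwords)]">
        <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
      </xsl:for-each-group>
    </wordcount>
  </xsl:template>
</xsl:stylesheet>
```

Note that the = comparison is case-sensitive under the usual Unicode codepoint collation, so "The" at the start of a sentence would still be counted; comparing lower-case(.) against the list is one way around that if it matters for your data.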