Word frequency counter in XSLT

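I am trying to make a word frequency counter in XSLT. I want it to use stop words. I am starting with English, but I am having trouble getting the stop words to work.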

This code will work with any source XML file:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
   version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">   
    <xsl:variable name="stopwords" select="'a about an are as at be by for from how I in is it of on or that the this to was what when where who will with'"/>
     <wordcount>
        <xsl:for-each-group group-by="." select="
            for $w in //text()/tokenize(., '\W+')[not(.=$stopwords)] return $w">
            <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
        </xsl:for-each-group>
     </wordcount>
</xsl:template>

</xsl:stylesheet>

I think not(.=$stopwords) is where my problem lies, but I don't know what to do about it.


I would also welcome hints on how to load the stop words from an external file.

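You are comparing the current word with the entire list of stop words as one string, whereas you should check whether the current word is contained in the list of stop words: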

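not(contains(concat(' ', $stopwords, ' '), concat(' ', ., ' ')))

The spaces need to be concatenated on both sides to avoid partial matches: the trailing space prevents "abo" from matching "about", and the leading space prevents "bout" from matching it.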
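
For illustration, here is a minimal sketch of the stylesheet from the question with a substring test dropped in. The variable name $padded and the [. ne ''] filter (which discards the empty tokens that tokenize() returns for leading or trailing separators) are assumptions added for this sketch, not part of the original code. The list is padded with a space on both sides so that only whole words match.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
   version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
    <!-- Stop words kept as ONE space-separated string, as in the question -->
    <xsl:variable name="stopwords" select="'a about an are as at be by for from how I in is it of on or that the this to was what when where who will with'"/>
    <!-- Hypothetical helper: pad the list once so every word has a space on each side -->
    <xsl:variable name="padded" select="concat(' ', $stopwords, ' ')"/>
    <wordcount>
        <!-- Keep non-empty tokens whose space-padded form does not occur in the padded list -->
        <xsl:for-each-group group-by="."
            select="//text()/tokenize(., '\W+')[. ne ''][not(contains($padded, concat(' ', ., ' ')))]">
            <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
        </xsl:for-each-group>
    </wordcount>
</xsl:template>

</xsl:stylesheet>
```

Note that the comparison is case-sensitive, so "The" is not filtered by the stop word "the".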

Your $stopwords variable is currently a single string; you want it to be a sequence of strings. You can do this in any of the following ways:

  • Change its declaration to

    <xsl:variable name="stopwords" 
      select="('a', 'about', 'an', 'are', 'as', 'at', 
               'be', 'by', 'for', 'from', 'how', 
               'I', 'in', 'is', 'it', 
               'of', 'on', 'or', 
               'that', 'the', 'this', 'to', 
               'was', 'what', 'when', 'where', 
               'who', 'will', 'with')"/>

  • Change its declaration to

    <xsl:variable name="stopwords" 
      select="tokenize('a about an are as at 
                        be by for from how I in is it 
                        of on or that the this to was 
                        what when where who will with',
                        '\s+')"/>
    
    
    
  • Read it from an external XML document named (for example) stopwords.xml, of the form

    <stop-list>
      <p>This is a sample stop list [further description ...]</p>
      <w>a</w>
      <w>about</w>
      ...
    </stop-list>
    
    
    and then load it with, for example,

    <xsl:variable name="stopwords"
      select="document('stopwords.xml')//w/string()"/>
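
Putting the pieces together, a minimal sketch that loads the list from the external file might look like the following. It assumes stopwords.xml sits in the same directory as the stylesheet and uses the stop-list/w layout shown above; the [. ne ''] filter that drops empty tokens is an addition for this sketch. Because = is a general comparison in XPath 2.0, . = $stopwords is true when the token equals any item of the sequence, so the not(. = $stopwords) predicate from the question can stay exactly as it was.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
   version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<!-- Assumption: stopwords.xml is resolved relative to this stylesheet.
     The result is a sequence of strings, one per w element. -->
<xsl:variable name="stopwords"
   select="document('stopwords.xml')//w/string()"/>

<xsl:template match="/">
    <wordcount>
        <!-- not(. = $stopwords) is true only if the token equals NO item of the sequence -->
        <xsl:for-each-group group-by="."
            select="//text()/tokenize(., '\W+')[. ne ''][not(. = $stopwords)]">
            <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
        </xsl:for-each-group>
    </wordcount>
</xsl:template>

</xsl:stylesheet>
```

Keeping the list in its own file means it can be edited, or swapped for another language's list, without touching the stylesheet.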
    
    
    

Thank you. I had tokenized the list in an earlier version... I should have caught that. I'll try your stopwords.xml idea too.