XSLT: counting co-occurrences of specific child elements

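I am trying to count how often particular people appear together in the event records of an XML document. The source document consists of event elements; each event contains prose in p elements and bibliographic records in bibl elements, and both contain references to people. I want to work out how often two people appear together in an event, across the whole document. I have been using XSLT 2.0, but can switch to 3.0.

For example, how would I get the answer 3 for the number of events below in which Nancy Drew and Dick Tracy appear together? Or 1 for Dick Tracy and Sam Spade?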

<listEvent>
        <event xml:id="e1">
           <p>pretium eget erat eu cursus. Duis pulvinar lectus sed quam vehicula tincidunt in
              vel nunc. Cras convallis elementum diam. Sed nec viverra magna. Then <name
                 SameAs="detectives.xml#ND">Nancy Drew</name> solved the case. A consequat
              tortor molestie ut. Praesent lobortis ipsum sit amet bibendum consequat. </p>

           <bibl><name SameAs="detectives.xml#DT">Tracy, Dick</name>. The Mysterious Case of the
              Orange Fish. Penguin Publishing. </bibl>
           <bibl><name SameAs="detectives.xml#SH">Holmes, Sherlock</name>. The Case of the Blue
              Carbuncle Penguin Publishing. </bibl>

        </event>
        <event xml:id="e2">
           <p> facilisis turpis eu, gravida enim. Mauris adipiscing magna consequat dolor
              auctor, sit amet tincidunt felis auctor. <name SameAs="detectives.xml#ND">Nancy
                 Drew</name> and <name SameAs="detectives.xml#DT">Dick Tracy</name> went into
              business together. Aliquam pharetra semper erat, at viverra tellus vestibulum
              quis. Sed facilisis convallis justo, suscipit fermentum lorem egestas nec.
              Phasellus in aliquam eros, vitae fringilla augue </p>

           <bibl><name SameAs="detectives.xml#TH">Hardy, Tom</name>. Growing Up Is Hard to Do:
              The Story of a Boy Detective. Knopf Press. </bibl>
           <bibl><name SameAs="detectives.xml#SH">Holmes, Sherlock</name>. The Case of the Blue
              Carbuncle. Penguin Publishing. </bibl>
           <bibl><name SameAs="detectives.xml#SH">Holmes, Sherlock</name>. The Hound of the
              Baskervilles. Arsenal Press. </bibl>

        </event>
        <event xml:id="e3">
           <p> Curabitur dapibus eu ligula sed elementum. Curabitur sit amet nisi dictum. <name
                 SameAs="detectives.xml#SS">Sam Spade</name> was the only detective in town.
              Donec cursus diam sem, astor. </p>

           <bibl><name SameAs="detectives.xml#TH">Hardy, Tom</name>. Growing Up Is Hard to Do:
              The Story of a Boy Detective. Knopf Press. </bibl>
           <bibl><name SameAs="detectives.xml#SS">Spade, Sam</name>. My Friends' Business
              Ventures. Knopf Press. </bibl>
           <bibl><name SameAs="detectives.xml#DN">Drew, Nancy</name>. Blonde and Curious.
              Arsenal Press.</bibl>

        </event>
        <event xml:id="e4">
           <p> Duis pulvinar lectus sed quam vehicula tincidunt in vel nunc. <name
                 SameAs="detectives.xml#ND">Nancy Drew</name> and <name
                 SameAs="detectives.xml#DT">Dick Tracy</name> made 110% profit that year. Cras
              convallis elementum diam. Sed nec viverra magna. A consequat tortor molestie ut.
              Praesent lobortis ipsum sit amet bibendum consequat. </p>

           <bibl><name SameAs="detectives.xml#SS">Spade, Sam</name>. My Friends' Business
              Ventures. Knopf Press. </bibl>
           <bibl><name SameAs="detectives.xml#MH">Holmes, Mycroft</name>. Sons and Brothers.
              Knopf Press. </bibl>
        </event>
     </listEvent>

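...where the @weight value is the part I am having trouble computing.

I have already managed to assign each person a node @id. The node @id values then make up the @source and @target values. The first is Sam Spade and Dick Tracy, the second is Sam Spade and Nancy Drew, and @weight should be the number of times they appear together in the document. I simplified my example, which may turn out to be annoying. In my actual source document there are a bunch of other attributes and values in each element, including an @n for each person name, so populating @id, @source and @target with a value-of select is a simple process.

@Tim, not to worry: @SameAs points to an authority list, so however a person's name is spelled in the text, it resolves to one person. For example, Lucy, Miss Graham and Mrs. L. Foster may all be names for the same woman in the text, as a girl and before and after her marriage, or the name may be inverted, as in the bibliographic entries.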

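"Not to worry, @SameAs points to an authority list"

Well, XSLT can only work with what is in the XML source document, so unless the differing @SameAs values are resolved beforehand, they will be counted here as different people.

"In my actual source document there are a bunch of other attributes and values in each element, including an @n for each person name"

Well, since it is not there, I have used the @SameAs attribute as if it were a distinct id. The following is actually an XSLT 1.0 stylesheet, augmented by the EXSLT set:distinct function. It is only a sketch, with some scaffolding left in, so we can see whether it is heading in the right direction.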

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:set="http://exslt.org/sets"
extension-element-prefixes="set">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>

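<!-- Index each event by every person id (@SameAs) mentioned anywhere inside it -->
<xsl:key name="eventByID" match="event" use=".//name/@SameAs" />

<!-- The first @SameAs attribute node for each distinct person id -->
<xsl:variable name="distinct_nodes" select="set:distinct(/listEvent/event//name/@SameAs)" />
<xsl:variable name="root" select="/" />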

<xsl:template match="/">
<graph>
    <nodes>
        <xsl:for-each select="$distinct_nodes">
            <node id="{.}"/>
        </xsl:for-each>
    </nodes>
    <edges>
        <!-- Pair every id with each id that follows it, so each unordered pair is visited once -->
        <xsl:for-each select="$distinct_nodes[not(position()=last())]">
            <xsl:variable name="source" select="." />
            <xsl:variable name="pos" select="position()" />
                <xsl:for-each select="$distinct_nodes[position()>$pos]">
                    <xsl:variable name="target" select="." />
                    <!-- Events mentioning $source whose xml:id matches an event mentioning $target -->
                    <xsl:variable name="common_events" select="key('eventByID', $source)[@xml:id=key('eventByID', $target)/@xml:id]" />
                    <xsl:if test="$common_events">
                        <edge source="{$source}" target="{$target}" weight="{count($common_events)}">
                        <!-- use this for test purposes -->
                            <!-- 
                            <xsl:for-each select="$common_events">
                                <event id="{@xml:id}"/>
                            </xsl:for-each>
                             -->
                        </edge>
                    </xsl:if>
                </xsl:for-each>
        </xsl:for-each>
    </edges>
</graph>
</xsl:template>
</xsl:stylesheet>

Applied to the example XML, the result is:

<?xml version="1.0" encoding="utf-8"?>
<graph>
   <nodes>
      <node id="detectives.xml#ND"/>
      <node id="detectives.xml#DT"/>
      <node id="detectives.xml#SH"/>
      <node id="detectives.xml#TH"/>
      <node id="detectives.xml#SS"/>
      <node id="detectives.xml#DN"/>
      <node id="detectives.xml#MH"/>
   </nodes>
   <edges>
      <edge source="detectives.xml#ND" target="detectives.xml#DT" weight="3"/>
      <edge source="detectives.xml#ND" target="detectives.xml#SH" weight="2"/>
      <edge source="detectives.xml#ND" target="detectives.xml#TH" weight="1"/>
      <edge source="detectives.xml#ND" target="detectives.xml#SS" weight="1"/>
      <edge source="detectives.xml#ND" target="detectives.xml#MH" weight="1"/>
      <edge source="detectives.xml#DT" target="detectives.xml#SH" weight="2"/>
      <edge source="detectives.xml#DT" target="detectives.xml#TH" weight="1"/>
      <edge source="detectives.xml#DT" target="detectives.xml#SS" weight="1"/>
      <edge source="detectives.xml#DT" target="detectives.xml#MH" weight="1"/>
      <edge source="detectives.xml#SH" target="detectives.xml#TH" weight="1"/>
      <edge source="detectives.xml#TH" target="detectives.xml#SS" weight="1"/>
      <edge source="detectives.xml#TH" target="detectives.xml#DN" weight="1"/>
      <edge source="detectives.xml#SS" target="detectives.xml#DN" weight="1"/>
      <edge source="detectives.xml#SS" target="detectives.xml#MH" weight="1"/>
   </edges>
</graph>
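
For a single pair, the same count can be read off with one XPath expression. The following is a minimal XSLT 1.0 sketch, assuming the sample document above as input; the parameter names a and b are made up for illustration and are not part of the original stylesheet.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameters: the two person ids to compare -->
  <xsl:param name="a" select="'detectives.xml#ND'"/>
  <xsl:param name="b" select="'detectives.xml#DT'"/>

  <xsl:template match="/">
    <!-- Count the events that mention both ids anywhere inside them -->
    <xsl:value-of select="count(/listEvent/event[.//name/@SameAs = $a][.//name/@SameAs = $b])"/>
  </xsl:template>
</xsl:stylesheet>
```

With the default parameter values this outputs 3 for the sample document, matching the weight of the ND/DT edge above. Setting a to detectives.xml#DT and b to detectives.xml#SS gives 1.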

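Are you saying you want to check every possible combination of two names? Perhaps you should post an example of what the output should look like, code-wise.

Your XML has both a Nancy Drew and a Drew, Nancy. Do you expect these to be treated as different names? Their @SameAs attributes differ as well.

@michael.hor257k I like your thinking. The output should look like this: …/gefx>

I have never tried EXSLT before.

@michael.hor257k Your result is exactly what I am looking for, but when I try the stylesheet I get a fatal error from Saxon EE 9.5.1.3, XTDE1425: cannot find a matching 1-argument function named {}distinct, even though the set namespace looks perfect. I tried Saxon 6.5.5 and Xalan, which complete the transformation but return empty nodes and graph elements. Is my diagnosis correct? Am I using the wrong processor?

@CLKC It works fine in both [...] and [...]. In XSLT 2.0, I suppose you would have to write your own:

I suppose so; there may be some simplifications if you use an XSLT 2.0 processor.

Thanks for the XSLT 2.0 pointers. I found my mistake, and thank you very much for the stylesheet. I am too new here to be allowed to upvote the answer, but I will find someone who can, because your answer is correct.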
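
The XSLT 2.0 simplification mentioned in the comments might look like the following sketch. It is not the answerer's code: it swaps set:distinct for the built-in distinct-values() function and uses the intersect operator to find the shared events. Because distinct-values() returns atomic values rather than nodes, key() is called in its three-argument form with the document node (held in a variable here named doc, an illustrative name).

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>

  <!-- Index each event by every person id (@SameAs) mentioned inside it -->
  <xsl:key name="eventByID" match="event" use=".//name/@SameAs"/>

  <xsl:template match="/">
    <!-- Keep the document node: inside the loops below the context item is a string -->
    <xsl:variable name="doc" select="."/>
    <!-- Distinct ids as atomic values; their order is implementation-dependent -->
    <xsl:variable name="ids" select="distinct-values(/listEvent/event//name/@SameAs)"/>
    <graph>
      <nodes>
        <xsl:for-each select="$ids">
          <node id="{.}"/>
        </xsl:for-each>
      </nodes>
      <edges>
        <xsl:for-each select="$ids">
          <xsl:variable name="source" select="."/>
          <xsl:variable name="pos" select="position()"/>
          <!-- Pair each id only with the ids after it, so every pair is visited once -->
          <xsl:for-each select="subsequence($ids, $pos + 1)">
            <xsl:variable name="target" select="."/>
            <!-- Events that mention both people -->
            <xsl:variable name="common"
                select="key('eventByID', $source, $doc) intersect key('eventByID', $target, $doc)"/>
            <xsl:if test="exists($common)">
              <edge source="{$source}" target="{$target}" weight="{count($common)}"/>
            </xsl:if>
          </xsl:for-each>
        </xsl:for-each>
      </edges>
    </graph>
  </xsl:template>
</xsl:stylesheet>
```

On the sample document this should produce the same set of nodes and edge weights as the result above. The order of nodes, and which id of a pair ends up as source, may differ between processors because the XPath specification does not fix the order returned by distinct-values().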
