XML/XSLT: counting co-occurrences of specific child elements
I'm trying to count how often particular people co-occur in the event records of an XML document. My source document consists of event elements. Each event contains prose in p elements and bibliographic records in bibl elements, and both can contain references to people. I want to calculate how often two people appear together in an event across the whole document. I've been using XSLT 2.0 but could switch to 3.0. For example, how can I get the answer 3 for the number of events below in which Nancy Drew and Dick Tracy appear together? Or the corresponding count for Dick Tracy and Sam Spade?
<listEvent>
  <event xml:id="e1">
    <p>pretium eget erat eu cursus. Duis pulvinar lectus sed quam vehicula tincidunt in
      vel nunc. Cras convallis elementum diam. Sed nec viverra magna. Then <name
        SameAs="detectives.xml#ND">Nancy Drew</name> solved the case. A consequat
      tortor molestie ut. Praesent lobortis ipsum sit amet bibendum consequat. </p>
    <bibl><name SameAs="detectives.xml#DT">Tracy, Dick</name>. The Mysterious Case of the
      Orange Fish. Penguin Publishing. </bibl>
    <bibl><name SameAs="detectives.xml#SH">Holmes, Sherlock</name>. The Case of the Blue
      Carbuncle Penguin Publishing. </bibl>
  </event>
  <event xml:id="e2">
    <p> facilisis turpis eu, gravida enim. Mauris adipiscing magna consequat dolor
      auctor, sit amet tincidunt felis auctor. <name SameAs="detectives.xml#ND">Nancy
        Drew</name> and <name SameAs="detectives.xml#DT">Dick Tracy</name> went into
      business together. Aliquam pharetra semper erat, at viverra tellus vestibulum
      quis. Sed facilisis convallis justo, suscipit fermentum lorem egestas nec.
      Phasellus in aliquam eros, vitae fringilla augue </p>
    <bibl><name SameAs="detectives.xml#TH">Hardy, Tom</name>. Growing Up Is Hard to Do:
      The Story of a Boy Detective. Knopf Press. </bibl>
    <bibl><name SameAs="detectives.xml#SH">Holmes, Sherlock</name>. The Case of the Blue
      Carbuncle. Penguin Publishing. </bibl>
    <bibl><name SameAs="detectives.xml#SH">Holmes, Sherlock</name>. The Hound of the
      Baskervilles. Arsenal Press. </bibl>
  </event>
  <event xml:id="e3">
    <p> Curabitur dapibus eu ligula sed elementum. Curabitur sit amet nisi dictum. <name
        SameAs="detectives.xml#SS">Sam Spade</name> was the only detective in town.
      Donec cursus diam sem, astor. </p>
    <bibl><name SameAs="detectives.xml#TH">Hardy, Tom</name>. Growing Up Is Hard to Do:
      The Story of a Boy Detective. Knopf Press. </bibl>
    <bibl><name SameAs="detectives.xml#SS">Spade, Sam</name>. My Friends' Business
      Ventures. Knopf Press. </bibl>
    <bibl><name SameAs="detectives.xml#DN">Drew, Nancy</name>. Blonde and Curious.
      Arsenal Press.</bibl>
  </event>
  <event xml:id="e4">
    <p> Duis pulvinar lectus sed quam vehicula tincidunt in vel nunc. <name
        SameAs="detectives.xml#ND">Nancy Drew</name> and <name
        SameAs="detectives.xml#DT">Dick Tracy</name> made 110% profit that year. Cras
      convallis elementum diam. Sed nec viverra magna. A consequat tortor molestie ut.
      Praesent lobortis ipsum sit amet bibendum consequat. </p>
    <bibl><name SameAs="detectives.xml#SS">Spade, Sam</name>. My Friends' Business
      Ventures. Knopf Press. </bibl>
    <bibl><name SameAs="detectives.xml#MH">Holmes, Mycroft</name>. Sons and Brothers.
      Knopf Press. </bibl>
  </event>
</listEvent>
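When only one known pair matters, no pairing logic is needed. An event qualifies if any name inside it (in p or bibl) points to the first person and any name points to the second. The sketch below is a minimal example under one assumption: the two people are passed in as the hypothetical parameters $a and $b, which default to Nancy Drew and Dick Tracy. The XPath it uses is also valid XSLT 1.0.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch: count the events in which two given people co-occur.
     $a and $b are hypothetical parameters holding @SameAs values. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="a" select="'detectives.xml#ND'"/>
  <xsl:param name="b" select="'detectives.xml#DT'"/>
  <xsl:template match="/">
    <!-- An event is counted once if some name in it refers to $a and some
         name refers to $b, however often each one is mentioned. -->
    <xsl:value-of select="count(/listEvent/event[.//name/@SameAs = $a]
                                                [.//name/@SameAs = $b])"/>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample above, this should output 3 (events e1, e2 and e4). Setting $a to detectives.xml#DT and $b to detectives.xml#SS should output 1 (event e4).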
... where the @weight value is the part I'm having trouble calculating.
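I've managed to assign each person a node @id, and those node @ids then make up the @source and @target values. The first edge is Sam Spade and Dick Tracy, the second is Sam Spade and Nancy Drew. @weight should be the number of times the two co-occur in the document. I simplified my example, which may be frustrating. In my actual source document every element carries a number of other attributes and values, including an @n for each person's name, so filling @id, @source and @target from select values is straightforward.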
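@Tim, no worries: @SameAs points to an authority list. However a person's name is spelled in the text, it resolves to one person. For example, Lucy, Miss Graham and Mrs. L. Foster might all name the same woman, as a girl and before and after her marriage. A name might also be inverted, as in the bibliographic entries.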
No worries, @SameAs points to an authority list
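OK, but XSLT can only work with what is in the XML source document, so the counting here has to work with the @SameAs values as they stand, before they are resolved against that list.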
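If the authority list is available to the stylesheet, the aliases can be resolved before counting. The sketch below is hypothetical in two respects. First, the structure of detectives.xml is not shown, so a small inline alias table stands in for it. Second, mapping detectives.xml#DN to detectives.xml#ND is only an assumption about the sample's "Drew, Nancy" entry. The function local:canonical() (a hypothetical name) returns the canonical id for any @SameAs value and leaves unknown ids unchanged.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs local">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>
  <!-- HYPOTHETICAL alias table standing in for the authority list;
       the real detectives.xml structure is not known here. -->
  <xsl:variable name="aliases" as="element(alias)*">
    <alias from="detectives.xml#DN" to="detectives.xml#ND"/>
  </xsl:variable>
  <!-- Map a @SameAs value to its canonical id; unknown ids map to themselves. -->
  <xsl:function name="local:canonical" as="xs:string">
    <xsl:param name="ref" as="xs:string"/>
    <xsl:sequence select="if ($aliases[@from = $ref])
                          then string($aliases[@from = $ref][1]/@to)
                          else $ref"/>
  </xsl:function>
  <xsl:template match="/">
    <nodes>
      <!-- Distinct people after alias resolution -->
      <xsl:for-each select="distinct-values(for $r in /listEvent/event//name/@SameAs
                                             return local:canonical($r))">
        <node id="{.}"/>
      </xsl:for-each>
    </nodes>
  </xsl:template>
</xsl:stylesheet>
```

On the sample, this should list six nodes instead of seven. The same function could replace the raw @SameAs values wherever the counting stylesheets compare them, for example in a key's use expression such as use="for $r in .//name/@SameAs return local:canonical($r)".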
In my actual source document, every element carries a number of other attributes and values, including an @n for each person's name
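OK, since the sample has no such @n, I'll use the @SameAs attribute as if it were a distinct id. The following is actually an XSLT 1.0 stylesheet, extended with the EXSLT set:distinct function. It's only a sketch, with some scaffolding left in so we can see whether it's heading in the right direction: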
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:set="http://exslt.org/sets"
extension-element-prefixes="set">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:key name="eventByID" match="event" use=".//name/@SameAs" />
<xsl:variable name="distinct_nodes" select="set:distinct(/listEvent/event//name/@SameAs)" />
<xsl:variable name="root" select="/" />
<xsl:template match="/">
<graph>
<nodes>
<xsl:for-each select="$distinct_nodes">
<node id="{.}"/>
</xsl:for-each>
</nodes>
<edges>
<xsl:for-each select="$distinct_nodes[not(position()=last())]">
<xsl:variable name="source" select="." />
<xsl:variable name="pos" select="position()" />
<xsl:for-each select="$distinct_nodes[position()>$pos]">
<xsl:variable name="target" select="." />
<xsl:variable name="common_events" select="key('eventByID', $source)[@xml:id=key('eventByID', $target)/@xml:id]" />
<xsl:if test="$common_events">
<edge source="{$source}" target="{$target}" weight="{count($common_events)}">
<!-- use this for test purposes -->
<!--
<xsl:for-each select="$common_events">
<event id="{@xml:id}"/>
</xsl:for-each>
-->
</edge>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
</edges>
</graph>
</xsl:template>
</xsl:stylesheet>
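With an XSLT 2.0 processor, such as the Saxon 9.5 mentioned in the comments, the same pairwise algorithm can be written without the EXSLT extension. The following is a sketch of such a port, not the answerer's own code:

- distinct-values() stands in for set:distinct().
- intersect replaces the @xml:id comparison.
- key() takes the document as its third argument, because the loops iterate over strings and integers, which leaves no context node for the two-argument form.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>
  <!-- Every event, indexed by each person it mentions (as in the 1.0 version) -->
  <xsl:key name="eventByID" match="event" use=".//name/@SameAs"/>
  <!-- distinct-values() replaces the EXSLT set:distinct() call -->
  <xsl:variable name="ids" as="xs:string*"
      select="distinct-values(/listEvent/event//name/@SameAs)"/>
  <!-- Needed as key()'s third argument inside loops over atomic values -->
  <xsl:variable name="root" select="/"/>
  <xsl:template match="/">
    <graph>
      <nodes>
        <xsl:for-each select="$ids">
          <node id="{.}"/>
        </xsl:for-each>
      </nodes>
      <edges>
        <!-- Visit each unordered pair (i, j) with i < j exactly once -->
        <xsl:for-each select="1 to count($ids) - 1">
          <xsl:variable name="i" select="."/>
          <xsl:for-each select="$i + 1 to count($ids)">
            <xsl:variable name="j" select="."/>
            <!-- Events that mention both people -->
            <xsl:variable name="common"
                select="key('eventByID', $ids[$i], $root) intersect key('eventByID', $ids[$j], $root)"/>
            <xsl:if test="$common">
              <edge source="{$ids[$i]}" target="{$ids[$j]}" weight="{count($common)}"/>
            </xsl:if>
          </xsl:for-each>
        </xsl:for-each>
      </edges>
    </graph>
  </xsl:template>
</xsl:stylesheet>
```

The specification leaves the order of distinct-values() implementation-dependent. The node and edge order may therefore differ from the result shown for the 1.0 stylesheet, but the weights should be the same.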
Applied to the sample XML, the result is:
<?xml version="1.0" encoding="utf-8"?>
<graph>
<nodes>
<node id="detectives.xml#ND"/>
<node id="detectives.xml#DT"/>
<node id="detectives.xml#SH"/>
<node id="detectives.xml#TH"/>
<node id="detectives.xml#SS"/>
<node id="detectives.xml#DN"/>
<node id="detectives.xml#MH"/>
</nodes>
<edges>
<edge source="detectives.xml#ND" target="detectives.xml#DT" weight="3"/>
<edge source="detectives.xml#ND" target="detectives.xml#SH" weight="2"/>
<edge source="detectives.xml#ND" target="detectives.xml#TH" weight="1"/>
<edge source="detectives.xml#ND" target="detectives.xml#SS" weight="1"/>
<edge source="detectives.xml#ND" target="detectives.xml#MH" weight="1"/>
<edge source="detectives.xml#DT" target="detectives.xml#SH" weight="2"/>
<edge source="detectives.xml#DT" target="detectives.xml#TH" weight="1"/>
<edge source="detectives.xml#DT" target="detectives.xml#SS" weight="1"/>
<edge source="detectives.xml#DT" target="detectives.xml#MH" weight="1"/>
<edge source="detectives.xml#SH" target="detectives.xml#TH" weight="1"/>
<edge source="detectives.xml#TH" target="detectives.xml#SS" weight="1"/>
<edge source="detectives.xml#TH" target="detectives.xml#DN" weight="1"/>
<edge source="detectives.xml#SS" target="detectives.xml#DN" weight="1"/>
<edge source="detectives.xml#SS" target="detectives.xml#MH" weight="1"/>
</edges>
</graph>
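Are you saying you want to check every possible combination of two names? Perhaps you should post an example of what the output should look like, code-wise. Your XML has both a "Nancy Drew" and a "Drew, Nancy". Do you expect these to be treated as different names? Their @SameAs attributes differ too.

@michael.hor257k I like your idea. The output should look like this: …/gefx>

I have never tried EXSLT before.

@michael.hor257k Your result is exactly what I'm looking for, but when I try the stylesheet I get a fatal error from Saxon EE 9.5.1.3: "XTDE1425: Cannot find a matching 1-argument function named {}distinct", even though the set namespace looks correct. I also tried Saxon 6.5.5 and Xalan. They complete the transformation but return empty nodes and graph elements. Is my diagnosis correct? Am I using the wrong processor?

@CLKC It works fine in both … and …. In XSLT 2.0, I think you would have to write your own: … I think something like that; with an XSLT 2.0 processor, some simplifications may be possible.

Thanks for the XSLT 2.0 version. I found my mistake, and thank you very much for the stylesheet. I'm too new to be allowed to upvote answers, but I'll find someone who can, because your answer is correct.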
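As the comments suggest, an XSLT 2.0 (or 3.0) processor also allows a different design. Instead of testing every pair of people against every event, you can generate the pairs event by event and let xsl:for-each-group count them. This is a sketch of that alternative technique, not the answer's approach, and the temporary pair element is a hypothetical helper. The ids within each event are sorted first, so that Nancy Drew/Dick Tracy and Dick Tracy/Nancy Drew produce the same group key.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>
  <xsl:template match="/">
    <!-- Temporary list: one hypothetical <pair> element per co-occurrence,
         i.e. every unordered pair of distinct people within one event -->
    <xsl:variable name="pairs" as="element(pair)*">
      <xsl:for-each select="/listEvent/event">
        <!-- Sorted distinct ids, so each pair always comes out in the same order -->
        <xsl:variable name="ids" as="xs:string*">
          <xsl:perform-sort select="distinct-values(.//name/@SameAs)">
            <xsl:sort select="."/>
          </xsl:perform-sort>
        </xsl:variable>
        <xsl:for-each select="1 to count($ids) - 1">
          <xsl:variable name="i" select="."/>
          <xsl:for-each select="$i + 1 to count($ids)">
            <xsl:variable name="j" select="."/>
            <pair source="{$ids[$i]}" target="{$ids[$j]}"/>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:variable>
    <graph>
      <nodes>
        <xsl:for-each select="distinct-values(/listEvent/event//name/@SameAs)">
          <node id="{.}"/>
        </xsl:for-each>
      </nodes>
      <edges>
        <!-- Identical source/target pairs form one group; the group size is the weight -->
        <xsl:for-each-group select="$pairs" group-by="concat(@source, ' ', @target)">
          <edge source="{@source}" target="{@target}" weight="{count(current-group())}"/>
        </xsl:for-each-group>
      </edges>
    </graph>
  </xsl:template>
</xsl:stylesheet>
```

Here the work per event grows with the square of the number of distinct people in that event, not with the square of all people in the document. Pairs that never co-occur are never generated. On the sample, this should give the same fourteen weights as the result above, with source and target in alphabetical order, for example source="detectives.xml#DT" target="detectives.xml#ND" weight="3".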
<?xml version="1.0" encoding="utf-8"?>
<graph>
<nodes>
<node id="detectives.xml#ND"/>
<node id="detectives.xml#DT"/>
<node id="detectives.xml#SH"/>
<node id="detectives.xml#TH"/>
<node id="detectives.xml#SS"/>
<node id="detectives.xml#DN"/>
<node id="detectives.xml#MH"/>
</nodes>
<edges>
<edge source="detectives.xml#ND" target="detectives.xml#DT" weight="3"/>
<edge source="detectives.xml#ND" target="detectives.xml#SH" weight="2"/>
<edge source="detectives.xml#ND" target="detectives.xml#TH" weight="1"/>
<edge source="detectives.xml#ND" target="detectives.xml#SS" weight="1"/>
<edge source="detectives.xml#ND" target="detectives.xml#MH" weight="1"/>
<edge source="detectives.xml#DT" target="detectives.xml#SH" weight="2"/>
<edge source="detectives.xml#DT" target="detectives.xml#TH" weight="1"/>
<edge source="detectives.xml#DT" target="detectives.xml#SS" weight="1"/>
<edge source="detectives.xml#DT" target="detectives.xml#MH" weight="1"/>
<edge source="detectives.xml#SH" target="detectives.xml#TH" weight="1"/>
<edge source="detectives.xml#TH" target="detectives.xml#SS" weight="1"/>
<edge source="detectives.xml#TH" target="detectives.xml#DN" weight="1"/>
<edge source="detectives.xml#SS" target="detectives.xml#DN" weight="1"/>
<edge source="detectives.xml#SS" target="detectives.xml#MH" weight="1"/>
</edges>
</graph>