XSLT 2.0: creating a regular expression to enumerate chapter numbers and descriptions from continuous text nodes

Tags: regex, xml, xslt-2.0

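I would like to extract chapter numbers, titles, and descriptions from an XML file into an XML element/attribute hierarchy. They are spread across continuous text in different elements. The XML looks like this: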

<?xml version="1.0" encoding="utf-8"?>
<root>
  <cell>3.1.1.17 First Section The “First appropriate” section lists things that can occur when an event happens. All of these event conditions result in an error.
  </cell>
  <cell>3.1.1.18 Second Section This section lists things that occur under certain conditions. 3.1.1.19 Third Section This section lists events that occur within a specific space. 3.2 SPACE chapter provides descriptions other stuff. See also: Chapter 4, “Other Stuff Reference” in the Manual.
  </cell>
</root>

The desired output should look like this:

<?xml version="1.0" encoding="utf-8"?>
<Root>
   <Desc chapter="3.1.1.17" title="First Section">The “First appropriate” section lists things that can occur when an event happens. All of these event conditions result in an error.</Desc>
   <Desc chapter="3.1.1.18" title="Second Section">This section lists things that occur under certain conditions.</Desc>
   <Desc chapter="3.1.1.19" title="Third Section">This section lists events that occur within a specific space. 3.2 SPACE chapter provides descriptions other stuff. See also: Chapter 4, “Other Stuff Reference” in the Manual.</Desc>
</Root>

My XSLT so far is:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="xml" encoding="utf-8" />

  <xsl:template match="text()" />

  <xsl:template match="/root">
    <Root>
      <xsl:apply-templates select="cell" />
    </Root>
  </xsl:template>

  <xsl:template match="cell">
    <xsl:variable name="sections" as="element(Desc)*">
      <xsl:analyze-string regex="(\d+\.\d+\.\d+\.\d+)\s(.*?Section)(.*?)" select="text()">
        <xsl:matching-substring>
          <Desc chapter="{regex-group(1)}" title="{regex-group(2)}">
            <xsl:value-of select="regex-group(3)" />
          </Desc>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:for-each select="$sections">
      <xsl:copy-of select="." />
    </xsl:for-each>
  </xsl:template>  
</xsl:stylesheet>

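The problem is the last part of the regular expression: (.*?), a non-greedy consuming expression. Unfortunately, I cannot get it to stop at the right position. I tried using ?: and (?=…) to make it stop, without consuming, before the next \d+\.\d+\.\d+\.\d+\. but the XSLT 2.0 regular expression syntax seems to differ from other dialects.

How can I extract the relevant parts so that I can handle each of them conveniently as regex-group(1..3)?

I am also interested in a reasonably complete XSLT 2.0 reference for all regular expression tokens.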

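It looks like the following captures both the data you want to select and the trailing text:

<xsl:template match="cell">
    <!-- Temporary tree: a Desc for each heading match, a Value for each stretch of text in between -->
    <xsl:variable name="sections">
        <xsl:analyze-string regex="(\d+\.\d+\.\d+\.\d+)\s(.*?Section)" select=".">
            <xsl:matching-substring>
                <!-- The regex has only two capture groups: chapter number and title -->
                <Desc chapter="{regex-group(1)}" title="{regex-group(2)}"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <Value>
                    <xsl:value-of select="."/>
                </Value>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:variable>
    <xsl:for-each select="$sections/Desc">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <!-- The description is the text that follows this heading; trim surrounding whitespace -->
            <xsl:value-of select="normalize-space(following-sibling::Value[1])"/>
        </xsl:copy>
    </xsl:for-each>
</xsl:template>

Each heading match becomes a Desc element and each stretch of non-matching text becomes a Value element, so no lookahead is needed. The final loop pairs every Desc with the Value that follows it.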
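
For readers who want a complete stylesheet to experiment with, here is a minimal sketch that combines the root template from the question with the matching/non-matching idea described above. It is not the answerer's exact code: pairing headings with their text through xsl:for-each-group is a swapped-in variant, and the intermediate element names Head and Text are hypothetical. It also assumes, as in the sample input, that every title ends in the word "Section".

```xslt
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="xml" encoding="utf-8"/>

  <xsl:template match="/root">
    <Root>
      <xsl:apply-templates select="cell"/>
    </Root>
  </xsl:template>

  <xsl:template match="cell">
    <!-- Split the cell text into heading markers (Head) and the text between them (Text).
         Head and Text are hypothetical names used only inside this temporary tree. -->
    <xsl:variable name="parts">
      <xsl:analyze-string select="." regex="(\d+\.\d+\.\d+\.\d+)\s+(.*?Section)">
        <xsl:matching-substring>
          <Head chapter="{regex-group(1)}" title="{regex-group(2)}"/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <Text><xsl:value-of select="."/></Text>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:variable>

    <!-- Start a new group at every Head; each group holds the Text that belongs to it -->
    <xsl:for-each-group select="$parts/*" group-starting-with="Head">
      <!-- Skip a possible leading group of text that precedes the first heading -->
      <xsl:if test="self::Head">
        <Desc chapter="{@chapter}" title="{@title}">
          <xsl:value-of select="normalize-space(string-join(current-group()[self::Text], ''))"/>
        </Desc>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Grouping with group-starting-with keeps each description attached to its own heading even when two headings are directly adjacent, in which case the description is simply empty. Run against the sample input with an XSLT 2.0 processor, this is intended to yield the three Desc elements from the desired output.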

Sorry, I have to reply in JS, but I am sure you can easily see what is going on. Your regex-and-replace solution should look like this:

var xmlData = '<?xml version="1.0" encoding="utf-8"?>\n<root>\n  <cell>3.1.1.17 First Section The “First appropriate” section lists things that can occur when an event happens. All of these event conditions result in an error.\n  </cell>\n  <cell>3.1.1.18 Second Section This section lists things that occur under certain conditions. 3.1.1.19 Third Section This section lists events that occur within a specific space. 3.2 SPACE chapter provides descriptions other stuff. See also: Chapter 4, “Other Stuff Reference” in the Manual.\n  </cell>\n</root>',
        // Dots are escaped so that they match literal periods only
        rex = /<cell>(?:\s*(\d+\.\d+\.\d+\.\d+)\s+(\w+)\s+Section)(.+)\n*\s*<\/cell>/gm,
        // The closing tag must be </Desc> to match the opening <Desc>
        xml = xmlData.replace(rex,'<Desc chapter="$1" title="$2 Section">$3</Desc>');
console.log(xml); // log the transformed string, not the input
<?xml version="1.0" encoding="utf-8"?>
<root>
  <Desc chapter="3.1.1.17" title="First Section"> The “First appropriate” section lists things that can occur when an event happens. All of these event conditions result in an error.</Desc>
  <Desc chapter="3.1.1.18" title="Second Section"> This section lists things that occur under certain conditions. 3.1.1.19 Third Section This section lists events that occur within a specific space. 3.2 SPACE chapter provides descriptions other stuff. See also: Chapter 4, “Other Stuff Reference” in the Manual.</Desc>
</root>

Thank you very much. Using xsl:non-matching-substring is a good idea.

"Sorry, I have to reply in JS": no, you really don't have to reply in JS. If you were truly sorry, you would not have answered in the first place (or you would delete your answer now). Parsing XML with regular expressions is very hard. Answering an XSLT question by posting JS is unhelpful and bad form. Future readers: don't do this.