Replace substrings with XSLT using a lookup table
I have several strings that contain hex codes in various forms (the source is FrameMaker, if you want to know). So a string might look like

this is some sentence with some hex code\x27 s , and we need that fixed

and needs to be changed into

this is some sentence with some hex code' s , and we need that fixed

In fact, there may be several of these codes in one string, so I am looking for the best way to walk through the text, capture all hex codes (which look like \x###) and replace every one of them with the right character. I made an XML list/lookup table with all the characters, as follows:
<xsl:param name="reflist">
  <Code Value="\x27">'</Code>
  <Code Value="\x28">(</Code>
  <Code Value="\x29">)</Code>
  <Code Value="\x2a">*</Code>
  <Code Value="\x2b">+</Code>
  <!-- much more like these... -->
</xsl:param>
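Right now I use a simple replace, but there are far too many characters for that to be workable.
What is the best way to do this?
Use xsl:analyze-string: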
<xsl:template match="text()">
  <xsl:analyze-string select="." regex="\\x[0-9a-f]{{2}}" flags="i">
    <xsl:matching-substring>
      <!-- current() is the matched substring; a bare "." inside the
           predicate would refer to the Code element instead -->
      <xsl:value-of select="$reflist/Code[@Value = current()]"/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
I would also suggest using a key, e.g.
<xsl:param name="reflist" as="document-node()">
  <xsl:document>
    <Root>
      <Code Value="\x27">'</Code>
      <Code Value="\x28">(</Code>
      <Code Value="\x29">)</Code>
      <Code Value="\x2a">*</Code>
      <Code Value="\x2b">+</Code>
      <!-- much more like these... -->
    </Root>
  </xsl:document>
</xsl:param>
<xsl:key name="code-by-value" match="Code" use="@Value"/>
Then the lookup can be improved to
<xsl:template match="text/text()">
  <xsl:analyze-string select="." regex="\\x[0-9a-f]{{2}}" flags="i">
    <xsl:matching-substring>
      <xsl:value-of select="key('code-by-value', ., $reflist)"/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
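One detail worth watching with the key-based lookup: the regex is applied with flags="i", so a code such as \x2A is matched, but the table only stores \x2a and the key lookup then returns nothing. The following is a minimal sketch of one way to handle that, assuming all table keys are kept in lower case. Passing unknown codes through unchanged is my own assumption, not something the answer above specifies, and the explicit priority is only there to avoid a conflict with the identity template.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Same shape of lookup table as above; keys are stored in lower case -->
  <xsl:param name="reflist" as="document-node()">
    <xsl:document>
      <Root>
        <Code Value="\x2a">*</Code>
        <Code Value="\x2b">+</Code>
      </Root>
    </xsl:document>
  </xsl:param>

  <xsl:key name="code-by-value" match="Code" use="@Value"/>

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="\\x[0-9a-f]{{2}}" flags="i">
      <xsl:matching-substring>
        <!-- Normalise the match so that \x2A and \X2a also find \x2a -->
        <xsl:variable name="hit"
            select="key('code-by-value', lower-case(.), $reflist)"/>
        <!-- Assumption: codes missing from the table are left as they are -->
        <xsl:value-of select="if ($hit) then $hit else ."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Leaving unknown codes in place, rather than silently dropping them, makes gaps in the table easy to spot in the output.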
I have taken some time to turn the suggestion into working code. The input is
<root>
<text>this is some sentence with some hex code\x27 s , and we need that \x28and this\x29 fixed.</text>
</root>
and the complete stylesheet is
<xsl:stylesheet
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Lookup table, wrapped in a document node so that key() can search it -->
  <xsl:param name="reflist" as="document-node()">
    <xsl:document>
      <Root>
        <Code Value="\x27">'</Code>
        <Code Value="\x28">(</Code>
        <Code Value="\x29">)</Code>
        <Code Value="\x2a">*</Code>
        <Code Value="\x2b">+</Code>
        <!-- much more like these... -->
      </Root>
    </xsl:document>
  </xsl:param>

  <xsl:key name="code-by-value" match="Code" use="@Value"/>

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* , node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace each \xNN code in the text by its table entry -->
  <xsl:template match="text/text()">
    <xsl:analyze-string select="." regex="\\x[0-9a-f]{{2}}" flags="i">
      <xsl:matching-substring>
        <xsl:value-of select="key('code-by-value', ., $reflist)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
Saxon 9.4 transforms the input as follows:
<root>
<text>this is some sentence with some hex code' s , and we need that (and this) fixed.</text>
</root>
One can avoid using any "reference table" at all, like this:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="my:my" exclude-result-prefixes="my xs">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes that contain at least one \x... code -->
  <xsl:template match="text()[matches(., '\\x(\d|[a-f])+')]">
    <xsl:analyze-string select="." regex="\\x(\d|[a-f])+" >
      <xsl:matching-substring>
        <!-- Drop the leading "\x" and convert the hex digits to a character -->
        <xsl:value-of select=
          "codepoints-to-string(my:hex2dec(substring(.,3), 0))"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Recursive hex-to-decimal conversion; pAccum carries the value so far -->
  <xsl:function name="my:hex2dec" as="xs:integer">
    <xsl:param name="pStr" as="xs:string"/>
    <xsl:param name="pAccum" as="xs:integer"/>
    <xsl:sequence select=
      "if(not($pStr))
         then $pAccum
         else
           for $char in substring($pStr, 1, 1),
               $code in
                 if($char ge '0' and $char le '9')
                   then xs:integer($char)
                   else
                     string-to-codepoints($char) - string-to-codepoints('a') +10
             return
               my:hex2dec(substring($pStr,2), 16*$pAccum + $code)
      "/>
  </xsl:function>
</xsl:stylesheet>
When this transformation is applied to the following XML document:
<t>
<p>this is some sentence with some hex code\x27 s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code\x28 s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code\x29 s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code\x2a s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code\x2b s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code\x2c s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code\x2d s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code\x2e s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code\x2f s ,
and we need that fixed.</p>
</t>
the wanted, correct result is produced:
<t>
<p>this is some sentence with some hex code' s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code( s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code) s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code* s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code+ s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code, s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code- s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code. s ,
and we need that fixed.</p>
<p>this is some sentence with some hex code/ s ,
and we need that fixed.</p>
</t>
Note:
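This transformation is generic and correctly processes any hexadecimal Unicode code, provided the code is written with lowercase hex digits and is not immediately followed by another digit or a letter from a to f.
For example, if the same transformation is applied to this XML document: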
<t>
<p>this is some sentence with some hex code\x0428\x0438\x0448 s ,
and we need that fixed.</p>
</t>
the correct result is produced (containing the Bulgarian word for "grill" in Cyrillic):
<t>
<p>this is some sentence with some hex codeШиш s ,
and we need that fixed.</p>
</t>
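Wokoman, you may be interested in a more generic solution that does not use any "reference table" at all.
Thanks Dimitri. Although this is a nice solution, I cannot use it as it is, because Adobe "abuses" the hex code values in FrameMaker. Adobe uses 3 default fonts, each with different values, so I had to use a lookup table. But thanks again for your effort; I may use this way of thinking at a later stage.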
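For the per-font situation described in the last comment, the key-based lookup could in principle be extended with a composite key. This is only a sketch under several assumptions of mine: the Font attribute on Code, the font attribute on the text element, the font names FontA and FontB, and the replacement characters are all hypothetical placeholders, not real FrameMaker values.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical table: font names and characters are placeholders -->
  <xsl:param name="reflist" as="document-node()">
    <xsl:document>
      <Root>
        <Code Font="FontA" Value="\x27">'</Code>
        <Code Font="FontB" Value="\x27">’</Code>
        <!-- one entry per font and code... -->
      </Root>
    </xsl:document>
  </xsl:param>

  <!-- Composite key: font name and hex code joined by a separator -->
  <xsl:key name="code-by-font-and-value" match="Code"
           use="concat(@Font, '|', @Value)"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text/text()">
    <!-- Capture the font first: inside xsl:matching-substring the
         context item is a string, so ../@font is out of reach there -->
    <xsl:variable name="font" select="string(../@font)"/>
    <xsl:analyze-string select="." regex="\\x[0-9a-f]{{2}}" flags="i">
      <xsl:matching-substring>
        <xsl:value-of select="key('code-by-font-and-value',
                                  concat($font, '|', lower-case(.)),
                                  $reflist)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With an input such as `<text font="FontB">code\x27 s</text>`, the same \x27 resolves to the FontB entry rather than the FontA one. How the font is actually recorded in the FrameMaker export is not stated in this thread, so that part has to be adapted to the real data.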