PHP: stripping all elements inside a specific element when a descendant element has the same name as an ancestor

php xml xpath


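I am using PHP and I want to strip all tags inside a specific tag, keeping only the plain text. The problem I keep running into is that some child tags have the same name as the parent tag: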

<corpo>
    <num>1.</num>
    <mod id="mod167">
        String 1
        <commas id="mod167-vir1" type="word">String 2</commas>
        <com id="mod166-vir1-20090024-art13-com16.1"><num>&lt;&lt;16.</num></com>
        <rif xlink:href="urn" xlink:type="simple">String 3</rif><h:p>Something here</h:p>
        <corpo>String 4</corpo>
   </mod>
</corpo>

So far I have tried SimpleXML and PHP's strip_tags, passing all the tags I want to keep, but of course it did not give me the result I expected:

$result = strip_tags($xml, "<corpo></corpo>");

If you load the XML into a DOM, you can read the DOMNode::$textContent property:

$document = new DOMDocument();
$document->loadXml($xml);

var_dump($document->documentElement->textContent);

The output contains the text content, including all of the whitespace:

string(113) "
    1.

        String 1
        String 2
        <<16.
        String 3Something here
        String 4

"

With the whitespace normalized, the output is:

string(58) "1. String 1 String 2 <<16. String 3Something here String 4"

<?xml version="1.0"?>
<corpo xmlns:xlink="urn:xlink" xmlns:h="urn:h">1. String 1 String 2 &lt;&lt;16. String 3Something here String 4</corpo>
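
The code behind the output above is not shown on this page. What follows is only a minimal sketch of one way to get a result of this kind, assuming the namespace prefixes are declared (the URIs urn:xlink and urn:h are copied from the output above) and using the XPath function normalize-space() through DOMXPath::evaluate(), whose second argument is the context node:

```php
<?php
// Sample input: the XML from the question, with namespace declarations added (assumption).
$xml = <<<'XML'
<corpo xmlns:xlink="urn:xlink" xmlns:h="urn:h">
    <num>1.</num>
    <mod id="mod167">
        String 1
        <commas id="mod167-vir1" type="word">String 2</commas>
        <com id="mod166-vir1-20090024-art13-com16.1"><num>&lt;&lt;16.</num></com>
        <rif xlink:href="urn" xlink:type="simple">String 3</rif><h:p>Something here</h:p>
        <corpo>String 4</corpo>
    </mod>
</corpo>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// Select only the outermost corpo elements: those without a corpo ancestor.
foreach ($xpath->evaluate('//corpo[not(ancestor::corpo)]') as $corpo) {
    // normalize-space() trims the string value of the context node ($corpo)
    // and collapses every run of whitespace into a single space.
    $text = $xpath->evaluate('normalize-space(.)', $corpo);

    // Replace all child nodes with a single text node holding the plain text.
    while ($corpo->firstChild) {
        $corpo->removeChild($corpo->firstChild);
    }
    $corpo->appendChild($document->createTextNode($text));
}

echo $document->saveXML();
```

Because the expression is evaluated relative to a context node, the same loop should also work when corpo is not the document element.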



This is closely related to what @ThW wrote; it just focuses more on SimpleXML.

Given a string $buffer that holds the same document as in your question, or one with more ancestors, here is a SimpleXML example:

$xml = simplexml_load_string($buffer);

foreach ($xml->xpath('//corpo[not(ancestor::corpo)]') as $corpo) {
    $corpo[0] = dom_import_simplexml($corpo)->textContent;
}

$xml->asXML('php://output');

Its example output is:

<a xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:h="ns:h">
    <b>
        <corpo>
            1.

                String 1
                String 2

                    &lt;&lt;16.

                String 3
                Something here
                String 4

        </corpo>
    </b>
</a>

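The XPath expression selects every corpo element that has no corpo ancestor, that is, only the outermost ones. Since each result is a SimpleXMLElement and you need its text content, you can get at it through the DOMElement node associated with $corpo: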

dom_import_simplexml($corpo)->textContent;

The rest of the expression, $corpo[0] = ..., simply tells SimpleXML to update the content of that same SimpleXMLElement (a so-called self-reference).

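By the way, you could use strip_tags($corpo->asXML()) here instead of dom_import_simplexml($corpo)->textContent, but I would not recommend it, because I do not know how stable strip_tags really is. At the very least it does not conform to the XML standard.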
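
One observable difference can be shown with a small sketch (the $fragment string is made up for illustration): strip_tags only removes markup and leaves entity references such as &lt; untouched, whereas textContent returns the decoded text.

```php
<?php
// Hypothetical fragment, reduced from the XML in the question.
$fragment = '<corpo><num>&lt;&lt;16.</num><corpo>String 4</corpo></corpo>';
$corpo    = simplexml_load_string($fragment);

// strip_tags works on the serialized markup and does not decode entities.
$stripped = strip_tags($corpo->asXML());

// textContent is produced by the XML parser, so entities are already decoded.
$text = dom_import_simplexml($corpo)->textContent;

var_dump($stripped); // still contains the escaped form &lt;&lt;16.
var_dump($text);     // contains the literal characters <<16.
```

If the stripped string were written back into the document, the serializer would escape the ampersands a second time, which is one more reason to prefer textContent.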

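You will probably also want to apply some whitespace normalization. preg_replace with the UTF-8 flag (u) is very handy for that, because UTF-8 is the string encoding used by both SimpleXMLElement and DOMElement: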

foreach ($xml->xpath('//corpo[not(ancestor::corpo)]') as $corpo) {
    $text     = dom_import_simplexml($corpo)->textContent;
    $corpo[0] = preg_replace('~\s+~u', ' ', $text);
}

This variant gives you:

<?xml version="1.0"?>
<a xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:h="ns:h">
    <b>
        <corpo> 1. String 1 String 2 &lt;&lt;16. String 3 Something here String 4 </corpo>
    </b>
</a>


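If that is the XML, it is invalid: the namespace definitions (xmlns:*) have been removed. Do you only want to read the string, or do you really need to create new XML?

The XML I posted is an excerpt from a complex XML document exported from a database. I did not copy the exact XML because I thought the problem could be solved with a regular expression, since I only need to read all the strings (plain text) inside the first corpo tag.

With XML, the answer is almost never a regular expression. If you remove the namespace definitions, you change and break the XML, and answers may not work because information is missing.

Thanks. I still need some help, but you pointed me in the right direction. Your second approach works well. It assumes that corpo is the root node (as I wrote in my post), but it actually is not. I reach it with: $xml->xpath('//a:LeggeRegionale[@id="urn:nir:legge:2011-08-11;11"]/a:articolato/a:articolo[@id="art10"]') How do I change the code accordingly?

That depends on the actual structure and the namespace definitions. You need to adjust the XPath expression. There is SimpleXMLElement::xpath(), but it is limited: it can only return an array of SimpleXMLElement instances, so the example only works with DOM. The second argument of DOMXpath::evaluate() is the context node for the expression.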
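
As a rough sketch of how the SimpleXML loop might be adapted to a namespaced document: the real namespace URI is not given in this thread, so urn:example:nir below is a placeholder, the sample document is invented, and corpo is assumed to live in the same default namespace as its ancestors. The prefix a is registered with SimpleXMLElement::registerXPathNamespace() before the query runs.

```php
<?php
// Invented sample document; the namespace URI is a placeholder (assumption).
$buffer = <<<'XML'
<LeggeRegionale xmlns="urn:example:nir" id="urn:nir:legge:2011-08-11;11">
    <articolato>
        <articolo id="art10">
            <corpo><num>1.</num> <mod>String 1 <corpo>String 4</corpo></mod></corpo>
        </articolo>
    </articolato>
</LeggeRegionale>
XML;

$xml = simplexml_load_string($buffer);

// Bind the prefix "a" used in the XPath expression to the document's namespace.
$xml->registerXPathNamespace('a', 'urn:example:nir');

// Path from the comment above, extended to the outermost corpo elements below it.
$expression = '//a:LeggeRegionale[@id="urn:nir:legge:2011-08-11;11"]'
    . '/a:articolato/a:articolo[@id="art10"]'
    . '//a:corpo[not(ancestor::a:corpo)]';

foreach ($xml->xpath($expression) as $corpo) {
    // Collapse whitespace; trim() additionally drops the leading and trailing space.
    $text     = trim(preg_replace('~\s+~u', ' ', dom_import_simplexml($corpo)->textContent));
    $corpo[0] = $text;
}

echo $xml->asXML();
```

Every element step in the expression needs the prefix, including the one inside the ancestor:: test, because an unprefixed name in XPath 1.0 only matches elements in no namespace.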
The complete SimpleXML example, for reference:

<?php

$buffer = <<<XML
<a xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:h="ns:h">
    <b>
        <corpo>
            <num>1.</num>
            <mod id="mod167">
                String 1
                <commas id="mod167-vir1" type="word">String 2</commas>
                <com id="mod166-vir1-20090024-art13-com16.1">
                    <num>&lt;&lt;16.</num>
                </com>
                <rif xlink:href="urn" xlink:type="simple">String 3</rif>
                <h:p>Something here</h:p>
                <corpo>String 4</corpo>
            </mod>
        </corpo>
    </b>
</a>
XML;


$xml = simplexml_load_string($buffer);

foreach ($xml->xpath('//corpo[not(ancestor::corpo)]') as $corpo) {
    $text     = dom_import_simplexml($corpo)->textContent;
    $corpo[0] = preg_replace('~\s+~u', ' ', $text);
}

$xml->asXML('php://output');