PHP & regex: get the raw data between XML tags, even if the whole XML appears to be invalid


javascript php xml regex


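This question seems to have been asked many times before, but I found no solution that works for my data, which is very long and contains special characters such as …

Are you also generating the XML? If so, you should wrap the text data in a CDATA section, then load the XML with SimpleXML or a parser of your choice and read the content of the text tag. Make sure the data contains no characters that are not valid UTF-8 and no characters that XML does not allow at all.

Otherwise, you can do this: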

preg_match('#<text>(.+?)</text>#is', $xml, $matches);
echo $matches[1]; // your data between <text> and </text>
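
The CDATA suggestion above can be sketched as follows. This is only a minimal illustration under assumptions: the helper name wrap_cdata and the sample data are made up, and it presumes you control the code that generates the XML. The helper splits any literal "]]>" in the data, since that sequence would otherwise end the CDATA section early.

```php
<?php
// Hypothetical helper (not from the thread): wrap text in a CDATA section.
// A literal "]]>" would end the section early, so it is split across two sections.
function wrap_cdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

// Made-up sample data; "<" and "&" are not allowed in plain XML text.
$raw = "Here is a long text with\nline breaks and unusual characters, e.g. < % & } ]]>";

// Generating side: the text goes into the document unescaped, inside CDATA.
$xml = '<root><id>1</id><text>' . wrap_cdata($raw) . '</text></root>';

// Consuming side: a real parser now accepts the document.
$doc = simplexml_load_string($xml);
if ($doc === false) {
    throw new RuntimeException('XML could not be parsed');
}

// Casting the element to string returns its text content, CDATA included.
$text = (string) $doc->text;
var_dump($text === $raw);
```

With the data wrapped this way, no regex is needed on the consuming side; the regex below remains the fallback for XML you cannot regenerate.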

First, your original regex pattern is fine and should work:

#<".$item_name."[^>]*>([\s\S]*?)</".$item_name.">#
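
A minimal way to check the original pattern yourself, assuming made-up sample data and $item_name set to 'text'. The second call is a variant that uses the s modifier instead of [\s\S]; it is included only for comparison.

```php
<?php
// Assumed sample values; $item_name is the variable used in the question's pattern.
$item_name = 'text';
$xml = "<root><id>1</id><text id=\"7\">line one\nline two < % & }</text></root>";

// The original pattern: [^>]* skips attributes in the opening tag,
// and [\s\S]*? lazily matches any characters, newlines included.
preg_match('#<' . $item_name . '[^>]*>([\s\S]*?)</' . $item_name . '>#', $xml, $original);

// Same idea with the s (dot-all) modifier, so that . also matches newlines.
preg_match('#<' . $item_name . '[^>]*>(.*?)</' . $item_name . '>#s', $xml, $dotall);

var_dump($original[1] === $dotall[1]);
echo $original[1], "\n";
```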

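Your original regex implies that you expect additional characters (such as attributes) inside the opening text tag. The .*? in the opening tag allows for this, and the ? makes it stop at the first > it finds:

#<text.*?>(.*)</text>#is

A capture group and a back-reference let you write the tag name only once:

#<(text).*?>(.*)</\1>#is

To make it more dynamic, replace the word text with a variable (as in your original). Combined with the capture group and back-reference from the previous pattern, you only have to insert the variable once, which gives cleaner, more readable code:

#<('.$item_name.').*?>(.*)</\1>#is

Compared with the original:

#<('.$item_name.').*?>(.*)</\1>#is
#<".$item_name."[^>]*>([\s\S]*?)</".$item_name.">#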
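
One practical difference between the two patterns in the comparison is greedy versus lazy matching of the content. The sketch below uses made-up input with two text elements to show it. The preg_quote() call is an addition of mine, not part of the answer; it only guards against tag names that contain regex characters.

```php
<?php
// Made-up input with two <text> elements; the first one carries an attribute.
$xml = '<root><text lang="en">first</text><text>second</text></root>';
$item_name = 'text';
$tag = preg_quote($item_name, '#'); // escape the name in case it contains regex characters

// Pattern from the answer: the greedy (.*) runs up to the LAST closing tag.
preg_match('#<(' . $tag . ').*?>(.*)</\1>#is', $xml, $greedy);

// Variant with a lazy (.*?): stops at the FIRST closing tag.
preg_match('#<(' . $tag . ').*?>(.*?)</\1>#is', $xml, $lazy);

echo $greedy[2], "\n"; // first</text><text>second
echo $lazy[2], "\n";   // first
```

If the document can contain more than one element with the same name, the lazy form (as in the original [\s\S]*?) is probably what you want.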

Working example
Using the pattern with the $item_name variable:
$string = "
<root><id>1</id><text>Here is a very long text with

line breaks, white-spaces and many very unsual charchaters, e.g. < % & }

the text can be more then 5000 characters long 

</text></root>";

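$item_name = 'text'; // tag to extract; the variable must be defined before it is used in the pattern
preg_match('#<('.$item_name.').*?>(.*)</\1>#is', $string, $matches);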
var_dump($matches);

/**
Output:

array(3) {
  [0]=>
  string(167) "<text>Here is a very long text with

line breaks, white-spaces and many very unsual charchaters, e.g. < % & }

the text can be more then 5000 characters long 

</text>"
  [1]=>
  string(4) "text"
  [2]=>
  string(154) "Here is a very long text with

line breaks, white-spaces and many very unsual charchaters, e.g. < % & }

the text can be more then 5000 characters long 

"
}

*/

Note: if you cannot get the working example above to work, could you provide (by editing the question or via a link) a sample case that fails?
I tried your first attempt and it works for me.

You could use strpos() instead of a regexp to find the opening and closing tags, then substr() to get the text between them.

I tried it with the sample data you posted at the top of your question. How large does the data have to be for it to fail? By the way, instead of [\s\S], use . and add the s modifier to the regexp so that . matches newlines.

Like this?: #]*>.#/s

The sample data above is only meant to illustrate how the XML is generated.
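
The strpos()/substr() suggestion from the comments could look roughly like this. It is a sketch under assumptions: the function name text_between and the sample data are hypothetical, and only the first matching element is returned.

```php
<?php
// Hypothetical helper (not from the thread): extract the raw text of the first
// <$tag ...> element using plain string functions instead of a regex.
function text_between(string $xml, string $tag): ?string
{
    $open = strpos($xml, '<' . $tag);          // start of the opening tag
    if ($open === false) {
        return null;
    }
    $start = strpos($xml, '>', $open);         // end of the opening tag
    if ($start === false) {
        return null;
    }
    $start++;                                  // first character of the content
    $end = strpos($xml, '</' . $tag . '>', $start);
    if ($end === false) {
        return null;
    }
    return substr($xml, $start, $end - $start);
}

// Made-up sample data with characters that would break an XML parser.
$xml = "<root><id>1</id><text>line one\nline two < % & }</text></root>";
var_dump(text_between($xml, 'text'));
```

Note the limits of this sketch: it also matches tags whose name merely starts with $tag (for example textarea), and it assumes no > appears inside an attribute value of the opening tag.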
For reference, the patterns from the answer above:

#<text.*?>(.*)</text>#is

#<(text).*?>(.*)</\1>#is

#<('.$item_name.').*?>(.*)</\1>#is

Compared with the original:

#<('.$item_name.').*?>(.*)</\1>#is
#<".$item_name."[^>]*>([\s\S]*?)</".$item_name.">#
