PHP: get the tags inside the body and remove the text content inside each tag

php regex

I want to grab everything inside the body:

<html>
<head><title>Test</title>
</head>

<body>
<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>
</body>
</html>

The final result I want is:

<div id="dummy"></div>
<p class="p"></p>
<div id="example"></div>

Not this:

<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>

Although this would work:

if (preg_match('%<(body)[^>]*>(.*)<\s*/\1\s*>%s', $subject, $regs)) {
    $result = $regs[2];
}

I don't recommend it, though. PHP gives you better tools for this job, for example an HTML parser.
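As an illustration of the parser route, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The function name bodyTagsWithoutText() is hypothetical, and the sketch assumes the goal is to keep every element inside the body while dropping all text nodes.

```php
<?php
// Hypothetical helper: return the elements inside <body> with all text removed.
function bodyTagsWithoutText(string $html): string
{
    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Copy the matching text nodes into an array, then remove each one.
    $xpath = new DOMXPath($doc);
    $textNodes = iterator_to_array($xpath->query('.//text()', $body));
    foreach ($textNodes as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize each remaining child of <body>; trim() drops any newline
    // the serializer may append after block-level elements.
    $out = [];
    foreach ($body->childNodes as $child) {
        $out[] = trim($doc->saveHTML($child));
    }
    return implode("\n", $out);
}

$html = '<html><head><title>Test</title></head><body>'
      . '<div id="dummy">Your contents</div>'
      . '<p class="p">Paragraph</p>'
      . '<div id="example">My Content</div>'
      . '</body></html>';

echo bodyTagsWithoutText($html), "\n";
```

Unlike the regex, this approach also copes with nested elements and with attribute values that contain a > character, because the parser builds the tree before anything is removed.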

Edit:

Since you insist:

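// Lazy .*? stops at the first closing tag; a greedy .* would swallow several divs that share one line.
$result = preg_replace('%(<(div)[^>]*>).*?<\s*/\2\s*>%', '\1</\2>', $subject);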

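This removes the contents of the div tags. You can also swap div for other tags. I don't really know what you are dealing with, but I still don't recommend it.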
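One way to cover several tag names at once is an alternation in the tag-name group. The following is only a sketch: the helper name emptyTags() and its parameters are hypothetical, and it assumes well-formed markup in which the listed tags are not nested inside each other.

```php
<?php
// Hypothetical helper: empty the content of every listed tag with one regex.
// Nested tags of the same name are NOT handled correctly by this approach.
function emptyTags(string $html, array $tagNames): string
{
    // Escape each name and join them into an alternation such as "div|p".
    $names = implode('|', array_map(fn ($t) => preg_quote($t, '%'), $tagNames));
    // Group 1: the whole opening tag. Group 2: the tag name.
    // \b keeps "p" from matching "<pre>"; .*? is lazy; s lets the dot span
    // line breaks; i ignores case.
    $pattern = '%(<(' . $names . ')\b[^>]*>).*?<\s*/\2\s*>%is';
    // preg_replace() returns null on failure; fall back to the input then.
    return preg_replace($pattern, '$1</$2>', $html) ?? $html;
}

$subject = '<div id="dummy">Your contents</div>'
         . '<p class="p">Paragraph</p>'
         . '<div id="example">My Content</div>';

echo emptyTags($subject, ['div', 'p']), "\n";
```

The same caveat applies as for the single-tag pattern: this works for simple, flat markup like the example and becomes unreliable as soon as the HTML is nested or malformed.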

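Why does it have to be a regular expression? You should consider dropping the regex and going the parser route, which is the route you should take. For example, PHP's DOMDocument.

Possible duplicate.

@oknoorap You should have mentioned explicitly that you want to remove the text content inside each tag. That changes the question completely. Every answer to your question is now incorrect and will be downvoted...

The problem is that three of us posted a solution before your edit, assuming the body content should be extracted unchanged, and those answers will now be downvoted. Please be clearer when asking in the future to avoid misunderstandings.

Thanks, but I don't want to use DOM. I want to use preg_match or preg_replace.

This assumes the HTML has correct syntax. The sad truth is that you simply cannot guarantee that for arbitrary web pages, even if they render correctly (that is, as the page author intended) in most browsers.

No, I mean my question is just an example; it is not as complex as my actual work.

@oknoorap I did provide two solutions.

@FailedDev Yes, but the final result I want is the tags with their content cleaned out.

htmlspecialchars is only there to display the result.

I wonder why I was downvoted. This may not be the best approach, but it is clearly what oknoorap asked for...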

$content = '<html>
<head><title>Test</title>
</head>

<body>
<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>
</body>
</html>';

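// Capture everything between <body ...> and </body>
// (i: case-insensitive, s: dot matches newlines, U: quantifiers are ungreedy).
preg_match('/(?:<body[^>]*>)(.*)<\/body>/isU', $content, $matches);
$bodycontent = $matches[1];
echo htmlspecialchars($bodycontent); // htmlspecialchars() only makes the markup visible in a browser
// Collect every tag found in the body and drop the text between them.
preg_match_all('/<[^>]*>/isU', $bodycontent, $matches2);
$tags = implode("", $matches2[0]);
echo htmlspecialchars($tags);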
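The two steps above can be wrapped in a function that also guards against input without a body element. This is a sketch under the same assumptions as the code above; the name bodyTagsOnly() is hypothetical.

```php
<?php
// Hypothetical wrapper around the two regex steps.
function bodyTagsOnly(string $html): string
{
    // Step 1: capture everything between <body ...> and </body>.
    if (!preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $m)) {
        return ''; // no body element found
    }
    // Step 2: keep only the tags, dropping all text between them.
    preg_match_all('/<[^>]*>/', $m[1], $tags);
    return implode('', $tags[0]);
}

$content = '<html><head><title>Test</title></head><body>'
         . '<div id="dummy">Your contents</div>'
         . '<p class="p">Paragraph</p>'
         . '<div id="example">My Content</div>'
         . '</body></html>';

echo bodyTagsOnly($content), "\n";
```

Because the tags are joined with an empty string, the result comes back on a single line rather than one element per line. An attribute value containing a > character would also split a tag in the wrong place, which is one more reason the comments recommend a parser for real pages.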
