PHP: Trying to remove HTML tags (+ content) from a string

php regex

OK, I'm basically banging my head against the wall with this one.

Here's the code:

<?php

$s = "385,178<ref name=\"land area\">Data is accessible by following \"Create tables and diagrams\" link on the following site, and then using table 09280 \"Area of land and fresh water (km²) (M)\" for \"The whole country\" in year 2013 and summing up entries \"Land area\" and \"Freshwater\": {{cite web |url=http://www.ssb.no/en/natur-og-miljo/statistikker/arealdekke |title=Area of land and fresh water, 1 January 2013 |publisher=[[Statistics Norway]] |date=28 May 2013 |accessdate=23 November 2013}}</ref>";

function removeHTMLTags($str) { 
    $r = '/(\\<br\\>)|(\\<br\/\\>)|(\\<(.+?)(\\s*[^\\<]+)?\\>(.+)?\\<\\\\\/\\1\\>)|(\\<ref\\sname=([^\\<]+?)\/\\>)/';

    echo "Preg_matching : $str\n\n";
    echo "Regex : $r\n\n";

    return preg_replace($r,'',$str); 
}

echo removeHTMLTags($s);

?>

What I'm trying to do is basically get rid of the tags, together with their content.

You really should use the DOM for this kind of thing, because other solutions tend to break easily:
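$dom = new DOMDocument();
// Collect libxml warnings (e.g. for the unknown <ref> tag) instead of printing them.
$errorState = libxml_use_internal_errors(true);
$dom->loadHTML($s);

$xpath = new DOMXPath($dom);
// libxml places the leading bare text inside an implied <p>; take its first text node.
$node = $xpath->query('//body/p/text()')->item(0);
echo $node->textContent;

// Restore the previous libxml error handling.
libxml_use_internal_errors($errorState);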
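The same idea can be wrapped in a reusable function. The following is only a sketch, not part of the original answer: the function name `textOutsideTags` and the `wrapper` div are made up for illustration. Wrapping the input in a known element avoids relying on where the parser puts bare text, and the `<?xml encoding="UTF-8">` prefix is a commonly used workaround so that libxml treats the input as UTF-8.

```php
<?php
// Hypothetical helper: returns only the text that sits outside any tag
// in an HTML fragment, dropping every element together with its content.
function textOutsideTags(string $html): string
{
    $dom = new DOMDocument();
    // Collect parser warnings (unknown tags such as <ref>) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a div with a known id so its direct text children can be selected.
    $dom->loadHTML('<?xml encoding="UTF-8"><div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $text = '';
    // Select only text nodes that are direct children of the wrapper;
    // text nested inside any element is skipped.
    foreach ($xpath->query('//div[@id="wrapper"]/text()') as $node) {
        $text .= $node->textContent;
    }
    return trim($text);
}

echo textOutsideTags('385,178<ref name="land area">Data is accessible ...</ref>'), "\n";
```

For the sample string from the question, this should leave just the number in front of the `<ref>` element.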

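What should the final result be? Also, why not simply use strip_tags()? Doesn't that do what you want? If not, why?

You shouldn't play with HTML using regular expressions. So what is the problem you're facing?

@AmalMurali The initial string ($s) without any tags (+ tag content).

@ShankarDamodaran Well, doesn't strip_tags() leave the tag content as it is? (That's not what I want...)

@Dr.Kameleon: So just 385,178?

I've decided to accept your answer as correct, since it seems to work better. However, there are still some issues. Please have a look here: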
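On the strip_tags() point raised in the comments: strip_tags() removes only the markup and keeps the text between the tags, which is why it does not fit here. The sketch below shows that, and also a narrowly targeted pattern for the one tag in the question. The function name `stripRefs` and the pattern are assumptions for illustration, not taken from the thread. The pattern only handles non-nested `<ref>` elements, so the DOM approach remains the safer general choice.

```php
<?php
$s = '385,178<ref name="land area">Data is accessible ...</ref>';

// strip_tags() drops the <ref ...> and </ref> markup but keeps the text between them.
echo strip_tags($s), "\n";

// Hypothetical helper: removes self-closing <ref ... /> tags and
// <ref ...>...</ref> pairs together with their content.
function stripRefs(string $str): string
{
    // First alternative: self-closing form. Second: opening tag, lazy body, closing tag.
    // Flags: i = case-insensitive, s = let "." match newlines inside the body.
    $pattern = '~<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>~is';
    // preg_replace() returns null on failure; fall back to the input in that case.
    return preg_replace($pattern, '', $str) ?? $str;
}

echo stripRefs($s), "\n";
```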