PHP: remove all occurrences of <br> before the text

php regex xpath

I am trying to remove all the <br> tags that come before the text.

So I have this:

<p>
 <br/><br/>When the battle is on between contestants in a talent show, it gets really competitive when down to the last four.  X-FactorUSAcontestant Marcus Canty knows this all too well as this is the stage he was voted off of the show earlier this year. <br/><br/>
</p>

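For some reason, it doesn't seem to work on this page.

Update

The original question was why there are <br> tags at the beginning of the document when my XPath should remove them (see below). I applied some regex to see whether that would work, which exposed the doctype you see now. I think the doctype was the cause of my original problem, but it didn't show up until now. This is content I imported from Blogger and am currently processing to fit the new blog.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

Here is my code: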

global $post;
$postTime = $post->post_date;
$postTime = strtotime($postTime);
$startDate = "2014/01/16";
if ($postTime < strtotime($startDate)) {
    $html = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//br[not(preceding::text())]') as $node) {
        $node->parentNode->removeChild($node);
    }
    $nodes = $xpath->query('//a[string-length(.) = 0]');
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    $nodes = $xpath->query('//*[not(text() or node() or self::br)]');
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    remove_filter('the_content', 'wpautop');
    $content = $doc->saveHTML();
    $content = ltrim($content, '<br>');
    $content = strip_tags($content, '<br> <a> <iframe>');
    $content = preg_replace(array('/(<br\s*\/?>\s*){1,}/'), array('<br/><br/>'), $content);
    $content = str_replace('&nbsp;', ' ', $content);
    $content = "<p>".implode("</p>\n\n<p>", preg_split('/\n(?:\s*\n)+/', $content))."</p>";
    return $content;
}


Thanks for your help.

You could try using a regular expression:

s/<!DOCTYPE html PUBLIC "-\/\/W3C\/\/DTD HTML 4.0 Transitional\/\/EN" "http:\/\/www.w3.org\/TR\/REC-html40\/loose.dtd">((<br[^>]*\/>)+)(.*)/\3/

Or in PHP:

// The "/" inside the pattern is escaped because "/" is also the delimiter.
$pattern = '/^((<br[^>]*\/>)+)(.*)/i';
$replacement = '$3';
$content = preg_replace($pattern, $replacement, $content);
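
If you would rather stay with XPath than a regex, one possible reason the query `//br[not(preceding::text())]` from the question misses the leading tags is that the whitespace between `<p>` and the first `<br/>` can be parsed as a text node, so the `<br/>` does have preceding text. A minimal sketch under that assumption, which only counts text nodes with visible content; `removeLeadingBreaks` is a hypothetical helper name, not part of the original code:

```php
<?php
// Hypothetical helper: removes every <br> that has no visible text before it.
// Returns the number of removed elements.
function removeLeadingBreaks(DOMDocument $doc): int
{
    $xpath = new DOMXPath($doc);
    // normalize-space() is empty for whitespace-only text nodes, so those
    // no longer count as "preceding text".
    $query = '//br[not(preceding::text()[normalize-space()])]';

    $removed = 0;
    // Copy the result into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $br) {
        $br->parentNode->removeChild($br);
        $removed++;
    }
    return $removed;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML("<p>\n <br/><br/>When the battle is on.<br/><br/>\n</p>");
libxml_clear_errors();

$removed = removeLeadingBreaks($doc);
$out = $doc->saveHTML();
```

The trailing `<br/><br/>` pair is left alone because it does have visible text before it.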

What about ltrim?

$string = ltrim($string, '<br/>');
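
One thing to keep in mind with this approach: the second argument to `ltrim()` is a list of characters, not a prefix string, so the call strips any leading `<`, `b`, `r`, `/` and `>` and can eat into the text itself. A small sketch of the difference, with a regex-based alternative that removes only whole leading tags; `stripLeadingBreaks` is a hypothetical name introduced here for illustration:

```php
<?php
// ltrim() treats '<br/>' as the character set {<, b, r, /, >}.
// Here it also strips the "br" at the start of "bright".
$masked = ltrim('<br/>bright idea', '<br/>');

// Hypothetical helper: strip only complete leading <br>, <br/> or <br /> tags,
// along with any whitespace around them.
function stripLeadingBreaks(string $html): string
{
    // "~" is used as the delimiter so the "/" in the tag needs no escaping.
    return preg_replace('~^(?:\s*<br\s*/?>)+\s*~i', '', $html);
}

$clean = stripLeadingBreaks('<br/><br />bright idea');
```

With these inputs, `$masked` ends up as `ight idea` while `$clean` keeps the full `bright idea`.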

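Thanks, but another problem has come up; please see the updated question.

What do you actually mean by "wrapped"? Do you mean the <br> tags are actually &lt;br&gt;?

I mean that now I get &lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD HTML 4.0 Transitional//EN&quot; &quot;http://www.w3.org/TR/REC-html40/loose.dtd&quot;&gt;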

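Right, that is what is being output on my page. If you look at the XPath stuff I'm doing, my doctype is being added to the content, and I don't know why or how to remove it. That's why it is being echoed on my page.

You could also remove it by plugging it into the regex. I'm curious, though, why the doctype gets printed in the content. I don't see how the algorithm you presented would do that, so there may be a problem with the server providing the original content.

I think the regex works now: <br[^>]*/> actually states that anything between <br and /> is ignored…

@jurgemaister Oh look, it's the regex post everyone links whenever anyone is doing anything with regex…

Not regex. Only regex and HTML. Wonder why…

@jurgemaister So are you saying that it is my regex or my XPath that causes my doctype to be printed on the page?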
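
On the doctype question: it is most likely neither the regex nor the XPath queries. `DOMDocument::loadHTML()` wraps a fragment in a default doctype plus `<html>` and `<body>` elements, and `saveHTML()` called without arguments serializes all of that. A minimal sketch of one way around it, assuming the content is an HTML fragment; `bodyInnerHtml` is a hypothetical helper name, not part of the original code:

```php
<?php
// Hypothetical helper: returns only the markup inside <body>, leaving out the
// doctype and the <html>/<body> wrapper that loadHTML() adds around a fragment.
function bodyInnerHtml(DOMDocument $doc): string
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes just that node and its descendants.
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML('<p>Some text<br/><br/></p>');
libxml_clear_errors();

$content = bodyInnerHtml($doc);
```

In the question's code, this would take the place of the bare `$doc->saveHTML()` call, so there is no doctype left for the later `ltrim()` and regex steps to trip over.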