PHP: Truncate a string containing HTML tags

php html string tags

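I have a string that contains HTML tags. I am looking for a piece of code that truncates this string so that the result:

is 100 characters long
does not contain image tags (<img>)
keeps the other HTML tags (everything except image tags)
does not count whitespace or HTML tag characters toward the 100-character length

For example, given the string: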

<img>Something</img><b>Just an Example</b> Plain Text <br><a href="#">stackoverflow</a>

Rendered, this reads roughly: Something Just an Example Plain Text stackoverflow

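So the result should be:

Just an Example Plain Text stackoverflow (where "stackoverflow" is a link)

That gives about 35 characters (not counting spaces).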
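To make the counting rule concrete, here is a minimal sketch of how the "visible" length could be measured. The helper name countVisibleChars is hypothetical and not part of the question or the answer; it assumes valid UTF-8 input and that an <img>...</img> pair should be dropped together with the text it encloses, as the expected result implies.

```php
<?php
// Hypothetical helper (not from the question or the answer):
// counts the characters a reader would see, ignoring tags and whitespace.
function countVisibleChars($html)
{
    // Drop <img> elements. The example pairs <img> with </img>, so the
    // enclosed text is removed too (assumption based on the expected result).
    $html = preg_replace('~<img\b[^>]*>(?:.*?</img>)?~is', '', $html);
    // Remove the remaining markup but keep its text content.
    $text = strip_tags($html);
    // Count an entity such as &amp; as a single character.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Remove all whitespace, then count UTF-8 characters.
    $text = preg_replace('/\s+/u', '', $text);
    return mb_strlen($text, 'UTF-8');
}

$example = '<img>Something</img><b>Just an Example</b> Plain Text <br><a href="#">stackoverflow</a>';
echo countVisibleChars($example), "\n"; // 35
```

This only measures the length; it does not truncate anything. It can be handy for checking whether a truncation function respects the limit.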

I tried an existing solution, but it did not give the desired result. Any help would be greatly appreciated.

How about a function? Here is mine, AbstractHTMLContents. It takes two parameters:

the input HTML content
the length limit

The code is as follows:

function AbstractHTMLContents($html, $maxLength = 100) {
    mb_internal_encoding("UTF-8");
    $printedLength = 0; // characters of text emitted so far
    $position = 0;      // byte offset into $html (preg_match offsets are in bytes)
    $tags = array();    // stack of currently open tag names
    $newContent = '';
    // Elements that never take a closing tag.
    $voidTags = array('area', 'base', 'br', 'col', 'embed', 'hr', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr');

    // Strip image tags, including attribute-less <img> and stray </img>.
    $html = preg_replace('~</?img\b[^>]*>~i', '', $html);

    while ($printedLength < $maxLength && preg_match('{</?([a-z][a-z0-9]*)[^>]*>|&#?[a-zA-Z0-9]+;}i', $html, $match, PREG_OFFSET_CAPTURE, $position)) {
        list($tag, $tagPosition) = $match[0];

        // Print text leading up to the tag (byte-based slice, since offsets are bytes).
        $str = substr($html, $position, $tagPosition - $position);
        if ($printedLength + mb_strlen($str) > $maxLength) {
            // Cut to the remaining character budget, then drop the trailing partial word.
            $newstr = mb_substr($str, 0, $maxLength - $printedLength);
            $newstr = preg_replace('~\s+\S+$~', '', $newstr);
            $newContent .= $newstr;
            $printedLength = $maxLength;
            break;
        }
        $newContent .= $str;
        $printedLength += mb_strlen($str);

        if ($tag[0] == '&') {
            // Handle the entity: it counts as one character.
            $newContent .= $tag;
            $printedLength++;
        } else {
            // Handle the tag.
            $tagName = strtolower($match[1][0]);
            if ($tag[1] == '/') {
                // This is a closing tag.
                $openingTag = array_pop($tags);
                assert($openingTag == $tagName); // check that tags are properly nested.
                $newContent .= $tag;
            } else if ($tag[strlen($tag) - 2] == '/' || in_array($tagName, $voidTags)) {
                // Self-closing or void tag (e.g. <br>): nothing to close later.
                $newContent .= $tag;
            } else {
                // Opening tag.
                $newContent .= $tag;
                $tags[] = $tagName;
            }
        }

        // Continue after the tag.
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text.
    if ($printedLength < $maxLength && $position < strlen($html)) {
        $rest = substr($html, $position);
        $newstr = mb_substr($rest, 0, $maxLength - $printedLength);
        if (mb_strlen($rest) > mb_strlen($newstr)) {
            // Only drop the trailing partial word when the text was actually cut.
            $newstr = preg_replace('~\s+\S+$~', '', $newstr);
        }
        $newContent .= $newstr;
    }

    // Close any open tags.
    while (!empty($tags)) {
        $newContent .= sprintf('</%s>', array_pop($tags));
    }

    return $newContent;
}

It looks like it gives the result you expect.
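Have you already tried a PHP templating module or framework that could help you with this?
How does the result differ from what you asked for?
@Herbert - The string I get also counts the HTML tags and whitespace toward the total length. So when I truncate the string to just 100 characters, I get about 80 characters of text, and the rest is whitespace and HTML tags. Please answer this question or give me some hints. Thanks.
@GabrielGartz - My project is built on the Symfony framework; does that help?
Try this:
This is one of the best solutions to this problem I have seen. Nice work!
Duplicate answer to a duplicate question: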
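Regarding the comment that whitespace gets counted toward the limit: one possible way to handle it is to cut each text segment by its number of non-whitespace characters instead of its raw length. The sketch below is only an illustration under that assumption. The helper name takeVisibleChars is hypothetical, it is not part of the answer above, and it expects valid UTF-8 input.

```php
<?php
// Hypothetical helper (not part of the answer): returns the longest prefix
// of $text that contains at most $limit non-whitespace characters.
function takeVisibleChars($text, $limit)
{
    $visible = 0;
    $out = '';
    // Split into individual UTF-8 characters (assumes valid UTF-8).
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($chars as $char) {
        $isSpace = preg_match('/^\s$/u', $char) === 1;
        if (!$isSpace) {
            if ($visible >= $limit) {
                break; // budget used up, stop before this character
            }
            $visible++;
        }
        // Whitespace is copied through without using up the budget.
        $out .= $char;
    }
    // Drop whitespace left dangling at the cut point.
    return rtrim($out);
}

echo takeVisibleChars('Just an Example Plain Text', 9), "\n"; // Just an Exa
```

A truncation function could call such a helper wherever it currently cuts a text segment by character count, and add only the non-whitespace count to its running total. Unlike the answer's code, this sketch cuts in the middle of a word.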