Regex & PHP - isolate src attribute from img tag

php regex string

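Using PHP, how can I isolate the contents of the src attribute from $foo? The end result I'm looking for would give me just "http://example.com/img/image.jpg".

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';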

Try this pattern:

'/<\s*img[^>]*src\s*=\s*["\']?([^"\'\s>]*)/'
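To show how the pattern above might be applied, here is a minimal sketch using preg_match. The helper name firstImgSrc is hypothetical and not part of the original answer; the sample string is the $foo from the question.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// img src found in $html, or null when the pattern does not match.
function firstImgSrc(string $html): ?string
{
    // Same pattern as suggested above; group 1 captures the attribute value.
    $pattern = '/<\s*img[^>]*src\s*=\s*["\']?([^"\'\s>]*)/';

    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';
echo firstImgSrc($foo), "\n";
```

Since the pattern has no i modifier, an uppercase IMG or SRC would not match; adding the modifier is one possible adjustment.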

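Code (str_get_html comes from the PHP Simple HTML DOM Parser library, which is not part of standard PHP):

// Create DOM from string
$html = str_get_html($foo);
// Echo the src attribute
echo $html->find('img', 0)->src;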

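If you don't want to use regex (or any non-standard PHP components), a reasonable solution using the built-in DOMDocument class would be as follows: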

<?php
    $doc = new DOMDocument();
    $doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
    $imageTags = $doc->getElementsByTagName('img');

    foreach($imageTags as $tag) {
        echo $tag->getAttribute('src');
    }
?>
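The same idea can be wrapped in a small function that returns every src in a fragment. This is only a sketch: the name allImgSrcs is hypothetical, and the libxml_use_internal_errors calls are an assumption about how one might silence the warnings that loadHTML emits on sloppy markup.

```php
<?php
// Hypothetical helper: collects the src attribute of every img element.
function allImgSrcs(string $html): array
{
    if ($html === '') {
        return []; // loadHTML rejects an empty string
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $srcs = [];
    foreach ($doc->getElementsByTagName('img') as $tag) {
        $srcs[] = $tag->getAttribute('src');
    }
    return $srcs;
}

print_r(allImgSrcs('<p><img src="a.png"><img src="b.png" alt="two"></p>'));
```

Because the markup goes through an HTML parser, details such as quote style or attribute order do not need special handling here.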

Here's what I ended up doing, although I'm not sure how efficient it is:

$imgsplit = explode('"',$data);
foreach ($imgsplit as $item) {
    if (strpos($item, 'http') !== FALSE) {
        $image = $item;
        break;
    }
}

You can get around this problem using this function:

function getTextBetween($start, $end, $text)
{
    $start_from = strpos($text, $start);
    $start_pos = $start_from + strlen($start);
    $end_pos = strpos($text, $end, $start_pos + 1);
    $subtext = substr($text, $start_pos, $end_pos);
    return $subtext;
}
$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';
$img_src = getTextBetween('src="', '"', $foo);

I got this code:

$dom = new DOMDocument();
$dom->loadHTML($img);
echo $dom->getElementsByTagName('img')->item(0)->getAttribute('src');

Assuming there is only one img :P

preg_match solves this problem nicely.

See my answer:

I'm very late to this, but I have a simple solution that hasn't been mentioned yet. Load it with simplexml_load_string (if you have simplexml enabled), then flip it through json_encode and json_decode.

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';

$parsedFoo = json_decode(json_encode(simplexml_load_string($foo)), true);
var_dump($parsedFoo['@attributes']['src']); // output: "http://example.com/img/image.jpg"

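I've been using this to parse XML and HTML for a few months now and it has worked well. I haven't hit any hiccups yet, although I haven't parsed a large file with it (I imagine that using json_encode and json_decode like this gets slower as the input grows). It's convoluted, but it's by far the easiest way I've found to read HTML attributes.
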
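As a point of comparison, SimpleXML also exposes attributes directly through array-style access, so the JSON round trip is optional. A minimal sketch follows; the helper name srcViaSimpleXml is hypothetical, and it assumes the input is well-formed XML (a self-closed tag like the one in the question), which ordinary HTML often is not.

```php
<?php
// Hypothetical helper: reads src from a single well-formed XML element.
function srcViaSimpleXml(string $tag): ?string
{
    // Keep parse errors internal so malformed input just yields null.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($tag);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false || !isset($xml['src'])) {
        return null;
    }
    // Attributes are SimpleXMLElement objects; cast to get the plain string.
    return (string) $xml['src'];
}

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';
var_dump(srcViaSimpleXml($foo));
```

An unclosed tag such as `<img src="a.jpg">` is not valid XML, so this sketch returns null for it; that is the main limit of treating HTML as XML.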
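
Suppose I use
$text ='<img src="blabla.jpg" alt="blabla" />';

with getTextBetween('src="','"',$text); the code will return:
blabla.jpg" alt="blabla" 

This is wrong: we want the code to return the text between the quotes of the attribute value, i.e. attr="value".
So a version of getTextBetween that splits the string with explode is worth a try; it will return:
blabla.jpg

Thank you all the same, because your solution gave me insight into the final solution.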
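The extra text appears because the strpos-based getTextBetween passes an end position as the third argument of substr, which expects a length. A minimal sketch of one way to correct that while keeping the strpos approach; the name getTextBetweenFixed and the null return for missing markers are assumptions made for illustration.

```php
<?php
// Hypothetical corrected variant of the strpos-based getTextBetween.
function getTextBetweenFixed(string $start, string $end, string $text): ?string
{
    $startFrom = strpos($text, $start);
    if ($startFrom === false) {
        return null; // start marker not found
    }
    $startPos = $startFrom + strlen($start);

    // Search from $startPos itself so an empty value (src="") still works.
    $endPos = strpos($text, $end, $startPos);
    if ($endPos === false) {
        return null; // end marker not found
    }

    // substr takes a length, so subtract the start offset.
    return substr($text, $startPos, $endPos - $startPos);
}

$text = '<img src="blabla.jpg" alt="blabla" />';
echo getTextBetweenFixed('src="', '"', $text), "\n";
```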
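I use preg_match_all to capture all images in an HTML document:
preg_match_all("~<img.*src\s*=\s*[\"']([^\"']+)[\"'][^>]*>~i", $body, $matches);

This one allows a more relaxed declaration syntax, with spaces and different quote types.
The regex reads like: <img (any characters) src (possible whitespace) = (possible whitespace) (' or ") (any non-quote symbols) (' or ") (any characters other than >) >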
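For reference, the captured URLs end up in $matches[1]. The sketch below wraps that in a hypothetical helper, imgSrcsByRegex. Note that it uses a slightly different pattern from the one above, as an assumption of mine rather than the author's method: the greedy .* is replaced with a lazy [^>]*? so that two img tags on the same line are not merged into a single match.

```php
<?php
// Hypothetical helper: returns every img src that the regex can find.
function imgSrcsByRegex(string $body): array
{
    // Variation on the pattern above: [^>]*? stays inside one tag,
    // and \s before src avoids matching names like data-src.
    $pattern = '~<img[^>]*?\ssrc\s*=\s*["\']([^"\']+)["\'][^>]*>~i';

    preg_match_all($pattern, $body, $matches);

    // Index 1 holds the contents of the first capture group.
    return $matches[1];
}

$body = '<img src="a.png"><IMG alt="x" SRC=\'b.png\'>';
print_r(imgSrcsByRegex($body));
```

This remains a regex-level shortcut; markup with a > inside an attribute value would still confuse it.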
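@meagar - In this limited scope, using regex is valid (although not necessarily the most efficient route).

Don't parse HTML with regex. (Not sarcasm!)

I mistitled the original post and shouldn't have added regex. I really like karim79's solution, but it requires adding a non-standard class.

This won't work if img is uppercase or if the title contains a ">". Using an HTML parser would be more robust.

Also mind the entity references and numeric character references in the result!

As you wish! =)

Here's an alternative syntax: /src="(.*)"/i

HTML allows single quotes, as long as they match. The "alternative syntax" can match far more characters than intended. Finally, the img attribute can have whitespace at its start and end. It should be: /[sS][rR][cC]\s*=\s*['"]([^'"]+)['"]/i

Nice! This is very close to what I ended up doing. I didn't know about DOMDocument, but I'll give it a try.

This approach will run into problems if the image's URL is relative to the document, e.g. "../../img/something.jpg".

I did find a small hiccup last week. If an XML node has both attributes and a value, this method can only access the value. In the end I had to write a simple parser that converts simplexml to an array while retaining all the data.

I don't mean to say your approach is bad, but I do think using DOMDocument would be a better way to solve this. See this example:

The DOMDocument library is too heavy for such a simple task. It's like crushing a snake with a bulldozer when you have the option of a machete.
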
array(1) {
  ["@attributes"]=>
  array(6) {
    ["class"]=>
    string(12) "foo bar test"
    ["title"]=>
    string(10) "test image"
    ["src"]=>
    string(32) "http://example.com/img/image.jpg"
    ["alt"]=>
    string(10) "test image"
    ["width"]=>
    string(3) "100"
    ["height"]=>
    string(3) "100"
  }
}

$text ='<img src="blabla.jpg" alt="blabla" />';

getTextBetween('src="','"',$text);

blabla.jpg" alt="blabla" 

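function getTextBetween($start, $end, $text)
{
    // Split once at the start marker and keep what follows it.
    // end() takes its argument by reference, so store the array first.
    $parts = explode($start, $text, 2);
    $first_strip = end($parts);

    // Split at the end marker and keep what precedes it.
    $final_strip = explode($end, $first_strip)[0];
    return $final_strip;
}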

   getTextBetween('src="','"',$text);

blabla.jpg
