PHP preg_match error when extracting text that contains spaces


php regex


I want to extract data from a web page's source, but I get an error in preg_match:

    <?php

    $html=file_get_contents("https://www.instagram.com/p/BJz4_yijmdJ/?taken-by=the.witty");
    preg_match("("instapp:owner_user_id" content="(.*)")", $html, $match);
    $title = $match[1];

    echo $title;
    ?>

This is the error I get:

Parse error: syntax error, unexpected 'instapp' (T_STRING) in /home/ubuntu/workspace/test.php on line 4

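How can I fix this? I also want to extract more data from the page with regex. Is it possible to extract all of it at once with a single piece of code, or do I have to call preg_match several times?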

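The main problem is that you are not forming a valid string literal. Note that PHP supports both single- and double-quoted string literals, and you can take advantage of that: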

    preg_match('~"instapp:owner_user_id" content="([^"]*)"~', $html, $match);
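A minimal self-contained check of the corrected call, assuming a hypothetical one-line sample of the page markup (the live Instagram page is not fetched here, and its real markup may differ or may have changed):

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<meta property="instapp:owner_user_id" content="12345678" />';

$title = '';
// Single-quoted PHP string: the double quotes inside the pattern need no escaping.
// ~ is the regex delimiter; ([^"]*) captures the attribute value.
if (preg_match('~"instapp:owner_user_id" content="([^"]*)"~', $html, $match)) {
    $title = $match[1];
}

echo $title, PHP_EOL; // prints 12345678
```

Checking the return value of preg_match before reading $match[1] also avoids an "undefined offset" notice when the tag is missing.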

While you can use a pair of (...) symbols as regex delimiters, I'd suggest a more conventional delimiter, such as the ~ used above.
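To illustrate the delimiter point, this sketch runs matches with paired (...) delimiters, with ~, and with / against a hypothetical sample tag. With /, every literal slash in the URL must be escaped, which is one reason ~ tends to be convenient for HTML and URLs:

```php
<?php
// Hypothetical sample tag.
$html = '<meta property="al:ios:url" content="instagram://media?id=42" />';

// Paired ( ) delimiters: legal, but the outer pair is easy to misread as a group.
preg_match('("al:ios:url" content="([^"]*)")', $html, $m1);

// ~ delimiter: the slashes in "://" need no escaping (the ? still does).
preg_match('~content="instagram://media\?id=(\d+)"~', $html, $m2);

// / delimiter: each literal slash in the pattern has to be escaped.
preg_match('/content="instagram:\/\/media\?id=(\d+)"/', $html, $m3);

echo $m1[1], PHP_EOL; // instagram://media?id=42
echo $m2[1], PHP_EOL; // 42
echo $m3[1], PHP_EOL; // 42
```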

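Also, (.*) is an overly generic pattern that can match more than you need: the dot also matches " characters, and * is a greedy quantifier, so the group runs to the last " on the line instead of stopping at the end of the attribute value. A negated character class is better: ([^"]*) matches zero or more characters other than a double quote.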
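A small comparison of the two patterns, assuming a hypothetical tag that has another quoted attribute after content="...":

```php
<?php
// Hypothetical markup with a second quoted attribute after content="...".
$html = '<meta property="instapp:owner_user_id" content="12345678" data-x="y" />';

// Greedy (.*): consumes to the end of the line, then backtracks to the LAST ".
preg_match('~"instapp:owner_user_id" content="(.*)"~', $html, $greedy);

// Negated class ([^"]*): cannot cross a ", so it stops at the first one.
preg_match('~"instapp:owner_user_id" content="([^"]*)"~', $html, $negated);

echo $greedy[1], PHP_EOL;  // 12345678" data-x="y
echo $negated[1], PHP_EOL; // 12345678
```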

However, to parse HTML in PHP, you can use a DOM parser.

Here is an example that gets all meta tags that have a content attribute, extracts the value of that attribute, and saves it in an array:

    // Sample markup for illustration.
    $html = "<html><head><meta property=\"al:ios:url\" content=\"instagram://media?id=1329656989202933577\" /></head><body><span/></body></html>";
    $dom = new DOMDocument('1.0', 'UTF-8');
    // Do not add implied <html>/<body> wrappers or a default doctype.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // Select every <meta> element that has a content attribute.
    $xpath = new DOMXPath($dom);
    $metas = $xpath->query('//meta[@content]');
    $res = array();
    foreach ($metas as $m) {
        array_push($res, $m->getAttribute('content'));
    }
    print_r($res);
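Regarding extracting several values at once: one possible extension of the example above (a sketch, assuming the tags you care about carry a property attribute, using hypothetical sample markup) is to key the results by that attribute, so a single XPath query returns all of them:

```php
<?php
// Hypothetical sample head; a real page would come from file_get_contents().
$html = '<html><head>'
      . '<meta property="al:ios:url" content="instagram://media?id=1329656989202933577" />'
      . '<meta property="instapp:owner_user_id" content="12345678" />'
      . '<meta name="description" content="no property attribute" />'
      . '</head><body></body></html>';

$dom = new DOMDocument('1.0', 'UTF-8');
// Collect libxml warnings internally; real-world HTML often triggers them.
libxml_use_internal_errors(true);
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$byProperty = [];
// One query: every <meta> that has both a property and a content attribute.
foreach ($xpath->query('//meta[@property and @content]') as $meta) {
    $byProperty[$meta->getAttribute('property')] = $meta->getAttribute('content');
}

print_r($byProperty);
```

The name="description" tag is skipped because it has no property attribute; adjust the XPath predicate if you also need name-based tags.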

See:

What is instapp:owner_user_id? Is it literal text in the page?

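preg_match('~"instapp:owner_user_id" content="([^"]*)"~', $html, $match)

Thank you very much, it works, @WiktorStribiżew. I also want to extract the ID. Is it possible to extract it together with the user ID?

@WiktorStribiżew Thanks, it worked. Actually, I'm a complete noob: I just started learning PHP and wanted to get this done because I've already put a lot of work into it. I want to extract more data from the same page. Could you help me extract these two values from the page? See:

Actually, I don't know what exactly you need, but that is no longer related to this question.

I have a hard time understanding the DOM-based approach. Could you make a simple one like the one above, something like preg_match('~"instapp:owner_user_id" content="([^"]*)"~', $html, $match);? Just one simple line of code that extracts 1329656989202933577 (the value is dynamic) from the page, view-source:

Well, try if (preg_match('~"instapp:owner_user_id" content="[^\d"]*(\d+)~', $html, $match)) { $my_required = $match[1]; }

You still don't get it. I just need to extract 1329656989202933577. Make a regex that extracts 1329656989202933577 from the code above.

Reusing the $dom object from the DOMDocument example above, you can read the content of the al:ios:url meta tag and pull the id out of it: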
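    // Reuses $dom from the DOMDocument example above.
    $xpath = new DOMXPath($dom);
    $metas = $xpath->query('//meta[@property="al:ios:url"]');
    $id = "";
    // Check that the tag exists; item(0) returns null on an empty result.
    if ($metas->length > 0 && preg_match('~[?&]id=(\d+)~', $metas->item(0)->getAttribute('content'), $match))
    {
        $id = $match[1];
    }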
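If you really want the single-regex version the comments asked for, a sketch along the same lines (assuming, as in the sample markup above, that the id appears as an id= query parameter inside the al:ios:url content attribute) could look like this. It is more brittle than the DOM approach if attribute order or quoting changes:

```php
<?php
// Hypothetical sample markup, same shape as the DOMDocument example.
$html = '<meta property="al:ios:url" content="instagram://media?id=1329656989202933577" />';

$id = '';
// [^"]* keeps the match inside the content attribute value;
// [?&]id= anchors on the query parameter, (\d+) captures its digits.
if (preg_match('~"al:ios:url" content="[^"]*[?&]id=(\d+)~', $html, $match)) {
    $id = $match[1];
}

echo $id, PHP_EOL; // 1329656989202933577
```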