Problem with PHP preg_match


I made a simple app that pulls recipe information from sites such as allrecipes.com. I'm using `preg_match`, but something isn't working:

$geturl = file_get_contents("http://allrecipes.com/Recipe/Brown-Sugar-Smokies/Detail.aspx?src=rotd");
preg_match('#<title>(.*) - Allrecipes.com</title>#', $geturl, $match);
$name = $match[1];
echo $name;


I'm just trying to put the page title (minus the ` - Allrecipes.com` part) into a variable, but the result is always empty.

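There are two problems with this pattern. First, there is a newline right after `<title>`, and `.` does not match it (without the `/s` modifier, `.` literally means "any character except a newline"). Second, the `</title>` substring does not actually come straight after the `Allrecipes.com` text; a newline separates them.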

Since `\s` covers both ordinary spaces and line breaks, you can fix the regex like this:

'#<title>\s*(.*?) - Allrecipes.com\s*</title>#s'
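
To make the difference concrete, here is a minimal sketch that runs the original and the fixed pattern against the same input. The `$sampleHtml` string is a made-up stand-in for the real page (only the newline padding inside `<title>` is modeled on the description above), and the variable names are illustrative.

```php
<?php
// Hypothetical sample: NOT the real page, only its shape
// (a newline after <title> and another one before </title>).
$sampleHtml = "<html><head><title>\n\tBrown Sugar Smokies - Allrecipes.com\n</title></head></html>";

$originalPattern = '#<title>(.*) - Allrecipes.com</title>#';
$fixedPattern    = '#<title>\s*(.*?) - Allrecipes.com\s*</title>#s';

// The original pattern fails: "." stops at the newline after <title>,
// and "</title>" does not directly follow "Allrecipes.com".
$originalResult = preg_match($originalPattern, $sampleHtml, $originalMatch);

// The fixed pattern lets \s* absorb the whitespace on both sides.
$fixedResult = preg_match($fixedPattern, $sampleHtml, $fixedMatch);
$name = $fixedResult === 1 ? $fixedMatch[1] : null;

var_dump($originalResult); // 0 means "no match"
var_dump($name);
```

Because the original pattern returns 0, `$match[1]` is never set in the question's code, which would explain the empty output.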


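The `/s` modifier is actually irrelevant here (kudos to minitech for noticing), because the title in this recipe is a single line and all the "\n" characters are consumed by the `\s*` subexpressions. I would still recommend keeping it, so that multi-line titles don't catch you off guard.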

I've also replaced `.*` with `.*?` for efficiency: since the string you're looking for is short, a non-greedy quantifier makes sense here.
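
Beyond efficiency, the lazy quantifier can also change what gets captured once the `s` modifier is on. The sketch below assumes a hypothetical input that contains a second `<title>` element further down (for example inside an inline SVG); nothing here is taken from the real page.

```php
<?php
// Hypothetical input with two <title> elements. Not taken from the real page.
$sampleHtml = "<title>\nBrown Sugar Smokies - Allrecipes.com\n</title>"
            . "<svg><title>Logo - Allrecipes.com</title></svg>";

// Greedy: .* first runs to the end of the input (the s modifier lets it
// cross newlines), then backtracks to the LAST " - Allrecipes.com".
preg_match('#<title>\s*(.*) - Allrecipes.com\s*</title>#s', $sampleHtml, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST one.
preg_match('#<title>\s*(.*?) - Allrecipes.com\s*</title>#s', $sampleHtml, $lazy);

var_dump($greedy[1]); // spans both titles, including the markup in between
var_dump($lazy[1]);   // only the first title's text
```

With a single title on the page both variants capture the same text, so this only matters for less tidy input.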


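If you look at the page source, you'll notice that the `<title>` tag contains some padding around the actual text. You need to compensate for that: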

'#<title>\s*(.*) - Allrecipes.com\s*</title>#'


You should grab the whole title first, then strip it with PHP, like this:

<?php

$raw_html = file_get_contents('http://www.allrecipes.com');
if (empty($raw_html)) {
    throw new \RuntimeException('Fetch empty');
}

$matches = array();
// preg_match() returns 1 on a match, 0 on no match, and false on error,
// so anything other than 1 means $matches[1] is not available.
if (preg_match('/<title>(.*)<\/title>/s', $raw_html, $matches) !== 1) {
    throw new \RuntimeException('Regex error or no <title> found');
}

$title = trim($matches[1]);

// you should strip your title here
echo $title;
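
The "strip your title here" step could look roughly like the sketch below. `strip_site_suffix` is a hypothetical helper name, and the default suffix is assumed from the pattern in the question; adjust it if the site formats its titles differently.

```php
<?php
/**
 * Hypothetical helper: removes a trailing site suffix from a page title.
 * The default suffix is an assumption based on the question.
 */
function strip_site_suffix(string $title, string $suffix = ' - Allrecipes.com'): string
{
    $title = trim($title);
    $suffixLength = strlen($suffix);

    // Only cut when the title really ends with the suffix.
    if ($suffixLength > 0 && substr($title, -$suffixLength) === $suffix) {
        $title = substr($title, 0, -$suffixLength);
    }

    return rtrim($title);
}

echo strip_site_suffix("\n\tBrown Sugar Smokies - Allrecipes.com\n"), PHP_EOL;
```

Plain string functions are used here instead of a second regex so that the suffix is compared literally and a title without the suffix passes through unchanged.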


Have you tried printing $geturl to see whether the string is actually in there?
Your code is missing troubleshooting. For example, check return values before using them.
@minitech Yes, when I print $geturl, I get the whole Allrecipes.com page.
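
As a rough illustration of "check return values before using them", the sketch below separates the two ways `preg_match` can fail to produce a capture. `extract_recipe_name` is a hypothetical helper, and the sample strings are made up so the snippet runs without a network request; in real code you would also check that `file_get_contents` did not return `false`.

```php
<?php
/**
 * Hypothetical helper: returns the recipe name, or throws with a message
 * that says which step went wrong.
 */
function extract_recipe_name(string $html): string
{
    $result = preg_match('#<title>\s*(.*?) - Allrecipes.com\s*</title>#s', $html, $match);

    if ($result === false) {
        // The regex engine itself failed (e.g. backtrack limit reached).
        throw new RuntimeException('Regex error, code ' . preg_last_error());
    }
    if ($result === 0) {
        // The pattern is valid but did not match; $match[1] does not exist.
        throw new RuntimeException('Title pattern did not match');
    }

    return $match[1];
}

// Made-up inputs: one with a padded title, one without any title.
$samples = array(
    "<title>\nBrown Sugar Smokies - Allrecipes.com\n</title>",
    '<p>no title here</p>',
);

foreach ($samples as $html) {
    try {
        echo extract_recipe_name($html), PHP_EOL;
    } catch (RuntimeException $e) {
        echo 'Failed: ', $e->getMessage(), PHP_EOL;
    }
}
```
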
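`</title>` doesn't immediately follow Allrecipes.com, either. But why is the `s` modifier relevant now? There are no line breaks in the recipe name.
You're right, of course; I thought the common "`.` does not match \n" issue was the only problem here, when in fact the OP also needs to cover the padding before `</title>` in the regex.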