PHP: Getting HTML content from a website

Possible duplicate:

I am using this code to get the HTML content from a given website URL.

**Code:**

=================================================================

example URL: http://www.qatarsale.com/EnMain.aspx

/*
$regexp = '/<div id="UpdatePanel4">(.*?)<\/div>/i';
@preg_match_all($regexp, @file_get_contents('http://www.qatarsale.com/EnMain.aspx'), $matches, PREG_SET_ORDER);
*/


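But $matches comes back as an empty array. I want to get all the HTML content found inside the div with id="UpdatePanel4".

If anyone has a solution, please suggest it.

Thanks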

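// Get the web page
$html = @file_get_contents('http://www.qatarsale.com/EnMain.aspx');
$startingTag = '<div id="UpdatePanel4">';
// Find the position of '<div id="UpdatePanel4">'
$startPos = strpos($html, $startingTag);
// The content begins right after the starting tag
$contentStart = $startPos + strlen($startingTag);
// Get the position of the closing div
$endPos = strpos($html, '</div>', $contentStart);
// Get the content between the start and end positions
// (the third argument of substr() is a length, not an end offset)
$contents = substr($html, $contentStart, $endPos - $contentStart);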
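The snippet above stops at the first `</div>` it finds, so it cuts the content short as soon as the target div contains another div. One way to handle that while staying with string functions is to count nesting depth. The following is only a sketch under assumptions: the helper name `extractDivContents` and the sample markup are hypothetical, and it still breaks if `<div` appears inside comments, scripts, or attribute values.

```php
<?php
// Hypothetical helper: returns the inner HTML of the first element that opens
// with $startingTag, or null when the tag or its matching close is missing.
function extractDivContents(string $html, string $startingTag): ?string
{
    $startPos = strpos($html, $startingTag);
    if ($startPos === false) {
        return null;
    }
    $contentStart = $startPos + strlen($startingTag);
    $depth = 1;              // we are now inside the target div
    $offset = $contentStart;

    // Visit every "<div" or "</div" that follows the starting tag.
    while (preg_match('/<(\/?)div\b/i', $html, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $tagPos = $m[0][1];
        // A closing tag lowers the depth, an opening tag raises it.
        $depth += ($m[1][0] === '/') ? -1 : 1;
        if ($depth === 0) {
            // This closing tag matches the starting tag.
            return substr($html, $contentStart, $tagPos - $contentStart);
        }
        $offset = $tagPos + strlen($m[0][0]);
    }
    return null; // unbalanced markup
}

// Made-up sample markup with a nested div.
$sample = '<div id="UpdatePanel4"><p>a</p><div>inner</div><p>b</p></div><div>other</div>';
echo extractDivContents($sample, '<div id="UpdatePanel4">');
```

On the sample string this returns everything up to the closing tag that matches the starting tag, nested div included.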

If the UpdatePanel4 div contains more divs, you will have to do more work.

First, make sure the server allows you to fetch the data.

Second, use an HTML parser to parse the data:

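// Fetch the page; @ suppresses the warning so the failure is handled below
$html = @file_get_contents('http://www.qatarsale.com/EnMain.aspx');
if (!$html) {
  die('can not get the content!');
}
// Parse the markup into a DOM tree
$doc = new DOMDocument();
$doc->loadHTML($html);
// $content is a DOMElement for the div, or null if no element has that id
$content = $doc->getElementById('UpdatePanel4');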
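Note that `getElementById` returns a `DOMElement`, not a string. If the goal is the markup inside the div, one possible approach is to serialize its child nodes. This is a minimal sketch: the helper name `innerHtmlById` and the sample markup are hypothetical, and it parses a hard-coded string so it runs without network access.

```php
<?php
// Hypothetical helper: inner HTML of the element with the given id,
// or null when no such element exists.
function innerHtmlById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings quietly; real-world pages are rarely valid HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $doc->getElementById($id);
    if ($element === null) {
        return null;
    }

    // Serialize each child node; nested divs are kept intact.
    $inner = '';
    foreach ($element->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Made-up sample markup with a nested div.
$sample = '<html><body><div id="UpdatePanel4"><p>a</p><div>inner</div></div></body></html>';
echo innerHtmlById($sample, 'UpdatePanel4');
```

For the live page, the `$html` string fetched with `file_get_contents` would be passed in place of `$sample`.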

That won't work. Even if you manage to get the regexp working, there are two problems with the way you are using it:

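First problem: what if the server makes a minor change to the HTML? In that case you would have to change the regexp as well.
Second problem: I assume you want the div's `innerHTML`, right? In that case, the regexp approach takes no account of nesting or tree structure. The string you get runs from the opening tag you specified up to the first `</div>` encountered.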
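The nesting problem is easy to reproduce on a small string. The sample markup below is made up for illustration. The pattern is the one from the question with the `s` modifier added, because without it `.` does not match newlines. That may be one reason the original call returned an empty array, though this cannot be confirmed without seeing the page.

```php
<?php
// Hypothetical sample markup: the target div contains a nested div.
$sample = "<div id=\"UpdatePanel4\">\n<p>a</p><div>inner</div><p>b</p>\n</div>";

// Pattern from the question: "." does not match "\n", so nothing is found.
$withoutS = preg_match('/<div id="UpdatePanel4">(.*?)<\/div>/i', $sample);

// Same pattern with the "s" modifier: it matches, but the lazy group
// stops at the first "</div>", which belongs to the nested div.
preg_match('/<div id="UpdatePanel4">(.*?)<\/div>/is', $sample, $matches);

var_dump($withoutS);   // number of matches without "s"
var_dump($matches[1]); // captured text with "s"
```

The captured text ends at `<div>inner`, so the rest of the panel is lost.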

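Solution:

Parsing HTML with a regexp is always a bad idea. Use an HTML parser instead.

Oh, it feels good to see someone not suggesting string manipulation or regular expressions.

@AdnanShammout: It looks bad to see a user with 20k rep not linking to the duplicate.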