PHP: file_get_contents() with modified HTTP headers returns garbage HTML output

The code below is meant to extract HTML using the Simple HTML DOM parser for PHP, but the output of file_get_contents() is garbage:

include('simple_html_dom.php');

$context = stream_context_create(array(
  'http'=>array(
    'method'=>"GET",                
    'header'=>"Accept: text/html,application/xhtml+xml,application/xml\r\n" .
              "Accept-Charset: ISO-8859-1,utf-8\r\n" .
              "Accept-Encoding: gzip,deflate,sdch\r\n" .
              "Accept-Language: en-US,en;q=0.8\r\n",
    'user_agent'=>"User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11\r\n"              
 )
)); 

$html = file_get_contents('http://www.nseindia.com/content/equities/cmbhav.htm', false, $context);
echo $html;

foreach($html->find('a') as $e) 
    echo $e->href . '<br>';

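Your HTTP request explicitly declares, through this header, that you can handle compressed data, so the server returns a compressed response: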

Accept-Encoding: gzip,deflate,sdch\r\n

You now have to decode that compressed data yourself:

$html = gzdecode($html); // for a gzip-encoded body; gzuncompress() only reads zlib-format data
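Which zlib function applies depends on the encoding the server actually picked, which it reports in the Content-Encoding response header. The following is a minimal sketch of that choice, not part of the original answer; the helper names decode_body() and find_content_encoding() are hypothetical, and sdch is not handled at all.

```php
<?php
// Hypothetical helper: decode a response body according to its Content-Encoding value.
function decode_body(string $body, ?string $encoding): string
{
    switch (strtolower(trim((string) $encoding))) {
        case 'gzip':
        case 'x-gzip':
            $decoded = @gzdecode($body);      // gzip format (RFC 1952)
            break;
        case 'deflate':
            // HTTP "deflate" should be zlib-wrapped (RFC 1950),
            // but some servers send raw deflate data instead.
            $decoded = @gzuncompress($body);
            if ($decoded === false) {
                $decoded = @gzinflate($body);
            }
            break;
        default:
            return $body; // not compressed, or an encoding not handled here
    }
    // Fall back to the untouched body if decoding failed.
    return $decoded === false ? $body : $decoded;
}

// Hypothetical helper: pick the Content-Encoding value out of raw header lines.
function find_content_encoding(array $headers): ?string
{
    foreach ($headers as $line) {
        if (stripos($line, 'Content-Encoding:') === 0) {
            return trim(substr($line, strlen('Content-Encoding:')));
        }
    }
    return null;
}
```

After file_get_contents() over HTTP, PHP exposes the raw response header lines in $http_response_header, so a call could look like decode_body($html, find_content_encoding($http_response_header)).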

As piotrekkr mentioned in the comments, you can also remove the Accept-Encoding header, and the web server should then return plain text.
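A rough sketch of that second option: the question's request options with the Accept-Encoding line dropped. The function name build_plain_context_options() is hypothetical. The sketch also passes the user_agent option as a bare value, on the assumption that the "User-Agent:" prefix in the question was unintended, since PHP adds the header name itself.

```php
<?php
// Hypothetical helper: the request options from the question, minus Accept-Encoding.
function build_plain_context_options(): array
{
    return [
        'http' => [
            'method' => 'GET',
            'header' => "Accept: text/html,application/xhtml+xml,application/xml\r\n" .
                        "Accept-Charset: ISO-8859-1,utf-8\r\n" .
                        "Accept-Language: en-US,en;q=0.8\r\n",
            // The user_agent option takes only the value, without a "User-Agent:" prefix.
            'user_agent' => 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11',
        ],
    ];
}

// Usage (performs a network request, so it is left commented out here):
// $context = stream_context_create(build_plain_context_options());
// $html = file_get_contents('http://www.nseindia.com/content/equities/cmbhav.htm', false, $context);
// $dom = str_get_html($html); // from simple_html_dom.php; find() needs a DOM object, not a string
// foreach ($dom->find('a') as $e) {
//     echo $e->href . '<br>';
// }
```

Note that find() is a method of the Simple HTML DOM object, so the fetched string has to go through str_get_html() before the loop from the question can work.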

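Comments:

Just a guess: the garbage may be compressed HTML. Try removing the "Accept-Encoding..." HTTP header. Is it plain text now?

It looks like encoded data. Maybe you should remove "Accept-Encoding: gzip,deflate,sdch\r\n" so that you do not get gzipped content, or you can decompress it yourself using the zlib functions.

Oh, the previous commenter was faster :)

Thanks, piotrekkr. You should post that as an answer! Very useful.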
