PHP: getting the content inside HTML is not working


I am trying to extract HTML content from a website. I only want the content inside the `<body>` tag.

    // $validlink is a link with an .htm extension; its source is rather large,
    // containing 24,000 lines of HTML code

    $thehtml = file_get_contents($validlink);
    $thehtml = preg_match("/<body.*?>(.*?)<\/body>/is", $thehtml);


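What else can I do? $thehtml is empty. I am trying to insert this into a WordPress post, but $thehtml is empty for some strange reason. Could there be a timeout issue or something else?

It cannot be a timeout issue, because I noticed that if I only output file_get_contents($validlink); the page comes through. For some reason the body is just not being found.

Another possible solution would be to get only the content between the first div and the last div in the document.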

Use `strpos()` to get the string positions of the opening and closing tags, then pass those positions to the substring function, `substr()`.
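A minimal sketch of this approach, assuming the page contains one `<body>` element and that no attribute value inside the opening tag contains a `>`. The function name `extractBody` and the sample string are made up for illustration; `stripos()` is used in place of `strpos()` only to make the tag search case-insensitive.

```php
<?php
// Hypothetical helper: returns the text between the opening <body ...> tag
// and the closing </body> tag, or null if either tag is missing.
function extractBody(string $html): ?string
{
    // Locate the start of the opening tag (case-insensitive).
    $open = stripos($html, '<body');
    if ($open === false) {
        return null;
    }

    // The tag may carry attributes, so find the '>' that ends it.
    $start = strpos($html, '>', $open);
    if ($start === false) {
        return null;
    }
    $start++; // first character after the opening tag

    // Locate the closing tag, searching only after the opening tag.
    $end = stripos($html, '</body>', $start);
    if ($end === false) {
        return null;
    }

    return substr($html, $start, $end - $start);
}

// Sample input standing in for file_get_contents($validlink).
$sample = '<html><head><title>t</title></head><body class="x"><p>Hello</p></body></html>';
echo extractBody($sample), "\n";
```

Plain string functions avoid the regex engine entirely, which can matter on very large pages.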

    $thehtml = file_get_contents($validlink);
    $thehtml = preg_match("/<body.*?>(.*?)<\/body>/is", $thehtml, $matches);
    // $matches[0] is the whole match, including the <body> tags themselves
    $thehtml = $matches[0];


Here is the correct code:

    $thehtml = file_get_contents($validlink);
    preg_match('/<body.*?>(.*?)<\/body>/is', $thehtml, $matches);
    $thehtml = $matches[1];
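`preg_match()` returns 1 on a match, 0 on no match, and `false` on failure, so assigning its return value to `$thehtml` (as in the question) never yields the matched text. The sketch below wraps the same pattern and reports which case occurred. The function name `bodyViaRegex` and the sample string are hypothetical. Whether the original page actually hit a PCRE limit is not known; it is only one possible cause on a very large document.

```php
<?php
// Hypothetical wrapper around the regex approach that says why it failed.
function bodyViaRegex(string $html): string
{
    $result = preg_match('/<body.*?>(.*?)<\/body>/is', $html, $matches);

    if ($result === false) {
        // PCRE itself gave up, e.g. PREG_BACKTRACK_LIMIT_ERROR on huge input.
        throw new RuntimeException(
            'preg_match failed, error code ' . preg_last_error()
        );
    }
    if ($result === 0) {
        throw new RuntimeException('No <body>...</body> pair found');
    }

    // Group 1 holds only the text between the tags.
    return $matches[1];
}

// Sample input standing in for file_get_contents($validlink); in real use,
// also check that file_get_contents() did not return false.
$sample = "<html><body class='c'>\n<p>Hello</p>\n</body></html>";
echo bodyViaRegex($sample), "\n";
```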


However, I would suggest using an HTML parser instead.
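The answer does not say which parser it has in mind. As one possibility, here is a rough sketch using PHP's bundled DOM extension (`DOMDocument`), assuming that extension is installed. The function name `bodyViaDom` and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: inner HTML of <body> via PHP's DOM extension,
// or null if the input is empty or has no body element.
function bodyViaDom(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();

    // Real pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return null;
    }

    // Serialize each child node of <body> to build its inner HTML.
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Sample input standing in for file_get_contents($validlink).
echo bodyViaDom('<html><body><p>Hello</p></body></html>'), "\n";
```

A parser copes with attributes, nesting, and malformed markup that a single pattern tends to miss. Note that `loadHTML()` may mis-decode non-ASCII text when the page does not declare its character set.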

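Use a DOM parser, not a regexp, to extract information from HTML.

How do I use a DOM parser? $html = file_get_contents($validlink); $dumphtml = $thehtml->find('body')->innertext ???

Thanks, I was able to work it out with substr() and strpos() etc.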