PHP: Trying to use DOMDocument::loadHTMLFile with a generated URL
I am calling the DOMDocument::loadHTMLFile method with a URL that I build. This is the code I use to build the URL:
$url = "http://en.wikipedia.org".$path;
$path is obtained from an href attribute in another file. When I echo it, it prints /wiki/Pop_music
If I hard-code the URL as http://en.wikipedia.org/wiki/Pop_music the page loads fine, but if I try to use the generated path, I get errors.
This is the code I am currently using:
foreach ($paths as $path)
{
    echo $path; // will cause error
    //echo $path = '/wiki/Pop_music'; // will work
    $url = "http://en.wikipedia.org"."$path";
    $doc = getHTML($url, 1);
    if($doc !== false)
    {
        $xpath = new DOMXPath($doc);
        $xpathCode = "//h1[@id='firstHeading']";
        $nodes = $xpath->query($xpathCode);
        echo $nodes->item(0)->nodeValue."<br />";
    }
}
The getHTML function is:
function getHTML($url, $domainID)
{
    $conArtistsCrawler = new mysqli(HOST, USERNAME, PASSWORD, CRAWLER_DB_NAME);
    // Load HTML
    $doc = new DOMDocument();
    $isSuccessful = $doc->loadHTMLFile($url);
    // Update the time to show that the domain was crawled.
    $sql = "UPDATE Domain SET LastCrawled = CURRENT_TIMESTAMP() WHERE DomainID = '$domainID'";
    $conArtistsCrawler->query($sql);
    $conArtistsCrawler->close();
    // Delay 1 second after the request to avoid getting BANNED
    sleep(1);
    // Check to see if URL is valid
    if($isSuccessful === false)
    {
        //URL invalid!
        echo "\"".$url."\" is invalid<br>";
        return false;
    }
    return $doc;
}
Code output:
With the hard-coded path:
Warning: DOMDocument::loadHTMLFile(): ID protected-icon already defined in , line: 60 in /Applications/MAMP/htdocs/Assignments/Assignment4/test.php on line 77
/wiki/Pop_music Warning: DOMDocument::loadHTMLFile(): Tag audio invalid in , line: 225 in /Applications/MAMP/htdocs/Assignments/Assignment4/test.php on line 77
Warning: DOMDocument::loadHTMLFile(): Tag source invalid in , line: 225 in /Applications/MAMP/htdocs/Assignments/Assignment4/test.php on line 77
Pop music
With the path variable:
Warning: DOMDocument::loadHTMLFile(): ID protected-icon already defined in , line: 60 in /Applications/MAMP/htdocs/Assignments/Assignment4/test.php on line 77
/wiki/Pop_music
Warning: DOMDocument::loadHTMLFile(): failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request in /Applications/MAMP/htdocs/Assignments/Assignment4/test.php on line 77
Warning: DOMDocument::loadHTMLFile(): I/O warning : failed to load external entity "" in /Applications/MAMP/htdocs/Assignments/Assignment4/test.php on line 77
"" is invalid
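Short answer:
Well, the error you are getting is because $doc is not a DOMDocument object but the boolean false. Since you are suppressing the DOMDocument warnings, there is no way to know why getHTML() returns false.
So lose the @ operator, check what DOMDocument is complaining about, and debug from there.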
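As a rough sketch of what "check what DOMDocument is complaining about" could look like: libxml can collect the parser's messages in memory instead of printing them, via libxml_use_internal_errors() and libxml_get_errors(). The inline HTML string is a made-up stand-in for the fetched page so the sketch runs offline, and which messages appear (if any) depends on the installed libxml2 version. As far as I know this only covers libxml's own messages; a stream warning such as "HTTP request failed" is raised by PHP itself and would still be displayed.

```php
<?php
// Collect libxml messages in memory instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

// Hypothetical stand-in for the downloaded page.
$html = '<html><body><h1 id="firstHeading">Pop music</h1><audio></audio></body></html>';

$doc = new DOMDocument();
$isSuccessful = $doc->loadHTML($html);

// Print whatever the parser complained about (possibly nothing).
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), ' (line ', $error->line, ')', PHP_EOL;
}
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document is still usable even if the parser reported problems.
$xpath = new DOMXPath($doc);
$heading = $xpath->query("//h1[@id='firstHeading']")->item(0);
echo $heading->nodeValue, PHP_EOL;
```

With the real page you would call loadHTMLFile($url) in place of loadHTML($html); the error-collection part stays the same.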
Edit:
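"But I am still not sure why I get a different result when I pass in the variable than when I hard-code it. When I echo the two path values or the URL values, they look exactly the same."
They are certainly not exactly the same. There is a line-break tag (<br>) after Pop_music, which makes the URL invalid.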
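One way to spot this kind of invisible difference, sketched under the assumption that the path carries a trailing <br /> (the exact markup is not shown in the question): a browser renders the tag as a line break, so a plain echo hides it, whereas var_dump() reports the string length and htmlspecialchars() makes the tag itself visible. $rawPath and $cleanPath are illustrative names, not taken from the original code.

```php
<?php
// Hypothetical value: what the scraped href might actually contain.
$rawPath = "/wiki/Pop_music<br />";

// var_dump reports the string length, which exposes extra characters.
var_dump($rawPath);

// Escaping makes the tag visible when the output is viewed in a browser.
echo htmlspecialchars($rawPath), PHP_EOL;

// Strip markup and surrounding whitespace before building the URL.
$cleanPath = trim(strip_tags($rawPath));
$url = "http://en.wikipedia.org" . $cleanPath;
echo $url, PHP_EOL;
```

Cleaning the value where it is first extracted is probably the better fix, but a check like this at least shows where the two "identical" strings differ.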
Long answer:
Run this script:
$path = '/wiki/Pop_music';
$url = "http://en.wikipedia.org$path";
$doc = new \DOMDocument();
$success = @$doc->loadHTMLFile($url);
if ($success) {
    $xpath = new DOMXPath($doc);
    $xpathCode = "//h1[@id='firstHeading']";
    $nodes = $xpath->query($xpathCode);
    echo $nodes->item(0)->nodeValue."<br />";
}
Note: you can also use:
set_error_handler(function($errno, $errstr, $errfile, $errline) {
    // Digest error here
});
$doc = new DOMDocument();
$isSuccessful = $doc->loadHTMLFile($url);
restore_error_handler();
The getHTML function returns inconsistent types
The getHTML function can return either a DOMDocument object or a boolean. That is not a bad thing in itself (internally, PHP does this in many functions), but it means you cannot assume that $doc is an object, because it may be the boolean false. So you must test the return value before passing it as an argument to DOMXPath. In fact, that is exactly the error you were getting: you were passing a boolean to DOMXPath instead of a DOMDocument object.
Solution
Either throw an exception (or error) in the function, or test the return value before passing it to DOMXPath.
Example:
$doc = getHTML($url, 1);
if ($doc instanceof \DOMDocument) {
    $xpath = new DOMXPath($doc);
}
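For the other option, throwing an exception inside the function, here is a minimal sketch. loadDocumentOrFail is a hypothetical name and not part of the original code, the database bookkeeping from getHTML is left out, and the path passed in is a made-up local file that should not exist, so the failure branch can be tried offline.

```php
<?php
// Hypothetical replacement for getHTML(): returns a DOMDocument or throws.
function loadDocumentOrFail(string $url): DOMDocument
{
    // Keep parser messages in libxml's buffer instead of printing warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $isSuccessful = $doc->loadHTMLFile($url);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($isSuccessful === false) {
        throw new RuntimeException("\"$url\" could not be loaded");
    }
    return $doc;
}

$failed = false;
try {
    // Made-up path that should not exist.
    $doc = loadDocumentOrFail(__DIR__ . '/no-such-file-example.html');
    $xpath = new DOMXPath($doc); // only reached when loading succeeded
} catch (RuntimeException $e) {
    $failed = true;
    echo $e->getMessage(), PHP_EOL;
}
```

With this shape the caller never receives a boolean, so DOMXPath can only ever be given a real DOMDocument.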
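You should remove the @ before @$doc->loadHTMLFile($url) and use a proper error handler.
I took out the @ and got some error messages, but I am still not sure why I get a different result when passing in the variable than when hard-coding it. When I echo the path values or the URL values, they look the same.
There is a line-break tag after Pop_music, see the detailed response.
I had completely forgotten about the @ I added earlier. I removed the @, but I still have a problem with the variable I pass in.
Hey, thanks! I just fixed it! I had a line-break tag appended to the path variable.
@Tony yeah =) A small mistake that makes debugging hard. That is why the @ operator is evil =P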