Getting option contents with PHP DOMXPath
I am trying to get the product sizes from all the option elements of a specific select tag, which has the following content:
<select id="prodSize" name="prodSize">
<option value="9274">10D</option>
<option value="9275">10DD</option>
<option value="9276">10E</option>
<option value="9277">10F</option>
<option value="9279">10G</option>
<option value="9288">12D</option>
<option value="9289">12DD</option>
<option value="9290">12E</option>
<option value="9291">12F</option>
<option value="9301">14D</option>
<option value="9302">14DD</option>
<option value="9303">14E</option>
<option value="9304">14F</option>
<option value="9305">14FF</option>
<option value="9315">16D</option>
<option value="9317">16E</option>
<option value="9318">16F</option>
<option value="9319">16FF</option>
<option value="9320">16G</option>
</select>
Or:
I get:
object(DOMNodeList)#40 (1) { ["length"]=> int(0) } object(DOMNodeList)#29 (1) { ["length"]=> int(0) }
object(DOMNodeList)#39 (1) { ["length"]=> int(0) } object(DOMNodeList)#41 (1) { ["length"]=> int(0) }
For clarity, I have added the full code:
scrapCatUrl('http://.../shop-management/categories/maternity-lingerie.aspx', "//ul[@class='lvl2 visible']/li/a/@href");

// Note: $url (presumably the site's base URL) is never defined in the posted code;
// inside these functions it would have to be passed in or declared global.

// Follow every category link found on the start page.
function scrapCatUrl($path, $query){
    $xpath = scrap($path);
    $links = $xpath->query($query);
    foreach($links as $link){
        echo 'Category'.' - '.$url.$link->nodeValue . '<br>';
        scrapProdUrl($url.$link->nodeValue);
    }
}

// Follow the product links on a category page (stops after a few products while testing).
function scrapProdUrl($path){
    $xpath = scrap($path);
    $links = $xpath->query("//a[@class='thumbObj']/@href");
    $i = 0;
    foreach($links as $link){
        echo 'Product'.' - '.$url.$link->nodeValue . '<br>';
        getProdData($url.$link->nodeValue);
        if($i > 2){
            die();
        }
        $i++;
    }
}

// Extract and print the data of a single product page.
function getProdData($path){
    $xpath = scrap($path);
    $description = $xpath->query("//meta[@name='description']/@content");
    $keywords = $xpath->query("//meta[@name='keywords']/@content");
    $title = $xpath->query("//h4[@class='h4-productdetail']/text()");
    $price = $xpath->query("//div[@class='productDetail']/span[@class='price']/text()");
    $images = $xpath->query("//div[@class='imgs']/img/@src");
    $fullDescription = $xpath->query("//div[@class='flash']/following-sibling::div[@class='clearer']/preceding-sibling::text()[preceding-sibling::div[@class='flash']]");
    $options = $xpath->query("//select[@id='prodSize']/option/text()");
    echo 'Meta Description'.' - '.$description->item(0)->nodeValue. '<br>';
    echo 'Meta Keywords'.' - '.$keywords->item(0)->nodeValue. '<br>';
    echo 'Title'.' - '.$title->item(0)->nodeValue. '<br>';
    echo 'Price'.' - '.$price->item(0)->nodeValue. '<br>';
    if($images->length > 1){
        foreach($images as $image){
            echo '<img src="'.$url.$image->nodeValue.'" />'. '<br>';
        }
    }
    else if($images->length == 1){
        // $image is only set inside the foreach above, so read the single node directly
        echo '<img src="'.$url.$images->item(0)->nodeValue.'" />'. '<br>';
    }
    foreach($options as $option){
        echo $option->nodeValue;
    }
}

// Download a page with cURL and return a DOMXPath object for it.
function scrap($path){
    $ch = curl_init($path);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $page = curl_exec($ch);
    $dom = new DOMDocument();
    @$dom->loadHTML($page);
    $xpath = new DOMXpath($dom);
    return $xpath;
}
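I have tried several of the approaches people suggested here, but I get the same result. I have no problem getting any other element from the page (title, images, description), only this one.

Try enabling error reporting: error_reporting(E_ALL); ini_set('display_errors', '1');

Your function ($path, $query) { expects two arguments, but you only pass one: $xpath = scrap($path);

@kevinabelita Errors are enabled and no problems are reported. I still get everything except the sizes.

@kevinabelita I think this size list is generated by JavaScript. Is it possible to get the elements after all the JavaScript functions have run?

Look at the raw page source. If it does not contain the options you want, they are probably populated via JavaScript. DOMXPath does not seem able to handle that on its own; you would need a tool that executes JavaScript, or a wrapper around one.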
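The check suggested in the last comment can be sketched without touching the network. The snippet below is only an illustration: extractSizeOptions() is a hypothetical helper that is not part of the posted code, and the $raw string is an assumed example of what the server might send when the size list is filled in later by JavaScript. It suggests that the XPath expression itself is fine when the options are present in the markup, and that an empty result points to the markup rather than to the query.

```php
<?php
// Hypothetical helper (not from the original post): returns the text of every
// <option> inside <select id="prodSize"> found in the given HTML string.
function extractSizeOptions(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about sloppy markup internally instead of using @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $sizes = [];
    foreach ($xpath->query("//select[@id='prodSize']/option") as $option) {
        $sizes[] = trim($option->textContent);
    }
    return $sizes;
}

// Markup as the browser shows it once the options exist (shortened from the question).
$rendered = '<html><body><select id="prodSize" name="prodSize">'
    . '<option value="9274">10D</option>'
    . '<option value="9275">10DD</option>'
    . '</select></body></html>';

// Assumed shape of the raw response when the list is populated by JavaScript.
$raw = '<html><body><select id="prodSize" name="prodSize"></select></body></html>';

$renderedSizes = extractSizeOptions($rendered);
$rawSizes = extractSizeOptions($raw);

echo implode(', ', $renderedSizes), PHP_EOL; // 10D, 10DD
echo count($rawSizes), PHP_EOL;              // 0
```

Running the same helper on the string returned by curl_exec() for a real product page would show which of the two cases applies. If the list comes back empty there as well, the options are most likely not in the downloaded HTML at all.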