PHP DOMDocument: isolating data from HTML

I have tried several approaches without much success. My HTML looks like this:

<td>
  <a href="..?ID=343">
    <img src=".." />
  </a>
</td>
<td>
  <a href="..?id=343">  <!-- the only difference between the two links is that this one has "id" in lowercase -->
    Some text..
  </a>
</td>
Now I want to get the img element and this content: Some text..

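I manage to get both pieces of information, but for some reason, when I print $links_array, every link appears twice:

Array [0] => [1] => [2] => [3] => [4]


I tried comparing the nodeValue to filter out the duplicate, but that did not work. Thanks in advance for your help.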

Maybe the link really is in the page twice?


$dom->getElementsByTagName('a') is a global search: it returns every a element in the document.
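
As a rough illustration of that point, here is a minimal sketch run against markup modelled on the question's HTML. The `$html` string and the wrapping `<table>` are assumptions made for the example and are not taken from the real page. Both anchors are returned, including the one that wraps the image, so the list is twice as long as the number of rows.

```php
<?php
// Hypothetical sample markup modelled on the question's HTML.
$html = '<table><tr>
  <td><a href="..?ID=343"><img src=".." /></a></td>
  <td><a href="..?id=343">Some text..</a></td>
</tr></table>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

// Global search: every <a> in the document, regardless of what it contains.
$links = $dom->getElementsByTagName('a');
echo $links->length, "\n";

foreach ($links as $link) {
    // An anchor that wraps an image has at least one <img> descendant.
    $wrapsImage = $link->getElementsByTagName('img')->length > 0;
    echo $link->getAttribute('href'), ' wraps image: ', var_export($wrapsImage, true), "\n";
}
```

Checking each anchor for an img descendant, as the loop does, is one way to tell the two kinds of link apart without comparing serialized HTML.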

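$links is a list of DOMDocument nodes, because it picks up all the a elements in the content. But in the for $j loop I add what I find to an array, and for some reason it still gets added twice.

What is the value of $links->length? You seem to expect 2, but maybe it is 4?

It returns double the amount. That means it stores both the a that surrounds the img element and the second a. I expect 1 but get 2.

The problem is that $links captures both elements, but in my $links_array I want to store only one: the a that does not surround the img.

Yes, that is tough. This is why scraping is so mind-numbing. Isn't there a better way to get the data you want, such as a proper web service?
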
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, "http://www.....net/2004/dealer_Zaloga.asp?dealer=12321");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    $output = curl_exec($ch);

    $dom = new DOMDocument;
    @$dom->loadHTML($output);




    // Get images
    $images = $dom->getElementsByTagName('img');
    $image_array = array();

    for($i = 0; $i < $images->length; $i++) {
        if($images->item($i)->getAttribute('width') == "80") {
            array_push($image_array, $dom->saveHTML($images->item($i)));
        }
    }

    // Get links
    $links = $dom->getElementsByTagName('a');
    $links_array = array();
    $title_array = array();

    // Here I try to compare the two a elements it finds. I want to store only the one
    // that does not contain an img element, but for some reason it stores both.

    // All arrays are the same size: images, links, titles
    for($j = 0; $j < $links->length; $j++) {
        if(isset($image_array[$j]) && $dom->saveHTML($links->item($j+1)) != $image_array[$j]) {
            array_push($links_array, 'http://www.....net/2004/' . $links->item($j)->getAttribute('href'));
            array_push($title_array, $links->item($j)->nodeValue);
        }
    }
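
One possible way to keep only the text links is to let XPath do the filtering instead of comparing serialized HTML by index. The sketch below is not the asker's code: it assumes sample markup modelled on the question (the `$html` string and the wrapping `<table>` are hypothetical) and uses the expression `//a[not(.//img)]`, which matches anchors that have no img descendant.

```php
<?php
// Hypothetical sample markup modelled on the question's HTML.
$html = '<table><tr>
  <td><a href="..?ID=343"><img src=".." /></a></td>
  <td><a href="..?id=343">Some text..</a></td>
</tr></table>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$links_array = array();
$title_array = array();

// Select only the anchors that do not contain an <img>, i.e. the text links.
foreach ($xpath->query('//a[not(.//img)]') as $link) {
    $links_array[] = $link->getAttribute('href');
    $title_array[] = trim($link->nodeValue);   // strip surrounding whitespace from the link text
}

print_r($links_array);
print_r($title_array);
```

In the real script the base URL would be prepended to each href as in the question, and `$html` would be the cURL response. Because the image-wrapping anchors never enter the loop, the link and title arrays stay aligned without any index arithmetic.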