Parsing values from an ASP web page using PHP and XPath


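I am trying to scrape this web page

to get the numeric values under the red, yellow, green and white circles using PHP and XPath.

Note: if you browse the page yourself you may see different values. That's fine, they change automatically.

I am trying to use this PHP code sample to print the values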

<?php
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);

    $url = 'http://prontosoccorso.usl4.toscana.it/attesa/home.asp';

    $xpath_for_parsing = '//*[@id="prontosoccorso"]/tbody/tr[2]/td[2]';

    //#Set CURL parameters: pay attention to the PROXY config !!!!
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');

    $data = curl_exec($ch);
    curl_close($ch);

    $dom = new DOMDocument();
    @$dom->loadHTML($data);

    $xpath = new DOMXPath($dom);

    $colorWaitingNumber = $xpath->query($xpath_for_parsing);
    $theValue =  'N.D.';
    foreach( $colorWaitingNumber as $node )
    {
      $theValue = $node->nodeValue;
    }

    print $theValue;
?>
The result is

30/12/2017 alle ore 14:09 Rosso Verde Azzurro Bianco Pazienti in attesa Totale 0 0 0 0 0 0 0 0 0 0 0 Pazienti in visita Totale 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Pazienti trattati nelle ultime ore 0 0 0 0 0 0 0 0 0 0 0 0 0

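So the 0 values I get are consistent: if you run curl http://prontosoccorso.usl4.toscana.it/attesa/home.asp from the command line, you will see that the values are all zero.

Analyzing the page with the browser console, I can't find the request that fetches the actual values. Any help or suggestions?


Thank you in advance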

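One thing to note is that even when you visit the page in a browser, every field starts at 0, which is why I tried loading the page twice. That alone still didn't work, so I made it store cookies between the calls, and the values started to appear.

The code is mostly what you had, with extra curl_setopt calls to create a cookie file. It may be that doing this once keeps working forever, but don't quote me on that.

The XPath only fetches the fields of the first row, but it can easily be adapted to the other rows.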

<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);

$url = 'http://prontosoccorso.usl4.toscana.it/attesa/home.asp';

//#Set CURL parameters: pay attention to the PROXY config !!!!
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '');
$cookies = "./cookie.txt";
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);

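// First request: stores the cookies the server sends (the values are still 0)
$data = curl_exec($ch);
// Second request: sends the stored cookies back, and the values appear
$data = curl_exec($ch);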
curl_close($ch);
$dom = new DOMDocument();
$dom->loadHTML($data);

$xpath = new DOMXPath($dom);
$xpath_for_parsing = '//table[@id="prontosoccorso"]/tbody/tr[2]/td';

$colorWaitingNumber = $xpath->query($xpath_for_parsing);

$theValue =  'N.D.';
foreach( $colorWaitingNumber as $node )
{
    echo $theValue = $node->nodeValue.PHP_EOL;
}
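Adapting the XPath to the other rows could look roughly like the sketch below. It is only an illustration: the helper name extractRows and the sample markup are made up, and the real page's table layout may differ. It selects rows with //tr rather than /tbody/tr so that it matches whether or not the markup contains a tbody element.

```php
<?php
// Hypothetical helper: returns the trimmed cell texts of every data row
// in the table with id "prontosoccorso".
function extractRows(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect HTML warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows = [];
    // "//tr" matches rows whether or not the markup contains a <tbody>
    foreach ($xpath->query('//table[@id="prontosoccorso"]//tr') as $tr) {
        $cells = [];
        // Relative query: only the <td> children of the current row
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->nodeValue);
        }
        if ($cells !== []) {            // skip rows that contain only <th> cells
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample markup, for illustration only
$sample = '<table id="prontosoccorso">
  <tr><th>Rosso</th><th>Verde</th></tr>
  <tr><td>1</td><td>4</td></tr>
  <tr><td>0</td><td>2</td></tr>
</table>';

print_r(extractRows($sample));
```

In the real script, $data from the second curl_exec call would be passed in place of $sample.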
You could add some logic that checks whether all the values are 0 and reloads the page if so. As posted, my script simply calls curl_exec twice.
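A minimal sketch of that "reload while everything is 0" logic might look like this. The helper names allZero and fetchUntilNonZero are hypothetical, and the download is replaced by a stand-in closure so the example runs without network access. In practice the closure would wrap the curl_exec call and the XPath query.

```php
<?php
// Hypothetical helper: true when every value in $cells is the string "0".
function allZero(array $cells): bool
{
    foreach ($cells as $cell) {
        if (trim($cell) !== '0') {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: calls $fetch up to $maxAttempts times and returns the
// first result that is not all zeros, or the last result if none qualifies.
function fetchUntilNonZero(callable $fetch, int $maxAttempts = 5): array
{
    $cells = [];
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $cells = $fetch();
        if ($cells !== [] && !allZero($cells)) {
            break;
        }
    }
    return $cells;
}

// Stand-in for the real download: returns zeros once, then made-up values.
$calls = 0;
$fakeFetch = function () use (&$calls): array {
    $calls++;
    return $calls < 2 ? ['0', '0', '0'] : ['1', '4', '0'];
};

$result = fetchUntilNonZero($fakeFetch);
echo implode(' ', $result), PHP_EOL;   // prints "1 4 0"
```

The attempt cap matters because all zeros can also be a legitimate reading, for example when nobody is waiting. Without the cap the loop would never end in that case.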
