PHP: Extracting data from a page

Tags: php, html, file-get-contents, html-content-extraction

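Hello, I want to create an HTML and PHP page that can fetch the data contained in the table at this link:


I would appreciate some hints. I would use file_get_contents, but I don't know how to extract all the individual pieces of data.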

These questions may also help you:


Could you explain more clearly what you want to get from this page?

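In any case, to do this you can use file_get_contents to fetch the page. Then, depending on what you want from it (I assume you want every <td> element of the table on the page), you can use preg_match_all to get all the data you need.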
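
A minimal sketch of that approach, run against an inline HTML string rather than the live page. The sample markup and the lazy pattern with a capture group are assumptions made for illustration, not taken from the page itself.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$page = <<<HTML
<table>
<tr><td><a href="/001/">Agrigento</a></td></tr>
<tr><td><a href="/006/">Alessandria</a></td></tr>
</table>
HTML;

// Lazy match so that each <td>...</td> pair is matched separately;
// capture group 1 holds the inner HTML of the cell.
preg_match_all('/<td[^>]*>(.*?)<\/td>/s', $page, $matches);

// $matches[0] holds the full matches, $matches[1] the captured inner HTML.
print_r($matches[1]);
```

With a capture group, the result array gains a second entry, so the cell contents can be read without the surrounding <td> tags.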

An example for your case:

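// Fetch the raw HTML of the page.
$page = file_get_contents("http://www.comuni-italiani.it/province.html");

// Collect every match of the pattern; $output[0] holds the full matches.
$output = array();
preg_match_all('/<td.*.<\/td>/', $page, $output);

print_r($output);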
Of course, this can be filtered.

For example, in your case, by adding a small foreach loop:

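$page = file_get_contents("http://www.comuni-italiani.it/province.html");

$output = array();
preg_match_all('/<td.*.<\/td>/', $page, $output);

$provinces = array();

// The pattern has no capture groups, so $output has a single entry
// (the list of full matches) and the outer loop runs once.
foreach ($output as $id => $list) {
    // Keep matches 2 through 111; these indices are tied to the page's current layout.
    for ($i = 2; $i <= 111; $i++) {
        array_push($provinces, $list[$i]);
    }
}

print_r($provinces);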
(Sorry for the huge array.)

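However, this keeps the links in the array, so if you only want the values without the anchors wrapped around them, feel free to use a different regular expression.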
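
As an alternative to writing another regular expression, the markup can be stripped from each match afterwards. This is only a sketch: the $cells array below is hypothetical sample data shaped like the matches, and the variable names are not from the original answer.

```php
<?php
// Hypothetical matches, shaped like the full-match list returned by preg_match_all.
$cells = [
    '<td><a href="/001/">Agrigento</a></td>',
    '<td><a href="/006/">Alessandria</a></td>',
];

// strip_tags() removes the <td> and <a> markup; trim() drops stray whitespace.
$names = array_map(
    function ($cell) {
        return trim(strip_tags($cell));
    },
    $cells
);

print_r($names);
```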

Hope this helps.


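(Take this as an example only. Keep in mind that if the page changes, this foreach trick may stop working; I posted it just to give you an idea of how to approach the problem.)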

Thank you very much, but what if I also wanted to add the "sigla" column?

Output of print_r($provinces) from the foreach example in the answer:

Array ( [0] => Agrigento [1] => Alessandria [2] => Ancona [3] => Aosta [4] => Arezzo [5] => Ascoli Piceno [6] => Asti [7] => Avellino [8] => Bari [9] => Barletta-Andria-Trani [10] => Belluno [11] => Benevento [12] => Bergamo [13] => Biella [14] => Bologna [15] => Bolzano [16] => Brescia [17] => Brindisi [18] => Cagliari [19] => Caltanissetta [20] => Campobasso [21] => Carbonia-Iglesias [22] => Caserta [23] => Catania [24] => Catanzaro [25] => Chieti [26] => Como [27] => Cosenza [28] => Cremona [29] => Crotone [30] => Cuneo [31] => Enna [32] => Fermo [33] => Ferrara [34] => Firenze [35] => Foggia [36] => Forlì-Cesena [37] => Frosinone [38] => Genova [39] => Gorizia [40] => Grosseto [41] => Imperia [42] => Isernia [43] => La Spezia [44] => L'Aquila [45] => Latina [46] => Lecce [47] => Lecco [48] => Livorno [49] => Lodi [50] => Lucca [51] => Macerata [52] => Mantova [53] => Massa-Carrara [54] => Matera [55] => Messina [56] => Milano [57] => Modena [58] => Monza e della Brianza [59] => Napoli [60] => Novara [61] => Nuoro [62] => Olbia-Tempio [63] => Oristano [64] => Padova [65] => Palermo [66] => Parma [67] => Pavia [68] => Perugia [69] => Pesaro e Urbino [70] => Pescara [71] => Piacenza [72] => Pisa [73] => Pistoia [74] => Pordenone [75] => Potenza [76] => Prato [77] => Ragusa [78] => Ravenna [79] => Reggio Calabria [80] => Reggio Emilia [81] => Rieti [82] => Rimini [83] => Roma [84] => Rovigo [85] => Salerno [86] => Medio Campidano [87] => Sassari [88] => Savona [89] => Siena [90] => Siracusa [91] => Sondrio [92] => Taranto [93] => Teramo [94] => Terni [95] => Torino [96] => Ogliastra [97] => Trapani [98] => Trento [99] => Treviso [100] => Trieste [101] => Udine [102] => Varese [103] => Venezia [104] => Verbano-Cusio-Ossola [105] => Vercelli [106] => Verona [107] => Vibo Valentia [108] => Vicenza [109] => Viterbo )
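
For the follow-up question about the "sigla" column, one possible approach is to parse the table row by row with DOMDocument instead of a regular expression, so that several cells of the same row stay together. The sketch below runs on hypothetical sample markup; it assumes the province name is the first cell and the sigla the second, which may not match the real page.

```php
<?php
// Hypothetical sample: the real table's column order and markup may differ.
$html = <<<HTML
<table>
<tr><th>Provincia</th><th>Sigla</th></tr>
<tr><td><a href="/084/">Agrigento</a></td><td>AG</td></tr>
<tr><td><a href="/006/">Alessandria</a></td><td>AL</td></tr>
</table>
HTML;

$doc = new DOMDocument();
// Real-world pages are often not valid markup, so collect parser warnings silently.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$provinces = [];
foreach ($doc->getElementsByTagName('tr') as $row) {
    $cells = $row->getElementsByTagName('td');
    // Skip header rows and any row without at least two data cells.
    if ($cells->length < 2) {
        continue;
    }
    $provinces[] = [
        'name'  => trim($cells->item(0)->textContent),
        'sigla' => trim($cells->item(1)->textContent),
    ];
}

print_r($provinces);
```

Reading textContent also drops the anchors, so each entry holds plain text for both columns.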