PHP: extracting data from a page
Tags: php, html, file-get-contents, html-content-extraction
Hello, I would like to create an HTML and PHP page that can fetch the data from the table contained in this link:
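I would appreciate some hints. I would use file_get_contents, but I don't know how to get at all the various pieces of data.
These questions may also help you: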
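Can you explain more clearly what you want to get from this page?
Anyway, to do this you can use file_get_contents to fetch the page. Then, depending on what you want from the page (I assume you want every <td> element of the table), you can use a regular expression with preg_match_all to get all the data you need.
Example for your case: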
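// Fetch the raw HTML of the page
$page = file_get_contents("http://www.comuni-italiani.it/province.html");
$output = array();
// The pattern is greedy and "." does not match newlines, so each match
// runs from the first <td to the last </td> on a single line
preg_match_all('/<td.*.<\/td>/', $page, $output);
// $output[0] holds the full matches
print_r($output);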
Of course, this can be filtered.
For example, in your case, by adding a small foreach loop:
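// Fetch the page and collect the <td> matches
$page = file_get_contents("http://www.comuni-italiani.it/province.html");
$output = array();
preg_match_all('/<td.*.<\/td>/', $page, $output);
$provinces = array();
// $output contains a single list here: $output[0], the full matches
foreach ($output as $id => $list) {
    // Matches 2 through 111 were the province cells when this was written
    for ($i = 2; $i <= 111; $i++) {
        $provinces[] = $list[$i];
    }
}
print_r($provinces);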
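(Sorry for the huge array.)
However, this keeps the links in the array, so if you only want the values and not the anchors wrapped around them, feel free to use a different regular expression.
Hope this helps.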
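As a rough sketch of one alternative to writing a second regular expression, the matched cells could be passed through strip_tags, which removes the <td> and <a> tags and leaves only the text. The $cells array below is a hypothetical stand-in for $output[0]; the real page's markup may look different.

```php
<?php
// Hypothetical sample of matched cells (an assumption, not the real page)
$cells = array(
    '<td><a href="agrigento/">Agrigento</a></td>',
    '<td><a href="alessandria/">Alessandria</a></td>',
);

// strip_tags() drops the markup; html_entity_decode() turns entities such
// as &amp; back into characters; trim() removes surrounding whitespace
$names = array_map(function ($cell) {
    return trim(html_entity_decode(strip_tags($cell), ENT_QUOTES, 'UTF-8'));
}, $cells);

print_r($names);
```

With the real data, $cells would simply be replaced by the $provinces array built in the loop.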
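(Take this as an example only. Keep in mind that if the page changes, this foreach trick may stop working; I posted it just to give you an idea of how to approach the problem.)
Thank you very much, but what if I want to add the "sigla" column as well?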
Output of print_r($provinces) from the foreach example:
Array ( [0] => Agrigento [1] => Alessandria [2] => Ancona [3] => Aosta [4] => Arezzo [5] => Ascoli Piceno [6] => Asti [7] => Avellino [8] => Bari [9] => Barletta-Andria-Trani [10] => Belluno [11] => Benevento [12] => Bergamo [13] => Biella [14] => Bologna [15] => Bolzano [16] => Brescia [17] => Brindisi [18] => Cagliari [19] => Caltanissetta [20] => Campobasso [21] => Carbonia-Iglesias [22] => Caserta [23] => Catania [24] => Catanzaro [25] => Chieti [26] => Como [27] => Cosenza [28] => Cremona [29] => Crotone [30] => Cuneo [31] => Enna [32] => Fermo [33] => Ferrara [34] => Firenze [35] => Foggia [36] => Forlì-Cesena [37] => Frosinone [38] => Genova [39] => Gorizia [40] => Grosseto [41] => Imperia [42] => Isernia [43] => La Spezia [44] => L'Aquila [45] => Latina [46] => Lecce [47] => Lecco [48] => Livorno [49] => Lodi [50] => Lucca [51] => Macerata [52] => Mantova [53] => Massa-Carrara [54] => Matera [55] => Messina [56] => Milano [57] => Modena [58] => Monza e della Brianza [59] => Napoli [60] => Novara [61] => Nuoro [62] => Olbia-Tempio [63] => Oristano [64] => Padova [65] => Palermo [66] => Parma [67] => Pavia [68] => Perugia [69] => Pesaro e Urbino [70] => Pescara [71] => Piacenza [72] => Pisa [73] => Pistoia [74] => Pordenone [75] => Potenza [76] => Prato [77] => Ragusa [78] => Ravenna [79] => Reggio Calabria [80] => Reggio Emilia [81] => Rieti [82] => Rimini [83] => Roma [84] => Rovigo [85] => Salerno [86] => Medio Campidano [87] => Sassari [88] => Savona [89] => Siena [90] => Siracusa [91] => Sondrio [92] => Taranto [93] => Teramo [94] => Terni [95] => Torino [96] => Ogliastra [97] => Trapani [98] => Trento [99] => Treviso [100] => Trieste [101] => Udine [102] => Varese [103] => Venezia [104] => Verbano-Cusio-Ossola [105] => Vercelli [106] => Verona [107] => Vibo Valentia [108] => Vicenza [109] => Viterbo )
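Regarding the "sigla" column asked about in the comment: once more than one column is needed, a regular expression gets awkward, and one possible approach is to parse the table with PHP's DOMDocument and DOMXPath instead. The following is only a sketch. The $html string is a made-up sample, and the assumption that the name sits in the first cell and the sigla in the second is not confirmed by the real page.

```php
<?php
// Hypothetical markup; the real table's columns may be ordered differently
$html = '<table>
<tr><th>Provincia</th><th>Sigla</th></tr>
<tr><td><a href="agrigento/">Agrigento</a></td><td>AG</td></tr>
<tr><td><a href="alessandria/">Alessandria</a></td><td>AL</td></tr>
</table>';

// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely valid
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$provinces = array();
foreach ($xpath->query('//tr') as $row) {
    // Only look at the <td> children of the current row
    $cells = $xpath->query('td', $row);
    if ($cells->length < 2) {
        continue; // header row or a row without enough cells
    }
    // textContent ignores the <a> tag and returns only the text
    $provinces[] = array(
        'name'  => trim($cells->item(0)->textContent),
        'sigla' => trim($cells->item(1)->textContent),
    );
}

print_r($provinces);
```

For the live page, $html would come from file_get_contents as in the examples above, and the cell indexes would need to be checked against the actual table.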