PHP: get the <script> from the HTML head using a DOM parser
I am currently using a DOM parser for my project, and I am using cURL in PHP to scrape a website. I want to get a value from the script tag in the HTML head, but I really don't know how to do it. If I run the code below:
$data_dom = new simple_html_dom();
$data_dom->load($html);
foreach($data_dom->find('script') as $script){
    echo $script->plaintext."<br>";
}
The result is empty: when I inspect it, only the br tags show up. I want to get everything inside the script tag. Here is the head content:
<head>
I will give you the script I want to get
.....
<script type="text/javascript">
var keysearch = {"departureLabel":"Surabaya (SUB : Juanda) Jawa Timur Indonesia","arrivalLabel":"Palangkaraya (PKY : Tjilik Riwut | Panarung) Kalimantan Tengah Indonesia","adultNum":"1","childNum":"0","infantNum":"0","departure":"SUB","arrival":"PKY","departDate":"20181115","roundTrip":0,"cabinType":-1,"departureCode":"ID-Surabaya-SUB","arrivalCode":"ID-Palangkaraya-PKY"};
(function(window, _gtm, keysearch){
if (window.gtmInstance){
var departureExp = keysearch.departureCode.split("-");
var arrivalExp = keysearch.arrivalCode.split("-");
gtmInstance.setFlightData({
'ITEM_TYPE': 'flight',
'FLY_OUTB_CODE': departureExp[2],
'FLY_OUTB_CITY': departureExp[1],
'FLY_OUTB_COUNTRYCODE': departureExp[0],
'FLY_OUTB_DATE': keysearch.departDate,
'FLY_INB_CODE': arrivalExp[2],
'FLY_INB_CITY': arrivalExp[1],
'FLY_INB_COUNTRYCODE': arrivalExp[0],
'FLY_INB_DATE': keysearch.returnDate,
'FLY_NBPAX_ADL': keysearch.adultNum,
'FLY_NBPAX_CHL': keysearch.childNum,
'FLY_NBPAX_INF': keysearch.infantNum,
});
gtmInstance.pushFlightSearchEvent();
}
}(window, gtmInstance, keysearch));
var key = "rkey=10fe7b6fd1f7fa1ef0f4fa538f917811dbc7f4628a791ba69962f2ed305fb72d061b67737afd843aaaeeee946f1442bb";
var staticRoot = 'http://sta.nusatrip.net';
$(function() {
$("#currencySelector").nusaCurrencyOptions({
selected: getCookie("curCode"),
});
});
</script>
</head>
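I want to get the key variable, which I will use to fetch data from the website. Thanks.
Depending on the rest of the markup, you may just need to use DOMDocument and DOMXPath, and then use preg_match to parse out the value of the variable. This example will echo the key: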
<?php
$html = <<<END
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
<script type="text/javascript">
var keysearch = {"departureLabel":"Surabaya (SUB : Juanda) Jawa Timur Indonesia","arrivalLabel":"Palangkaraya (PKY : Tjilik Riwut | Panarung) Kalimantan Tengah Indonesia","adultNum":"1","childNum":"0","infantNum":"0","departure":"SUB","arrival":"PKY","departDate":"20181115","roundTrip":0,"cabinType":-1,"departureCode":"ID-Surabaya-SUB","arrivalCode":"ID-Palangkaraya-PKY"};
(function(window, _gtm, keysearch){
if (window.gtmInstance){
var departureExp = keysearch.departureCode.split("-");
var arrivalExp = keysearch.arrivalCode.split("-");
gtmInstance.setFlightData({
'ITEM_TYPE': 'flight',
'FLY_OUTB_CODE': departureExp[2],
'FLY_OUTB_CITY': departureExp[1],
'FLY_OUTB_COUNTRYCODE': departureExp[0],
'FLY_OUTB_DATE': keysearch.departDate,
'FLY_INB_CODE': arrivalExp[2],
'FLY_INB_CITY': arrivalExp[1],
'FLY_INB_COUNTRYCODE': arrivalExp[0],
'FLY_INB_DATE': keysearch.returnDate,
'FLY_NBPAX_ADL': keysearch.adultNum,
'FLY_NBPAX_CHL': keysearch.childNum,
'FLY_NBPAX_INF': keysearch.infantNum,
});
gtmInstance.pushFlightSearchEvent();
}
}(window, gtmInstance, keysearch));
var key = "rkey=10fe7b6fd1f7fa1ef0f4fa538f917811dbc7f4628a791ba69962f2ed305fb72d061b67737afd843aaaeeee946f1442bb";
var staticRoot = 'http://sta.nusatrip.net';
$(function() {
$("#currencySelector").nusaCurrencyOptions({
selected: getCookie("curCode"),
});
});
</script>
</head>
<body>foo</body>
</html>
END;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = $xpath->query('//script');
foreach($result as $currScriptTag)
{
    $currScriptContent = $currScriptTag->nodeValue;
    $matchFound = preg_match('/var key = "(.*)"/', $currScriptContent, $matches);
    if($matchFound)
    {
        /*
         * $matches[0] will contain the whole line like var key = "..."
         * $matches[1] just contains the value of the var
         */
        $key = $matches[1];
        echo $key.PHP_EOL;
    }
}
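A slightly more defensive variant of the same approach, as a sketch only: it assumes the page is already in a string (for example, fetched with cURL) and that the variable is written with double quotes, as in the markup above. The function name `extractScriptKey` is hypothetical. It silences libxml's warnings about imperfect real-world HTML, tolerates different spacing around `=`, stops at the closing quote instead of using a greedy `(.*)`, and returns the first match.

```php
<?php
// Hypothetical helper (not from the thread): return the value of
// `var key = "..."` from the first <script> that declares it, or null.
function extractScriptKey(string $html): ?string
{
    $dom = new DOMDocument();
    // Scraped pages often trigger libxml warnings; collect them instead of printing.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('script') as $script) {
        // \s* tolerates spacing around "=", [^"]* stops at the closing quote,
        // and "key\s*=" does not match "var keysearch = ...".
        if (preg_match('/\bvar\s+key\s*=\s*"([^"]*)"/', $script->textContent, $m)) {
            return $m[1];
        }
    }
    return null;
}

$html = '<html><head><script>var key = "rkey=abc123"; var staticRoot = "x";</script></head><body></body></html>';
$key = extractScriptKey($html);
echo $key, PHP_EOL;            // rkey=abc123

// The value is itself a query-string pair, so parse_str() can split it.
parse_str($key, $params);
echo $params['rkey'], PHP_EOL; // abc123
```

Returning `null` when nothing matches lets the caller decide what to do if the site changes its markup.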
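What isn't working for you with the DOM approach?
I just don't know how to get the script tag's value. It's as if it were hidden. $data_dom->find('head')
If you want the data in <script>, then you should try to find script, not head!
I already tried that, but it just gives an empty result.
Which DOM parser are you using?
Wow, this works. Thank you very much, I still don't know XPath. Do you have any suggestion where I should learn it? Thanks @Rob Ruchte
XPath is very mature and widely used, so there is a lot of material out there, but one I use often is one of the most concise references I've found:
Thanks again, you really helped me.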
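As a small illustration of what XPath adds here, the following sketch (not from the thread) lets the query select the right element instead of looping over every script. The expression `//head/script[contains(., "var key")]` is an illustrative assumption: it keeps only `<script>` children of `<head>` whose text contains `var key`. On the real page above this also matches because `var keysearch` and `var key` share one script, so the regex still does the final extraction.

```php
<?php
// Sketch: narrow the search with an XPath predicate, then extract with a regex.
//   //head/script            only <script> elements that are children of <head>
//   [contains(., "var key")] ...whose text contains the substring "var key"
$html = <<<'HTML'
<html><head>
<script>var other = 1;</script>
<script>var key = "rkey=abc123";</script>
</head><body><script>var key = "not-this-one";</script></body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//head/script[contains(., "var key")]');

foreach ($nodes as $script) {
    if (preg_match('/var\s+key\s*=\s*"([^"]*)"/', $script->textContent, $m)) {
        echo $m[1], PHP_EOL; // rkey=abc123
    }
}
```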