Scraping JSON with PHP
Tags: php, regex, json, web-scraping, scrape
I've done a lot of HTML scraping with XPath. But now I have to scrape some JSON and don't know how to do that. The source I want to scrape is:
{
"ASIN" : "B00DR4LYHY",
"FeatureName" : "price_feature_div",
"Type" : "JSON",
"Value" :
{
"content" :
{"price_feature_div":"<div id=\"price\" class=\"a-section a-spacing-small\">\n<table class=\"a-lineitem\">\n \n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t \n\t\t \n\t\t\t\t \n\t\t \n\t\t\t\t \n\n\n\n\n\n\t\n<tr>\n <td class=\"a-color-secondary a-size-base a-text-right a-nowrap\">Price:<\/td>\n <td class=\"a-span12\">\n <span id=\"priceblock_ourprice\"
class=\"a-size-medium a-color-price\">$37.60<\/span>\n \n\n\n\n \n\n\n\n\n\n\n \n\n <span id=\"ourprice_shippingmessage\">\t\n \t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n \n\n\t \n\t\t\n\t\t\n \n <span class=\"a-size-base a-color-base\">& <b>FREE Shipping<\/b><\/span>\n \n \n \n\n\n\n <\/span>\n \n \n \n \n <\/td>\n<\/tr>\n\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t \n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\n\n\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\n\t\n\t\n\t\n\n \n \n\t\n<\/table>\n<\/div>"}
}
}
What I need is the price, $37.60.
The code I'm using (provided by Venkata) is:
$URL = 'http://www.amazon.com/gp/twister/ajaxv2?sid=188-4344403-7969026&ptd=OUTERWEAR&json=1&dpxAjaxFlag=1&sCac=1&isUDPFlag=1&twisterView=glance&ee=2&pgid=apparel_display_on_website&sr=1-3&nodeID=1036592&rid=0Q05FXGQJSA20X44DJVG&parentAsin=B00DR4LUQY&enPre=1&qid=1413775191&dStr=size_name%2Ccolor_name&auiAjax=1&storeID=apparel&psc=1&asinList=B00DR4LYHY&isFlushing=2&id=B00DR4LYHY&prefetchParam=0&mType=full&dpEnvironment=softlines';
$page = file_get_contents($URL);
$decoded = json_decode($page);
$html = $decoded->Value->content->price_feature_div;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// from the DOM method
$elements = $dom->getElementById("priceblock_ourprice")->item(0);
// OR extract it with XPath like the line below
$priceNode = $xpath->query("//*[@id='priceblock_ourprice']");
if (!is_null($elements)) {
//$priceNode = $elements->item(0);
$ourPrice = $priceNode;
echo $ourPrice;
}
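I think it would be best to use a regular expression, but what should the expression look like?
Extraction with PHP
Extraction on the front end: I used jQuery in the solution below.
Note: I found a syntax error in the price_feature_div HTML value in the JSON. Even though it is an HTML string, it should be on a single line. Notice that there are two line breaks in the HTML.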
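If the response really does contain raw line breaks inside a string value, as the note suggests, json_decode will reject it and return null. The following is a minimal sketch of that failure and of one possible repair. The sample payload and the variable names are made up for illustration, and collapsing line breaks into spaces is an assumption that only suits payloads like this one, where whitespace inside the HTML does not matter.

```php
<?php
// Illustrative payload: a raw line break sits inside the string value,
// which JSON does not allow (control characters must be escaped).
$broken = '{"html": "<span id=\"p\"' . "\n" . 'class=\"c\">$37.60<\/span>"}';

var_dump(json_decode($broken) === null);   // decoding fails

// Assumed repair: collapse real CR/LF characters into a single space.
// Escaped sequences (a backslash followed by the letter n) are left alone.
$repaired = preg_replace('/[\r\n]+/', ' ', $broken);
$fixed = json_decode($repaired);

echo $fixed->html, PHP_EOL;
```

Line breaks between JSON tokens are legal whitespace, so replacing them with spaces does not change the structure. Only the ones inside string values were the problem.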
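"I think it would be best to use a regular expression, but what should the expression look like?"
In some cases, for unstructured HTML fragments of limited size, a regular expression works better than XPath.
So you only need to fetch the raw data and anchor on the dollar sign, and you will get what you want.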
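$page = file_get_contents($URL);
// Match the first dollar amount in the response, e.g. $37.60
$pattern = '/\$[\d.]+/';
// preg_match is a function (no leading $) and takes the pattern first, then the subject
if (preg_match($pattern, $page, $matches)) {
    echo 'price = ', $matches[0];
}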
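The pattern `/\$[\d.]+/` picks up the first dollar amount anywhere in the response, which may not be the price if other amounts appear earlier. As a hedged variation, the sketch below anchors the match on the `priceblock_ourprice` id. It assumes the HTML has already been taken out of the JSON (so the quotes are no longer backslash-escaped), and `extractPrice` is a hypothetical helper name, not something from the original answers.

```php
<?php
// Hypothetical helper: return the price inside the element whose id is
// "priceblock_ourprice", or null when no such price is found.
function extractPrice(string $html): ?string
{
    // Skip the rest of the opening tag, then capture "$" followed by
    // digits, optional thousands separators and an optional fraction.
    $pattern = '/id="priceblock_ourprice"[^>]*>\s*(\$[\d,]+(?:\.\d+)?)/';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Decoded fragment in the shape shown in the question.
$html = '<td>$99.99 list</td><span id="priceblock_ourprice" class="a-size-medium a-color-price">$37.60</span>';
echo 'price = ', extractPrice($html), PHP_EOL;
```

This is still a regular expression over HTML, so it breaks as soon as the attribute order or quoting changes. For anything beyond a small, stable fragment the DOM approach is the safer choice.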
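Decode the JSON, extract the HTML, then feed it into DOM as usual. And no, better not to use a regex.
@MarcB Thanks, but can you explain how to do that?
@MarcB The problem is that after decoding I get an empty array:
That probably means the JSON is broken. json_last_error() should tell you a bit more.
Thanks for your answer! But I can't get that output. What is the "val" in "val ourPrice = $jsonObj…"? I come from PHP and don't know what it means. I get this source from a URL, which I'll post in my question, so what would the exact code be?
You're welcome. Sorry, that was a typo; it should be var. I've corrected it now, please take a look. I'd like to know where you are doing the extraction: server side or client side?
Updated the answer with PHP code: extraction using DOM after json_decode, as @MarcB already pointed out.
PHP code is what I need, but I still get an empty variable when decoding the JSON. I'll update the question with the full code I'm using. Sorry for the trouble.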
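The suggestion in the comments to consult json_last_error() can be sketched as follows. `decodeOrExplain` is a hypothetical helper name and the sample strings are made up. The point is only that a null result from json_decode can be turned into a readable message instead of a silently empty variable.

```php
<?php
// Hypothetical helper: decode JSON, or throw with PHP's own description
// of what went wrong instead of silently returning null.
function decodeOrExplain(string $raw)
{
    $decoded = json_decode($raw);
    if ($decoded === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('json_decode failed: ' . json_last_error_msg());
    }
    return $decoded;
}

try {
    // Well-formed sample in the shape of the response from the question.
    $data = decodeOrExplain('{"Value": {"content": {"price_feature_div": "<b>$37.60<\/b>"}}}');
    echo $data->Value->content->price_feature_div, PHP_EOL;

    // Truncated on purpose to trigger the error path.
    decodeOrExplain('{"Value": ');
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

When the input comes from file_get_contents, it is also worth checking that the call did not return false, since a failed request is another way to end up with nothing to decode.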
$json_string = '{"ASIN" : "B00DR4LYHY","FeatureName" : "price_feature_div","Type" : "JSON","Value" : {"content" : {"price_feature_div":"<div id=\"price\" class=\"a-section a-spacing-small\">\n<table class=\"a-lineitem\">\n \n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t \n\t\t \n\t\t\t\t \n\t\t \n\t\t\t\t \n\n\n\n\n\n\t\n<tr>\n <td class=\"a-color-secondary a-size-base a-text-right a-nowrap\">Price:<\/td>\n <td class=\"a-span12\">\n <span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price\">$37.60<\/span>\n \n\n\n\n \n\n\n\n\n\n\n \n\n <span id=\"ourprice_shippingmessage\">\t\n \t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n \n\n\t \n\t\t\n\t\t\n \n <span class=\"a-size-base a-color-base\">& <b>FREE Shipping<\/b><\/span>\n \n \n \n\n\n\n <\/span>\n \n \n \n \n <\/td>\n<\/tr>\n\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t \n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\n\n\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\n\t\n\t\n\t\n\n \n \n\t\n<\/table>\n<\/div>"}}}';
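// Extraction with PHP: decode the JSON, then parse the embedded HTML fragment.
$decoded = json_decode($json_string);
if ($decoded === null) {
    die('json_decode failed: ' . json_last_error_msg());
}
$html = $decoded->Value->content->price_feature_div;
// The fragment contains a bare "&", so keep libxml's parse warnings quiet.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// From the DOM method: getElementById returns a single element (or null), not a list.
$priceNode = $dom->getElementById("priceblock_ourprice");
// OR extract it with XPath like the line below; query() returns a list, so take item(0).
//$priceNode = $xpath->query("//*[@id='priceblock_ourprice']")->item(0);
if (!is_null($priceNode)) {
    // A node cannot be echoed directly; read its text content instead.
    $ourPrice = trim($priceNode->textContent);
    echo $ourPrice;
}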
var jsonObj={
"ASIN" : "B00DR4LYHY",
"FeatureName" : "price_feature_div",
"Type" : "JSON",
"Value" :
{
"content" :
{"price_feature_div":"<div id=\"price\" class=\"a-section a-spacing-small\">\n<table class=\"a-lineitem\">\n \n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t \n\t\t \n\t\t\t\t \n\t\t \n\t\t\t\t \n\n\n\n\n\n\t\n<tr>\n <td class=\"a-color-secondary a-size-base a-text-right a-nowrap\">Price:<\/td>\n <td class=\"a-span12\">\n <span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price\">$37.60<\/span>\n \n\n\n\n \n\n\n\n\n\n\n \n\n <span id=\"ourprice_shippingmessage\">\t\n \t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n \n\n\t \n\t\t\n\t\t\n \n <span class=\"a-size-base a-color-base\">& <b>FREE Shipping<\/b><\/span>\n \n \n \n\n\n\n <\/span>\n \n \n \n \n <\/td>\n<\/tr>\n\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t \n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\n\n\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\n\t\n\t\n\t\n\n \n \n\t\n<\/table>\n<\/div>"}
}
};
//using jQuery we extracted the price
var ourPrice = $(jsonObj.Value.content.price_feature_div).find("#priceblock_ourprice").text();
console.log(ourPrice);//"$37.60" is the value you can see in the browser-console