Scraping JSON with PHP

I have done a lot of HTML scraping with XPath. But now I have to scrape some JSON and don't know how to do it. The source I want to scrape is:

     {
            "ASIN" : "B00DR4LYHY",
            "FeatureName" : "price_feature_div",
            "Type" : "JSON",
            "Value" : 
            {
                "content" : 
                {"price_feature_div":"<div id=\"price\" class=\"a-section a-spacing-small\">\n<table class=\"a-lineitem\">\n    \n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t        \n\t\t                \n\t\t                            \n\t\t\t\t        \n\t\t                \n\t\t\t\t        \n\n\n\n\n\n\t\n<tr>\n    <td class=\"a-color-secondary a-size-base a-text-right a-nowrap\">Price:<\/td>\n    <td class=\"a-span12\">\n        <span id=\"priceblock_ourprice\" 

class=\"a-size-medium a-color-price\">$37.60<\/span>\n        \n\n\n\n        \n\n\n\n\n\n\n        \n\n        <span id=\"ourprice_shippingmessage\">\t\n        \t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n    \n        \n        \n        \n\n\t    \n\t\t\n\t\t\n        \n            <span class=\"a-size-base a-color-base\">& <b>FREE Shipping<\/b><\/span>\n        \n        \n    \n\n\n\n        <\/span>\n        \n        \n        \n        \n    <\/td>\n<\/tr>\n\n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t            \n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\n\n\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\n\t\n\t\n\t\n\n    \n    \n\t\n<\/table>\n<\/div>"}

        }
    }

What I need is the price, $37.60.

The code I am using (provided by Venkata) is:

    $URL    = 'http://www.amazon.com/gp/twister/ajaxv2?sid=188-4344403-7969026&ptd=OUTERWEAR&json=1&dpxAjaxFlag=1&sCac=1&isUDPFlag=1&twisterView=glance&ee=2&pgid=apparel_display_on_website&sr=1-3&nodeID=1036592&rid=0Q05FXGQJSA20X44DJVG&parentAsin=B00DR4LUQY&enPre=1&qid=1413775191&dStr=size_name%2Ccolor_name&auiAjax=1&storeID=apparel&psc=1&asinList=B00DR4LYHY&isFlushing=2&id=B00DR4LYHY&prefetchParam=0&mType=full&dpEnvironment=softlines';



    $page = file_get_contents($URL);
    $decoded = json_decode($page);

    $html = $decoded->Value->content->price_feature_div;


    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);

    // from the DOM method
    $elements = $dom->getElementById("priceblock_ourprice")->item(0);

    // OR extract it with XPath, as in the line below
    $priceNode = $xpath->query("//*[@id='priceblock_ourprice']");

    if (!is_null($elements)) {
        //$priceNode = $elements->item(0);
        $ourPrice = $priceNode;
        echo $ourPrice;
    }

I think it would be best to use a regular expression, but what should the expression look like?

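Extracting with PHP

Extracting on the front end: I used jQuery in the solution below.

Note: I found a syntax error in the price_feature_div HTML value inside the JSON. Even though it is an HTML string, it should be on a single line; I noticed two line breaks in the HTML.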
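
The note above can be checked directly. What follows is a minimal sketch, assuming the response really does contain raw line breaks inside the string value: in that case `json_decode` returns `null`, and `json_last_error_msg()` reports why. Stripping raw CR/LF characters before decoding is one possible workaround (an assumption of this sketch, not something stated in the thread); it relies on the fact that legitimate newlines inside JSON strings are escaped as `\n`. The `$raw` sample is a shortened, hypothetical stand-in for the real response.

```php
<?php
// Shortened, hypothetical stand-in for the response: note the raw line
// break inside the string value, which JSON does not allow.
$raw = '{"Value":{"content":{"price_feature_div":"<span id=\"priceblock_ourprice\" ' . "\n\n"
     . 'class=\"a-color-price\">$37.60<\/span>"}}}';

$decoded = json_decode($raw);
if ($decoded === null) {
    // Reports why decoding failed (typically a control-character error).
    echo 'Decode failed: ', json_last_error_msg(), "\n";
}

// Workaround (assumption): drop raw CR/LF characters. Escaped "\n"
// sequences inside strings are two ordinary characters and stay untouched.
$clean   = str_replace(["\r", "\n"], '', $raw);
$decoded = json_decode($clean);

if ($decoded !== null) {
    $html = $decoded->Value->content->price_feature_div;
    echo $html, "\n";
}
```

If the decode still fails after this step, the message from `json_last_error_msg()` is the place to look next, since the problem is then something other than stray line breaks.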

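"I think it would be best to use a regular expression, but what should the expression look like?"

In some cases, for unstructured HTML fragments of limited size, a regular expression works better than XPath.

So just take the raw data, anchor on the dollar sign, and you get what you want.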

    $page = file_get_contents($URL);
    $pattern = '/\$[\d.]+/';
    // preg_match() takes the pattern first, then the subject string
    if (preg_match($pattern, $page, $matches)) {
        echo 'price = ', $matches[0];
    }
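
The pattern above returns the first dollar amount anywhere in the response. A slightly stricter variant, sketched here under the assumption that the HTML has already been decoded from the JSON (so the quotes are no longer escaped), ties the match to the `priceblock_ourprice` element. The `$html` sample, including the struck-through price, is hypothetical and only there to show why the anchor matters.

```php
<?php
// Shortened, hypothetical stand-in for the decoded price_feature_div HTML.
$html = '<td>Was: <span class="a-text-strike">$45.00</span></td>'
      . '<td><span id="priceblock_ourprice" class="a-color-price">$37.60</span></td>';

// Match the element by its id, skip the rest of the opening tag,
// then capture the dollar amount that follows.
$pattern = '/id="priceblock_ourprice"[^>]*>\s*(\$[\d.,]+)/';

if (preg_match($pattern, $html, $matches)) {
    echo 'price = ', $matches[1], "\n";
} else {
    echo "price not found\n";
}
```

The capture group keeps the currency symbol; drop `\$` from inside the parentheses if only the number is wanted.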


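Decode the JSON, extract the HTML, then feed it into the DOM as usual. No, it is better not to use a regex.

@MarcB Thanks, but can you explain how to do that?

@MarcB The problem is that after decoding I get an empty array:

That probably means the JSON is broken. json_last_error() should tell you a bit more.

Thanks for your answer! But I can't get that output. What is "val" in "val ourPrice=$jsonObj…"? I work in PHP and don't know what that means. I get this source from a URL, which I will post in my question, so what would the exact code be?

You're welcome. Sorry, that was a typo; it should be var. I have corrected it now, please take a look. I would like to know where you are doing the extraction: server side or client side?

Updated the answer with PHP code; the extraction is done with the DOM after json_decode.

@MarcB already pointed out that PHP code is what I need, but I still get an empty variable when decoding the JSON. I will update the question with the full code I am using. Sorry for all the trouble.

PHP code for the first answer (extraction with the DOM after json_decode):
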
$json_string = '{"ASIN" : "B00DR4LYHY","FeatureName" : "price_feature_div","Type" : "JSON","Value" : {"content" : {"price_feature_div":"<div id=\"price\" class=\"a-section a-spacing-small\">\n<table class=\"a-lineitem\">\n    \n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t        \n\t\t                \n\t\t                            \n\t\t\t\t        \n\t\t                \n\t\t\t\t        \n\n\n\n\n\n\t\n<tr>\n    <td class=\"a-color-secondary a-size-base a-text-right a-nowrap\">Price:<\/td>\n    <td class=\"a-span12\">\n        <span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price\">$37.60<\/span>\n        \n\n\n\n        \n\n\n\n\n\n\n        \n\n        <span id=\"ourprice_shippingmessage\">\t\n        \t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n    \n        \n        \n        \n\n\t    \n\t\t\n\t\t\n        \n            <span class=\"a-size-base a-color-base\">& <b>FREE Shipping<\/b><\/span>\n        \n        \n    \n\n\n\n        <\/span>\n        \n        \n        \n        \n    <\/td>\n<\/tr>\n\n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t            \n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\n\n\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\n\t\n\t\n\t\n\n    \n    \n\t\n<\/table>\n<\/div>"}}}';

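$decoded = json_decode($json_string);
$html = $decoded->Value->content->price_feature_div;

// The fragment contains a bare "&", so collect libxml warnings quietly
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// DOM method: getElementById() returns a single DOMElement, or null
$priceNode = $dom->getElementById("priceblock_ourprice");

// OR extract it with XPath, as in the line below
// $priceNode = $xpath->query("//*[@id='priceblock_ourprice']")->item(0);

if (!is_null($priceNode)) {
    // textContent holds the element's text, here "$37.60"
    $ourPrice = trim($priceNode->textContent);
    echo $ourPrice;
}

Front-end extraction with jQuery:
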
var jsonObj={
            "ASIN" : "B00DR4LYHY",
            "FeatureName" : "price_feature_div",
            "Type" : "JSON",
            "Value" : 
            {
                "content" : 
                {"price_feature_div":"<div id=\"price\" class=\"a-section a-spacing-small\">\n<table class=\"a-lineitem\">\n    \n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t        \n\t\t                \n\t\t                            \n\t\t\t\t        \n\t\t                \n\t\t\t\t        \n\n\n\n\n\n\t\n<tr>\n    <td class=\"a-color-secondary a-size-base a-text-right a-nowrap\">Price:<\/td>\n    <td class=\"a-span12\">\n        <span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price\">$37.60<\/span>\n        \n\n\n\n        \n\n\n\n\n\n\n        \n\n        <span id=\"ourprice_shippingmessage\">\t\n        \t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n    \n        \n        \n        \n\n\t    \n\t\t\n\t\t\n        \n            <span class=\"a-size-base a-color-base\">& <b>FREE Shipping<\/b><\/span>\n        \n        \n    \n\n\n\n        <\/span>\n        \n        \n        \n        \n    <\/td>\n<\/tr>\n\n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\t\t    \n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t            \n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\n\n\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\n\t\n\t\n\t\n\n    \n    \n\t\n<\/table>\n<\/div>"}

        }
    };
//using jQuery we extracted the price
var ourPrice = $(jsonObj.Value.content.price_feature_div).find("#priceblock_ourprice").text();

console.log(ourPrice);//"$37.60" is the value you can see in the browser-console