Using Pig's JsonLoader() with JSON from Tweets

Tags: json, twitter, apache-pig

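I'm having trouble understanding the arguments to Pig's JsonLoader function. The JSON object is fairly large, and the part giving me problems is everything inside the "entities" field. If I remove that field, I can get JsonLoader() to work. Can someone help me work out the schema for this part? Here is the JSON for one tweet: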

{
"contributors": null,
"truncated": false,
"text": "North Korea Says US 'Hell-Bent on Regime Change': North Korea says US 'hell-bent on regime change' and threate... http://t.co/FM4GhdQAcG",
"in_reply_to_status_id": null,
"id": 452128135731884000,
"favorite_count": 0,
"source": "<a href=\"http://twitterfeed.com\" rel=\"nofollow\">twitterfeed</a>",
"retweeted": false,
"coordinates": null,
"entities": {
    "symbols": [],
    "user_mentions": [],
    "hashtags": [],
    "urls": [
        {
            "url": "http://t.co/FM4GhdQAcG",
            "indices": [
                114,
                136
            ],
            "expanded_url": "http://abcn.ws/1jb6ANh",
            "display_url": "abcn.ws/1jb6ANh"
        }
    ]
},
"in_reply_to_screen_name": null,
"id_str": "452128135731884033",
"retweet_count": 0,
"in_reply_to_user_id": null,
"favorited": false,
"user": {
    "follow_request_sent": null,
    "profile_use_background_image": true,
    "default_profile_image": false,
    "id": 1484045802,
    "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/450180280033091584/ukwF1xQ1.jpeg",
    "verified": false,
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/450177921198465024/5EbZX19P_normal.jpeg",
    "profile_sidebar_fill_color": "DDEEF6",
    "profile_text_color": "333333",
    "followers_count": 178,
    "profile_sidebar_border_color": "000000",
    "id_str": "1484045802",
    "profile_background_color": "FF3333",
    "listed_count": 0,
    "is_translation_enabled": false,
    "utc_offset": -10800,
    "statuses_count": 2900,
    "description": "Unico Menor Con Flow Mi Watsshat 18297015049",
    "friends_count": 103,
    "location": "santo domingo",
    "profile_link_color": "FF3333",
    "profile_image_url": "http://pbs.twimg.com/profile_images/450177921198465024/5EbZX19P_normal.jpeg",
    "following": null,
    "geo_enabled": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/1484045802/1396166038",
    "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/450180280033091584/ukwF1xQ1.jpeg",
    "name": "Nïñø Mälø",
    "lang": "es",
    "profile_background_tile": true,
    "favourites_count": 2,
    "screen_name": "YeralMueka",
    "notifications": null,
    "url": "https://www.facebook.com/YeralMueka",
    "created_at": "Wed Jun 05 04:41:09 +0000 2013",
    "contributors_enabled": false,
    "time_zone": "Santiago",
    "protected": false,
    "default_profile": false,
    "is_translator": false
},
"geo": null,
"in_reply_to_user_id_str": null,
"possibly_sensitive": true,
"lang": "en",
"created_at": "Fri Apr 04 16:58:42 +0000 2014",
"filter_level": "medium",
"in_reply_to_status_id_str": null,
"place": null
}

You can use Twitter's elephant-bird library:


Here is an example that loads JSON with a custom JsonLoader, without specifying a schema:

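I have also worked with Twitter data, and I found that tweets sometimes differ in their fields: some tweets contain additional fields that others lack, so tweets do not follow one fixed structure. If your input is consistently structured, you can use JsonLoader in Pig; if it is not, you cannot rely on it. To handle that case, define your own UDF in Pig. To create a UDF in Pig, follow this link: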

           http://pig.apache.org/docs/r0.11.1/udf.html#udf-java
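
As a rough illustration of the tolerant parsing such a UDF would need to do, here is a minimal sketch in plain Python (it is not Pig code and is not taken from the answers here). The function name `extract_urls` and the choice of output columns are hypothetical; the field names come from the tweet shown in the question. The idea is to treat every nested field under "entities" as optional, so a tweet that lacks one produces empty or null values instead of an error.

```python
import json


def extract_urls(line):
    """Return one (id_str, url, expanded_url, start, end) row per URL entity.

    Hypothetical helper: missing or null fields yield None (or no rows)
    rather than raising, because tweets do not all carry the same fields.
    """
    try:
        tweet = json.loads(line)
    except ValueError:
        return []  # skip lines that are not valid JSON
    if not isinstance(tweet, dict):
        return []

    # "entities" may be absent or null; fall back to an empty object.
    entities = tweet.get("entities") or {}
    rows = []
    for item in entities.get("urls") or []:
        if not isinstance(item, dict):
            continue
        # "indices" is a two-element array of integers when present.
        indices = item.get("indices") or []
        start, end = (list(indices) + [None, None])[:2]
        rows.append((tweet.get("id_str"), item.get("url"),
                     item.get("expanded_url"), start, end))
    return rows


# A trimmed copy of the tweet from the question.
sample = (
    '{"id_str": "452128135731884033", "entities": {"symbols": [], '
    '"user_mentions": [], "hashtags": [], "urls": [{"url": '
    '"http://t.co/FM4GhdQAcG", "indices": [114, 136], "expanded_url": '
    '"http://abcn.ws/1jb6ANh", "display_url": "abcn.ws/1jb6ANh"}]}}'
)

print(extract_urls(sample))
print(extract_urls('{"id_str": "1", "text": "no entities here"}'))
```

The first call yields a single row for the one URL in the sample tweet, and the second yields an empty list. The same defensive pattern (default to an empty object or list, then read each field with a fallback) is what a Java UDF written by following the link above would have to apply.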

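It is not clear what the problem is. Please read the guidance on writing questions, and pay particular attention to the "golden rule". The clearer and more specific you are, the more likely you are to get an answer.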