Python: TwitterAPI tweets/search: flattening nested dictionaries into columns. Python code for flattening Twitter output into flat columns for .csv/Tableau/SQL


In use, twitter_output is a dictionary item from the raw JSON output. It can come directly from the API or be loaded with twitter_output = json.load(twitter_json_object).
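As a minimal sketch of the json.load route, assuming the raw response is available as JSON text: the payload below is a made-up, heavily trimmed stand-in for a real response, and twitter_json_object is simply a file-like wrapper around it. json.load reads from a file-like object, while json.loads parses a string directly.

```python
import io
import json

# Hypothetical raw payload; a real response has many more fields.
raw = '{"id": 1, "user": {"screen_name": "example", "entities": {"url": {"urls": []}}}}'

# json.load expects a file-like object, so wrap the text for this demo.
twitter_json_object = io.StringIO(raw)
twitter_output = json.load(twitter_json_object)

# json.loads parses the same text straight from a string.
assert twitter_output == json.loads(raw)
print(twitter_output["user"]["screen_name"])
```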

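Below is my original question...

Original question

I am flattening a nested dictionary of tweet descriptors into columnar output, using a "__" separator between each nested dictionary level, and printing it to the terminal as a visual test.

The code below does this successfully, but I don't know Python well enough to make it more elegant.

Extract of the flattened dictionary output in the terminal (lightly edited, with whitespace and u"" added for clarity):

Twitter format

The .search() method returns the following sample search response in dictionary format: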

response = [{u'contributors': None, u'truncated': False, u'text': u'Hate It or love It? Kim Kardashian in Balmain &amp; Alexander McQueen [Photos] http://t.co/UrNFK5yDPU via @lovebscott http://t.co/lZW9GAzvhx', u'in_reply_to_status_id': None, u'id': 537357629975064577, u'favorite_count': 0, u'source': u'<a href="http://tapbots.com/software/tweetbot/mac" rel="nofollow">Tweetbot for Mac</a>', u'retweeted': False, u'coordinates': None, u'entities': {u'symbols': [], u'user_mentions': [{u'id': 14521926, u'indices': [106, 117], u'id_str': u'14521926', u'screen_name': u'lovebscott', u'name': u'B. Scott'}], u'hashtags': [], u'urls': [{u'url': u'http://t.co/UrNFK5yDPU', u'indices': [79, 101], u'expanded_url': u'http://www.lovebscott.com/fashion/hate-it-or-love-it-kim-kardashian-in-balmain-alexander-mcqueen-photos', u'display_url': u'lovebscott.com/fashion/hate-i\u2026'}], u'media': [{u'expanded_url': u'http://twitter.com/lovebscott/status/537357629975064577/photo/1', u'display_url': u'pic.twitter.com/lZW9GAzvhx', u'url': u'http://t.co/lZW9GAzvhx', u'media_url_https': u'https://pbs.twimg.com/media/B3UT6skCMAIDw62.jpg', u'id_str': u'537357629656281090', u'sizes': {u'large': {u'h': 612, u'resize': u'fit', u'w': 610}, u'small': {u'h': 341, u'resize': u'fit', u'w': 340}, u'medium': {u'h': 601, u'resize': u'fit', u'w': 600}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [118, 140], u'type': u'photo', u'id': 537357629656281090, u'media_url': u'http://pbs.twimg.com/media/B3UT6skCMAIDw62.jpg'}]}, u'in_reply_to_screen_name': None, u'in_reply_to_user_id': None, u'retweet_count': 0, u'id_str': u'537357629975064577', u'favorited': False, u'user': {u'follow_request_sent': None, u'profile_use_background_image': False, u'profile_text_color': u'333333', u'default_profile_image': False, u'id': 14521926, u'profile_background_image_url_https': u'https://pbs.twimg.com/profile_background_images/457457546580602880/VxHBaVbH.jpeg', u'verified': True, u'profile_location': None, 
u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/531605236696104960/cG-Lu2y6_normal.jpeg', u'profile_sidebar_fill_color': u'EFEFEF', u'entities': {u'url': {u'urls': [{u'url': u'http://t.co/3nt6d6jM9p', u'indices': [0, 22], u'expanded_url': u'http://lovebscott.com', u'display_url': u'lovebscott.com'}]}, u'description': {u'urls': []}}, u'followers_count': 161968, u'profile_sidebar_border_color': u'FFFFFF', u'id_str': u'14521926', u'profile_background_color': u'131516', u'listed_count': 1905, u'is_translation_enabled': False, u'utc_offset': -18000, u'statuses_count': 58304, u'description': u'#KingofFabulous - #TheMultimediaMaven - Mogul - TV / Internet Personality - @EBONYMag Advice Columnist - @glam_com Contributing Editor', u'friends_count': 373, u'location': u'Los Angeles, CA', u'profile_link_color': u'009999', u'profile_image_url': u'http://pbs.twimg.com/profile_images/531605236696104960/cG-Lu2y6_normal.jpeg', u'following': None, u'geo_enabled': True, u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/14521926/1403029806', u'profile_background_image_url': u'http://pbs.twimg.com/profile_background_images/457457546580602880/VxHBaVbH.jpeg', u'name': u'B. Scott', u'lang': u'en', u'profile_background_tile': True, u'favourites_count': 14, u'screen_name': u'lovebscott', u'notifications': None, u'url': u'http://t.co/3nt6d6jM9p', u'created_at': u'Fri Apr 25 03:29:42 +0000 2008', u'contributors_enabled': False, u'time_zone': u'Quito', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': True, u'lang': u'en', u'created_at': u'Tue Nov 25 21:30:17 +0000 2014', u'in_reply_to_status_id_str': None, u'place': None, u'metadata': {u'iso_language_code': u'en', u'result_type': u'recent'}}]
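Method
  1. Loop through the dictionary's key, value pairs, printing each one, until a value is a nested structure
  2. Determine what the nested structure is:
     2.1 If it is a list: run test_for_entity()
     2.2 If it is a dictionary: go to 1
     2.3 Otherwise: output the whole structure

I think the iteration over 2.1, 2.2 and 2.3 could be cleaner, but I don't know how :(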
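The steps above map fairly directly onto a recursive function. The following is only a sketch, written for Python 3 rather than the Python 2 used elsewhere in this thread; flatten, on_list and the sample tweet are illustrative names and data, and on_list is a hypothetical hook standing in for test_for_entity().

```python
def flatten(node, prefix="", sep="__", on_list=None):
    """Return a list of (column_name, value) pairs for a nested dict.

    on_list is a hypothetical hook: it receives (column_name, list_value)
    and returns the value to store for that column.
    """
    rows = []
    for key, value in node.items():                 # step 1: loop over key, value
        column = prefix + sep + key if prefix else key
        if isinstance(value, dict):                 # step 2.2: go back to step 1
            rows.extend(flatten(value, column, sep, on_list))
        elif isinstance(value, list):               # step 2.1: hand lists to the hook
            rows.append((column, on_list(column, value) if on_list else value))
        else:                                       # step 2.3: output the value as is
            rows.append((column, value))
    return rows


# Made-up, heavily trimmed tweet for illustration only.
tweet = {
    "id": 1,
    "user": {"screen_name": "example", "entities": {"url": {"urls": []}}},
    "entities": {"hashtags": [{"text": "python"}]},
}
for column, value in flatten(tweet):
    print(column, value)
```

Because the function calls itself for every nested dictionary, the depth of the response no longer has to be known in advance.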

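Code

Note: the function test_for_entity() handles the nested structure of entities, since not all of the information is relevant. It is called whenever the next nested structure is a list rather than a dictionary.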


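You didn't say explicitly what should happen when an array is encountered, so I've left arrays alone.

Taking the variable user as a subset of your Twitter response: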

    user = {u'user': {u'lang': u'en', u'utc_offset': -18000, u'statuses_count': 58304, u'default_profile_image': False, u'friends_count': 373, u'profile_background_image_url_https': u'https://pbs.twimg.com/profile_background_images/457457546580602880/VxHBaVbH.jpeg', u'profile_use_background_image': False, u'profile_sidebar_fill_color': u'EFEFEF', u'profile_link_color': u'009999', u'profile_image_url': u'http://pbs.twimg.com/profile_images/531605236696104960/cG-Lu2y6_normal.jpeg', u'time_zone': u'Quito', u'is_translator': False, u'screen_name': u'lovebscott', u'url': u'http://t.co/3nt6d6jM9p', u'verified': True, u'geo_enabled': True, u'profile_background_color': u'131516', u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/14521926/1403029806', u'id': 14521926, u'profile_background_image_url': u'http://pbs.twimg.com/profile_background_images/457457546580602880/VxHBaVbH.jpeg', u'description': u'#KingofFabulous - #TheMultimediaMaven - Mogul - TV / Internet Personality - @EBONYMag Advice Columnist - @glam_com Contributing Editor', u'is_translation_enabled': False, u'profile_background_tile': True, u'favourites_count': 14, u'name': u'B. Scott', u'notifications': None, u'follow_request_sent': None, u'profile_text_color': u'333333', u'created_at': u'Fri Apr 25 03:29:42 +0000 2008', u'profile_location': None, u'contributors_enabled': False, u'location': u'Los Angeles, CA', u'entities': {u'url': {u'urls': [{u'indices': [0, 22], u'url': u'http://t.co/3nt6d6jM9p', u'expanded_url': u'http://lovebscott.com', u'display_url': u'lovebscott.com'}]}, u'description': {u'urls': []}}, u'followers_count': 161968, u'profile_sidebar_border_color': u'FFFFFF', u'id_str': u'14521926', u'default_profile': False, u'following': None, u'protected': False, u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/531605236696104960/cG-Lu2y6_normal.jpeg', u'listed_count': 1905}}
    
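You can write a recursive function that concatenates keys arbitrarily deep into the dictionary until it hits an item that is not a dictionary, which it treats as the "final value" of that node of the tree.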
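    def process(indict, current_key=None, outerdict=None):
        # outerdict accumulates the flattened result across recursive calls
        if outerdict is None:
            outerdict = {}
        # .items() works on both Python 2 and Python 3
        for key, value in indict.items():
            newkey = current_key + '__' + key if current_key else key
            if isinstance(value, dict):
                # descend into the nested dictionary, extending the key prefix
                process(value, current_key=newkey, outerdict=outerdict)
            else:
                # anything that is not a dict (lists included) is a final value
                outerdict[newkey] = value
        return outerdict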
    
This results in:

    >>> import pprint
    >>> pprint.pprint(process(user))
    {u'user__contributors_enabled': False,
     u'user__created_at': u'Fri Apr 25 03:29:42 +0000 2008',
     u'user__default_profile': False,
     u'user__default_profile_image': False,
     u'user__description': u'#KingofFabulous - #TheMultimediaMaven - Mogul - TV / Internet Personality - @EBONYMag Advice Columnist - @glam_com Contributing Editor',
     u'user__entities__description__urls': [],
     u'user__entities__url__urls': [{u'display_url': u'lovebscott.com',
                                     u'expanded_url': u'http://lovebscott.com',
                                     u'indices': [0, 22],
                                     u'url': u'http://t.co/3nt6d6jM9p'}],
     u'user__favourites_count': 14,
     u'user__follow_request_sent': None,
     u'user__followers_count': 161968,
     u'user__following': None,
     u'user__friends_count': 373,
     u'user__geo_enabled': True,
     u'user__id': 14521926,
     u'user__id_str': u'14521926',
     u'user__is_translation_enabled': False,
     u'user__is_translator': False,
     u'user__lang': u'en',
     u'user__listed_count': 1905,
     u'user__location': u'Los Angeles, CA',
     u'user__name': u'B. Scott',
     u'user__notifications': None,
     u'user__profile_background_color': u'131516',
     u'user__profile_background_image_url': u'http://pbs.twimg.com/profile_background_images/457457546580602880/VxHBaVbH.jpeg',
     u'user__profile_background_image_url_https': u'https://pbs.twimg.com/profile_background_images/457457546580602880/VxHBaVbH.jpeg',
     u'user__profile_background_tile': True,
     u'user__profile_banner_url': u'https://pbs.twimg.com/profile_banners/14521926/1403029806',
     u'user__profile_image_url': u'http://pbs.twimg.com/profile_images/531605236696104960/cG-Lu2y6_normal.jpeg',
     u'user__profile_image_url_https': u'https://pbs.twimg.com/profile_images/531605236696104960/cG-Lu2y6_normal.jpeg',
     u'user__profile_link_color': u'009999',
     u'user__profile_location': None,
     u'user__profile_sidebar_border_color': u'FFFFFF',
     u'user__profile_sidebar_fill_color': u'EFEFEF',
     u'user__profile_text_color': u'333333',
     u'user__profile_use_background_image': False,
     u'user__protected': False,
     u'user__screen_name': u'lovebscott',
     u'user__statuses_count': 58304,
     u'user__time_zone': u'Quito',
     u'user__url': u'http://t.co/3nt6d6jM9p',
     u'user__utc_offset': -18000,
     u'user__verified': True}
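Since the stated goal is flat columns for .csv, here is a rough sketch of how flattened dictionaries like the one above could be written out with the standard csv module. It is written for Python 3; flatten restates the idea of process() so the block runs on its own, and the tweets list is made-up, heavily trimmed data. An in-memory buffer stands in for a real output file.

```python
import csv
import io


def flatten(indict, prefix="", sep="__"):
    """Flatten nested dicts into one level, joining keys with sep."""
    flat = {}
    for key, value in indict.items():
        column = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical tweets; real responses carry many more fields.
tweets = [
    {"id": 1, "user": {"screen_name": "alice", "lang": "en"}},
    {"id": 2, "user": {"screen_name": "bob"}, "place": {"country": "US"}},
]
rows = [flatten(tweet) for tweet in tweets]

# Tweets do not all share the same keys, so collect the union of columns.
columns = sorted({column for row in rows for column in row})

buffer = io.StringIO()  # stands in for open("tweets.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Collecting the union of column names first matters because fields such as place are absent from some tweets; restval fills those cells with an empty string.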
    
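No arrays :) If they show up, they will just be lumped into an "else". Thanks for the solution, although I was looking for a pretty way to traverse the dictionary.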
    {
    key_0:values_0,
    key_0:{
        key_1:values_1,
        key_1:{
            key_2:values_2,
            key_2:{
                key_3:values_3
                }
            }
        }
    }
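One way to walk a structure of this shape without recursion is an explicit stack. The sketch below is written for Python 3, unlike the Python 2 code in this thread; iter_flat and the sample keys are illustrative names, and the sample gives each level distinct keys because a real dict cannot hold the same key twice.

```python
def iter_flat(nested, sep="__"):
    """Yield (joined_key, value) pairs for every non-dict value."""
    stack = [("", nested)]
    while stack:
        prefix, node = stack.pop()
        for key, value in node.items():
            column = prefix + sep + key if prefix else key
            if isinstance(value, dict):
                stack.append((column, value))   # visit this level later
            else:
                yield column, value


# Illustrative data mirroring the four nesting levels sketched above.
nested = {
    "key_0": {
        "a": "values_1",
        "key_1": {
            "b": "values_2",
            "key_2": {"c": "values_3"},
        },
    },
    "top": "values_0",
}
flat = dict(iter_flat(nested))
for column in sorted(flat):
    print(column, flat[column])
```

The pairs are not necessarily yielded in the same order a recursive walk would produce, which is harmless when the result ends up in a dict or in named columns.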
    
    def test_for_entity(root,key,entity_value):
        # test if list is entity
        parent_key = root.split("__")[-1]
        if 'entities' in root.split("__"):
            # Entities for tweets
            if key in ("symbols","hashtags"):
                list_items = [list_item['text'] for list_item in entity_value]
                print root+"__"+key,list_items
            elif key == "media":
                list_items = [[list_item['type'],list_item['media_url']] for list_item in entity_value]
                print root+"__"+key,list_items
            elif key == "urls":
                list_items = [list_item['expanded_url'] for list_item in entity_value]
                print root+"__"+key,list_items
            elif key == "user_mentions":
                list_items = [list_item['screen_name'] for list_item in entity_value]
                print root+"__"+key,list_items
            # Entities for users
            elif key == "url":
                list_items = [list_item['expanded_url'] for list_item in entity_value['urls']]
                print root+"__"+key,list_items
            elif key == "description":
                list_items = [list_item['expanded_url'] for list_item in entity_value['urls']]
                print root+"__"+key,list_items
            else:
                # list_items is never assigned on this branch, so report the raw value
                print "[ERROR: unknown entity name '"+str(key)+"']","list",parent_key+"__"+key,entity_value
        else:
            list_items = [list_item for list_item in entity_value]
            print root+"__"+key,list_items,parent_key
    
    for tweet in response:
        for key_0,value_0 in tweet.items():
            if type(value_0) is dict: 
                for key_1,value_1 in value_0.items():
                    if type(value_1) is dict: 
                        for key_2,value_2 in value_1.items():
                            if type(value_2) is dict: 
                                for key_3,value_3 in value_2.items():
                                    if type(value_3) is dict:
                                        # Limit of recursive unpacking...
                                        print key_0+"__"+key_1+"__"+key_2+"__"+key_3,value_3
                                    elif type(value_3) is list:
                                        test_for_entity(root = key_0+"__"+key_1+"__"+key_2,key = key_3,entity_value=value_3)
                                    else:
                                        print key_0+"__"+key_1+"__"+key_2+"__"+key_3,value_3
                            elif type(value_2) is list:
                                test_for_entity(root = key_0+"__"+key_1,key = key_2,entity_value=value_2)
                            else:
                                print key_0+"__"+key_1+"__"+key_2,value_2
                    elif type(value_1) is list:
                        test_for_entity(root=key_0,key = key_1,entity_value=value_1)
                    else:
                        print key_0+"__"+key_1,value_1
            elif type(value_0) is list:
                test_for_entity(root="",key = key_0,entity_value=value_0)
            else:
                print key_0,value_0
    