Extracting key-value counts from JSON data in a table column


I have been trying, without success, to extract key-value counts from a column of JSON data in pandas. The data looks like this in the DataFrame:

    data = [['ID_1', '{\'RestaurantsTakeOut\': \'True\', \'BusinessParking\': "{\'garage\': False, \'street\': False, \'validated\': False, \'lot\': False, \'valet\': False}", \'WiFi\': "u\'no\'", \'RestaurantsDelivery\': \'False\', \'OutdoorSeating\': \'False\', \'RestaurantsAttire\': "u\'casual\'", \'BusinessAcceptsCreditCards\': \'True\', \'RestaurantsGoodForGroups\': \'True\', \'RestaurantsReservations\': \'False\', \'HasTV\': \'False\', \'Ambience\': "{\'romantic\': False, \'intimate\': False, \'touristy\': False, \'hipster\': False, \'divey\': False, \'classy\': False, \'trendy\': False, \'upscale\': False, \'casual\': False}", \'Alcohol\': "u\'none\'", \'RestaurantsPriceRange2\': \'1\', \'GoodForKids\': \'True\'}'], 
        ['ID_2','{\'RestaurantsTakeOut\': \'True\', \'HasTV\': \'True\', \'NoiseLevel\': "u\'average\'", \'Alcohol\': "u\'full_bar\'", \'BusinessAcceptsCreditCards\': \'True\', \'RestaurantsAttire\': "u\'casual\'", \'Caters\': \'False\', \'RestaurantsDelivery\': \'False\', \'RestaurantsTakeOut\': \'True\', \'Ambience\': "{\'romantic\': False, \'intimate\': True, \'classy\': False, \'hipster\': False, \'divey\': False, \'touristy\': False, \'trendy\': False, \'upscale\': False, \'casual\': False}", \'RestaurantsGoodForGroups\': \'True\', \'BusinessParking\': "{\'garage\': False, \'street\': True, \'validated\': False, \'lot\': False, \'valet\': False}", \'GoodForKids\': \'False\', \'RestaurantsPriceRange2\': \'2\', \'WiFi\': "u\'free\'", \'BikeParking\': \'True\', \'RestaurantsReservations\': \'True\'}' ], 
        ['ID_3','{\'RestaurantsTakeOut\': \'False\', \'GoodForKids\': \'True\', \'NoiseLevel\': "u\'average\'", \'RestaurantsPriceRange2\': \'2\', \'BusinessAcceptsCreditCards\': \'True\', \'HasTV\': \'False\', \'OutdoorSeating\': \'False\', \'RestaurantsTakeOut\': \'True\', \'RestaurantsTableService\': \'True\', \'RestaurantsDelivery\': \'False\', \'BusinessParking\': "{\'garage\': False, \'street\': False, \'validated\': False, \'lot\': True, \'valet\': False}", \'RestaurantsReservations\': \'True\', \'BikeParking\': \'True\', \'GoodForMeal\': "{\'dessert\': False, \'latenight\': False, \'lunch\': True, \'dinner\': True, \'brunch\': False, \'breakfast\': False}", \'Ambience\': "{\'romantic\': False, \'intimate\': False, \'touristy\': False, \'hipster\': False, \'divey\': False, \'classy\': False, \'trendy\': False, \'upscale\': False, \'casual\': True}", \'WiFi\': "u\'no\'", \'Alcohol\': "\'beer_and_wine\'", \'RestaurantsGoodForGroups\': \'True\', \'RestaurantsAttire\': "\'casual\'"}']] 

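    df = pd.DataFrame(data, columns=['business_id', 'attributes'])  # requires: import pandas as pd

I have been trying to extract the keys, values, and counts, and to display the result in a format like this: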

Key1 Value1 Count
Key1 Value2 Count
Key2 Value1 Count
Key2 Value2 Count 
Key3 Value1 Count  
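After that, I would like to select some of the keys and add them to the DataFrame as new columns, each filled with that key's value for the row: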

    business_id attributes                     RestaurantsTakeOut
0   ID_1        same as in original dataframe  True 
1   ID_2        same as in original dataframe  True 
2   ID_3        same as in original dataframe  False 
Any ideas on how to get these results would be much appreciated.

IIUC

You can unnest the JSON with the help of the ast module and pandas' json_normalize:

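# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize
from ast import literal_eval

def unnest_json(dataframe, column):
    # Parse each Python-literal string into a dict, then expand the keys into columns.
    parsed = dataframe[column].apply(literal_eval)
    dataframe_new = json_normalize(parsed.tolist())
    return dataframe_new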



df1 = unnest_json(df,'attributes')


# going a level further

print(unnest_json(df1,'BusinessParking'))


   garage  street  validated    lot  valet
0   False   False      False  False  False
1   False    True      False  False  False
2   False   False      False   True  False
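The second part of the question (adding selected keys as new columns) is not shown above. Because the unnested frame keeps the same row order as the original, one way to do it is a plain index join. This is a minimal sketch, assuming pandas 1.0 or later for pd.json_normalize; the data is an abbreviated stand-in for the question's rows, and the `selected` list is a hypothetical choice of keys.

```python
from ast import literal_eval

import pandas as pd

# Abbreviated stand-in for the question's data (only two keys kept per row).
data = [
    ['ID_1', "{'RestaurantsTakeOut': 'True', 'WiFi': \"u'no'\"}"],
    ['ID_2', "{'RestaurantsTakeOut': 'True', 'WiFi': \"u'free'\"}"],
    ['ID_3', "{'RestaurantsTakeOut': 'False', 'WiFi': \"u'no'\"}"],
]
df = pd.DataFrame(data, columns=['business_id', 'attributes'])

# One row of parsed keys per business; the default 0..n-1 index matches df.
df1 = pd.json_normalize(df['attributes'].apply(literal_eval).tolist())

# Hypothetical selection of keys to promote to columns.
selected = ['RestaurantsTakeOut']
df_out = df.join(df1[selected])

print(df_out[['business_id', 'RestaurantsTakeOut']])
```

The values stay as the strings found in the data ('True', 'False'); converting them to booleans would be a separate step.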
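Note that some of the JSON fields will be NaN for rows that lack that key; you can use fillna('{}') to remap them to empty JSON objects before parsing.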
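To make the NaN point concrete, here is a small sketch of what goes wrong and how fillna('{}') avoids it. The column is a hypothetical stand-in for a nested field such as GoodForMeal that is present in only one row; pd.json_normalize assumes pandas 1.0 or later.

```python
from ast import literal_eval

import numpy as np
import pandas as pd

# Hypothetical column: the first two rows lack the nested field entirely.
col = pd.Series([np.nan, np.nan, "{'lunch': True, 'dinner': True}"])

# literal_eval only accepts strings (or AST nodes), so NaN raises ValueError.
try:
    col.apply(literal_eval)
except ValueError as exc:
    print('literal_eval failed on NaN:', exc)

# Replacing NaN with the string '{}' makes every row parse to a dict.
parsed = col.fillna('{}').apply(literal_eval)
meals = pd.json_normalize(parsed.tolist())

print(meals)
```

Rows that had no data come out as all-NaN rows in the result, so the row count still lines up with the original frame.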

With a simple loop, you can build a dictionary of DataFrames keyed by field name:

json_fields = ['BusinessParking','Ambience','GoodForMeal']
dfs = {}
for field in json_fields:

    try:
        dataframe = unnest_json(df1,field)
    except ValueError:
        dataframe = unnest_json(df1.fillna('{}'),field)

    dfs[field] = dataframe
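For the Key / Value / Count layout requested in the question, one possible approach (not part of the original answer) is to melt the unnested frame into long form and count each key-value pair. This is a sketch under the same assumptions as before: pandas 1.0 or later, and abbreviated sample data standing in for the real rows.

```python
from ast import literal_eval

import pandas as pd

# Abbreviated stand-in for the question's data; NoiseLevel is absent from ID_1.
data = [
    ['ID_1', "{'RestaurantsTakeOut': 'True', 'WiFi': \"u'no'\"}"],
    ['ID_2', "{'RestaurantsTakeOut': 'True', 'WiFi': \"u'free'\", "
             "'NoiseLevel': \"u'average'\"}"],
    ['ID_3', "{'RestaurantsTakeOut': 'False', 'WiFi': \"u'no'\", "
             "'NoiseLevel': \"u'average'\"}"],
]
df = pd.DataFrame(data, columns=['business_id', 'attributes'])

# Wide frame: one column per key, NaN where a business lacks the key.
flat = pd.json_normalize(df['attributes'].apply(literal_eval).tolist())

# Long frame of (key, value) pairs, then count the occurrences of each pair.
counts = (
    flat.melt(var_name='key', value_name='value')
        .dropna(subset=['value'])   # skip keys missing for a given business
        .groupby(['key', 'value'])
        .size()
        .reset_index(name='count')
)

print(counts)
```

The same melt-and-count step could be applied to each nested frame in dfs to count the sub-keys of BusinessParking, Ambience, and so on.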


Thanks @datanovel! This gave me exactly what I needed.

Glad I could help, @Han.