Extracting key-value counts from JSON data in a table column
I have been trying, without success, to extract key/value counts from a column of JSON data in pandas. The data is stored in a DataFrame in the following format:
import pandas as pd
data = [['ID_1', '{\'RestaurantsTakeOut\': \'True\', \'BusinessParking\': "{\'garage\': False, \'street\': False, \'validated\': False, \'lot\': False, \'valet\': False}", \'WiFi\': "u\'no\'", \'RestaurantsDelivery\': \'False\', \'OutdoorSeating\': \'False\', \'RestaurantsAttire\': "u\'casual\'", \'BusinessAcceptsCreditCards\': \'True\', \'RestaurantsGoodForGroups\': \'True\', \'RestaurantsReservations\': \'False\', \'HasTV\': \'False\', \'Ambience\': "{\'romantic\': False, \'intimate\': False, \'touristy\': False, \'hipster\': False, \'divey\': False, \'classy\': False, \'trendy\': False, \'upscale\': False, \'casual\': False}", \'Alcohol\': "u\'none\'", \'RestaurantsPriceRange2\': \'1\', \'GoodForKids\': \'True\'}'],
['ID_2','{\'RestaurantsTakeOut\': \'True\', \'HasTV\': \'True\', \'NoiseLevel\': "u\'average\'", \'Alcohol\': "u\'full_bar\'", \'BusinessAcceptsCreditCards\': \'True\', \'RestaurantsAttire\': "u\'casual\'", \'Caters\': \'False\', \'RestaurantsDelivery\': \'False\', \'RestaurantsTakeOut\': \'True\', \'Ambience\': "{\'romantic\': False, \'intimate\': True, \'classy\': False, \'hipster\': False, \'divey\': False, \'touristy\': False, \'trendy\': False, \'upscale\': False, \'casual\': False}", \'RestaurantsGoodForGroups\': \'True\', \'BusinessParking\': "{\'garage\': False, \'street\': True, \'validated\': False, \'lot\': False, \'valet\': False}", \'GoodForKids\': \'False\', \'RestaurantsPriceRange2\': \'2\', \'WiFi\': "u\'free\'", \'BikeParking\': \'True\', \'RestaurantsReservations\': \'True\'}' ],
['ID_3','{\'RestaurantsTakeOut\': \'False\', \'GoodForKids\': \'True\', \'NoiseLevel\': "u\'average\'", \'RestaurantsPriceRange2\': \'2\', \'BusinessAcceptsCreditCards\': \'True\', \'HasTV\': \'False\', \'OutdoorSeating\': \'False\', \'RestaurantsTakeOut\': \'True\', \'RestaurantsTableService\': \'True\', \'RestaurantsDelivery\': \'False\', \'BusinessParking\': "{\'garage\': False, \'street\': False, \'validated\': False, \'lot\': True, \'valet\': False}", \'RestaurantsReservations\': \'True\', \'BikeParking\': \'True\', \'GoodForMeal\': "{\'dessert\': False, \'latenight\': False, \'lunch\': True, \'dinner\': True, \'brunch\': False, \'breakfast\': False}", \'Ambience\': "{\'romantic\': False, \'intimate\': False, \'touristy\': False, \'hipster\': False, \'divey\': False, \'classy\': False, \'trendy\': False, \'upscale\': False, \'casual\': True}", \'WiFi\': "u\'no\'", \'Alcohol\': "\'beer_and_wine\'", \'RestaurantsGoodForGroups\': \'True\', \'RestaurantsAttire\': "\'casual\'"}']]
df = pd.DataFrame(data, columns = ['business_id', 'attributes'])
I have been trying to extract the keys, values, and counts and display the result in a format like this:
Key1 Value1 Count
Key1 Value2 Count
Key2 Value1 Count
Key2 Value2 Count
Key3 Value1 Count
After that, I would like to pick some of the keys and add them to the DataFrame as new columns, each filled with that key's value for the row:
   business_id  attributes                     RestaurantsTakeOut
0  ID_1         same as in original dataframe  True
1  ID_2         same as in original dataframe  True
2  ID_3         same as in original dataframe  False
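Any ideas on how to get these results would be greatly appreciated.
IIUC, you just need to unpack the JSON with the help of the ast module and pandas' json_normalize. The strings are Python dict literals rather than strict JSON, which is why literal_eval is used to parse them: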
from ast import literal_eval
from pandas import json_normalize  # on pandas < 1.0 use: from pandas.io.json import json_normalize

def unnest_json(dataframe, column):
    # Parse each string in the column into a dict, then expand the dicts into columns
    dataframe_new = json_normalize(dataframe[column].apply(literal_eval))
    return dataframe_new

df1 = unnest_json(df, 'attributes')
# going a level further
print(unnest_json(df1, 'BusinessParking'))
   garage  street  validated    lot  valet
0   False   False      False  False  False
1   False    True      False  False  False
2   False   False      False   True  False
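For the second part of the question (adding selected keys as new columns), one possible approach is to join the flattened columns back onto the original frame. The sketch below is only an illustration: the shortened sample data and the `wanted` list are assumptions made for the example, not part of the original answer.

```python
from ast import literal_eval

import pandas as pd

# Toy stand-in for the question's DataFrame (values shortened for readability).
df = pd.DataFrame(
    {
        "business_id": ["ID_1", "ID_2", "ID_3"],
        "attributes": [
            "{'RestaurantsTakeOut': 'True', 'HasTV': 'False'}",
            "{'RestaurantsTakeOut': 'True', 'HasTV': 'True'}",
            "{'RestaurantsTakeOut': 'False'}",
        ],
    }
)

# Parse each string into a dict, then expand the dicts into one column per key.
parsed = df["attributes"].apply(literal_eval)
flat = pd.json_normalize(parsed.tolist())
flat.index = df.index  # keep rows aligned with the original frame

# Hypothetical selection of keys to promote to columns.
wanted = ["RestaurantsTakeOut", "HasTV"]
result = df.join(flat[wanted])
print(result[["business_id"] + wanted])
```

Keys that are missing for a given row come out as NaN, and the values stay as the strings stored in the data ('True', 'False'), so converting them to booleans would be a separate step.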
Note that some of the JSON fields will be NaN; you can use fillna('{}') to remap them to empty JSON fields.
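As an alternative to filling the whole frame, the missing entries could be handled at parse time. The helper below, `parse_or_empty`, is a hypothetical name introduced for this sketch; it assumes that every non-string value (such as NaN) should be treated as an empty dict.

```python
from ast import literal_eval

import pandas as pd


def parse_or_empty(value):
    """Hypothetical helper: parse a dict literal, treating missing values as {}."""
    if isinstance(value, str):
        return literal_eval(value)
    return {}


# One row has no GoodForMeal entry, which pandas stores as NaN.
col = pd.Series(["{'lunch': True, 'dinner': False}", float("nan")])

parsed = col.apply(parse_or_empty)
flat = pd.json_normalize(parsed.tolist())
print(flat)
```

Rows that had no entry end up as all-NaN rows in the flattened frame, so the row count still matches the input.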
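With a simple loop you can create a dictionary of DataFrames keyed by field name:
json_fields = ['BusinessParking', 'Ambience', 'GoodForMeal']
dfs = {}
for field in json_fields:
    try:
        dataframe = unnest_json(df1, field)
    except ValueError:
        # literal_eval fails on NaN, so replace missing values with an empty dict literal first
        dataframe = unnest_json(df1.fillna('{}'), field)
    dfs[field] = dataframe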
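The Key / Value / Count layout asked for in the question can be derived from a flattened frame by reshaping it to long form and counting the pairs. This is a minimal sketch under assumptions: the shortened sample data and the column names `key`, `value`, and `count` are chosen for the example only.

```python
from ast import literal_eval

import pandas as pd

# Toy stand-in for the question's DataFrame (values shortened for readability).
df = pd.DataFrame(
    {
        "business_id": ["ID_1", "ID_2", "ID_3"],
        "attributes": [
            "{'HasTV': 'False', 'WiFi': \"u'no'\"}",
            "{'HasTV': 'True', 'WiFi': \"u'free'\"}",
            "{'HasTV': 'False', 'WiFi': \"u'no'\"}",
        ],
    }
)

flat = pd.json_normalize(df["attributes"].apply(literal_eval).tolist())

# Reshape to long form: one row per (key, value) pair, dropping missing entries.
long = flat.melt(var_name="key", value_name="value").dropna(subset=["value"])

# Count how often each (key, value) pair occurs.
counts = long.groupby(["key", "value"]).size().reset_index(name="count")
print(counts)
```

Values such as "u'no'" are counted exactly as stored; stripping the u'...' wrapper, or counting the nested fields (for example by applying the same steps to each frame in dfs), would be separate steps.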
Thanks @datanovel! This gave me exactly what I needed.
Glad I could help, @Han.