Python: generating a DataFrame from a dictionary whose values are lists of dictionaries of varying length


python pandas dataframe


I need to parse a JSON object into a DataFrame. The object has the following format:

 {"219": [{"year": "2015", "code": "VU", "category": "Vulnerable"}, 
          {"year": "2008", "code": "VU", "category": "Vulnerable"}, 
          {"year": "2002", "code": "VU", "category": "Vulnerable"}, 
          {"year": "1996", "code": "VU", "category": "Vulnerable"}, 
          {"year": "1994", "code": "V", "category": "Vulnerable"}, 
          {"year": "1990", "code": "V", "category": "Vulnerable"}, 
          {"year": "1988", "code": "V", "category": "Vulnerable"}, 
          {"year": "1986", "code": "V", "category": "Vulnerable"}], 
  "561": [{"year": "2016", "code": "LC", "category": "Least Concern"}, 
          {"year": "2010", "code": "LC", "category": "Least Concern"}, 
          {"year": "2006", "code": "LC", "category": "Least Concern"}, 
          {"year": "1996", "code": "EN", "category": "Endangered"}, 
          {"year": "1994", "code": "R", "category": "Rare"}, 
          {"year": "1990", "code": "R", "category": "Rare"}, 
          {"year": "1988", "code": "R", "category": "Rare"}, 
          {"year": "1986", "code": "R", "category": "Rare"}], 
  "571": [{"year": "2016", "code": "LC", "category": "Least Concern"}, 
          {"year": "2008", "code": "LC", "category": "Least Concern"}, 
          {"year": "2004", "code": "LC", "category": "Least Concern"}, 
          {"year": "1996", "code": "LR/lc", "category": "Lower Risk/least concern"}]
          }

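In the end, I want the DataFrame to use the keys as rows, year as columns (one column per year), and code as values. I don't need category. Also, each key-value pair can have a different number of dictionaries in its list (but always with the same year, code, category structure).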

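Is there a way to generate the DataFrame without first having to declare every year as a column? Not every year is represented here, and it would be great to have code that can build an updated df each time I receive a JSON object.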

I have looked through many SO questions, but so far nothing has helped with this.

You have to read the list under each key as a DataFrame, concatenate them, and then pivot to create the index and columns:

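import pandas as pd

dict_to_load = {
    "219": [
        {"year": "2015", "code": "VU", "category": "Vulnerable"},
        {"year": "2008", "code": "VU", "category": "Vulnerable"},
        {"year": "2002", "code": "VU", "category": "Vulnerable"},
        {"year": "1996", "code": "VU", "category": "Vulnerable"},
        {"year": "1994", "code": "V", "category": "Vulnerable"},
        {"year": "1990", "code": "V", "category": "Vulnerable"},
        {"year": "1988", "code": "V", "category": "Vulnerable"},
        {"year": "1986", "code": "V", "category": "Vulnerable"},
    ],
    "561": [
        {"year": "2016", "code": "LC", "category": "Least Concern"},
        {"year": "2010", "code": "LC", "category": "Least Concern"},
        {"year": "2006", "code": "LC", "category": "Least Concern"},
        {"year": "1996", "code": "EN", "category": "Endangered"},
        {"year": "1994", "code": "R", "category": "Rare"},
        {"year": "1990", "code": "R", "category": "Rare"},
        {"year": "1988", "code": "R", "category": "Rare"},
        {"year": "1986", "code": "R", "category": "Rare"},
    ],
    "571": [
        {"year": "2016", "code": "LC", "category": "Least Concern"},
        {"year": "2008", "code": "LC", "category": "Least Concern"},
        {"year": "2004", "code": "LC", "category": "Least Concern"},
        {"year": "1996", "code": "LR/lc", "category": "Lower Risk/least concern"},
    ],
}
dfs = []
for key, value in dict_to_load.items():
    df = (
        pd.DataFrame.from_dict(value)
        .reset_index(drop=True)
        .assign(taxonid=lambda x: [key] * len(x))  # create column for the future index
        .drop(['category'], axis='columns')  # drop the unneeded column
    )
    dfs.append(df)
final_df = pd.concat(dfs, axis='index').pivot(
    index='taxonid', columns='year', values='code'
)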

I am assuming the dictionary has already been read and assigned to a variable (dict_to_load).

If d is the dictionary from the question, then this example:

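import pandas as pd

# One row per (key, entry); assumes every entry's keys are ordered year, code, category.
df = pd.DataFrame(((k, *dd.values()) for k, v in d.items() for dd in v), columns=['taxid', 'year', 'code', 'category'])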
df = pd.pivot_table(df, values='code', index='taxid', columns='year', aggfunc='first')
print(df)

Prints:

year  1986 1988 1990 1994   1996 2002 2004 2006 2008 2010 2015 2016
taxid                                                              
219      V    V    V    V     VU   VU  NaN  NaN   VU  NaN   VU  NaN
561      R    R    R    R     EN  NaN  NaN   LC  NaN   LC  NaN   LC
571    NaN  NaN  NaN  NaN  LR/lc  NaN   LC  NaN   LC  NaN  NaN   LC
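
To illustrate what aggfunc='first' contributes, here is a minimal sketch on made-up data (the duplicated 2015 entry is hypothetical and does not occur in the question's JSON). It suggests that plain pivot rejects a repeated (taxid, year) pair, while pivot_table with aggfunc='first' keeps the first value it sees in each group:

```python
import pandas as pd

# Hypothetical long-format data: taxid "219" has two entries for 2015.
long_df = pd.DataFrame(
    {
        "taxid": ["219", "219", "561"],
        "year": ["2015", "2015", "2016"],
        "code": ["VU", "EN", "LC"],
    }
)

# pivot() cannot reshape when an (index, column) pair is repeated.
try:
    long_df.pivot(index="taxid", columns="year", values="code")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates each group; 'first' keeps the first value in it.
wide = pd.pivot_table(
    long_df, values="code", index="taxid", columns="year", aggfunc="first"
)

print(pivot_failed)
print(wide)
```

With the question's data every (taxid, year) pair is unique, so pivot and pivot_table should give the same table; the aggregation only matters once duplicates show up.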

Try this (f is your JSON):

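Thank you, Andrej, that works great. Two questions: in the first line, why did you choose a generator rather than a list? And in the second line, what does aggfunc do?

@panopticonopolis I used a generator because creating a temporary list and then copying it into the DataFrame is wasteful. With a generator, that temporary list can be skipped. aggfunc='first' is there because I only need the first value in each group (and the values in the 'code' column are strings).

Thanks, but I get TypeError: '_io.TextIOWrapper' object is not subscriptable when I try to run it.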
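
The TypeError in the last comment is what Python raises when an open file handle is indexed like a dictionary. The thread does not confirm the cause, but a likely one is that f was the file object rather than the parsed JSON. A minimal sketch, assuming the JSON lives in a file (the file name and sample content here are hypothetical):

```python
import json
import os
import tempfile

# Hypothetical file standing in for the real JSON source.
sample = {"219": [{"year": "2015", "code": "VU", "category": "Vulnerable"}]}
path = os.path.join(tempfile.mkdtemp(), "assessments.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump(sample, fh)

with open(path, encoding="utf-8") as fh:
    # fh is an _io.TextIOWrapper; fh["219"] would raise the TypeError.
    f = json.load(fh)  # f is now a plain dict keyed by taxon id

print(f["219"][0]["code"])
```

If the JSON arrives as a string instead of a file, json.loads plays the same role.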

import pandas as pd
# f is the parsed JSON dict; pd.concat replaces DataFrame.append, which pandas 2.0 removed
df7 = pd.concat([pd.json_normalize(f[x]).assign(taxonid=x) for x in f])
df7 = df7.drop(columns='category').pivot(index='taxonid', columns='year', values='code')


year    1986 1988 1990 1994   1996 2002 2004 2006 2008 2010 2015 2016
taxonid                                                              
219        V    V    V    V     VU   VU  NaN  NaN   VU  NaN   VU  NaN
561        R    R    R    R     EN  NaN  NaN   LC  NaN   LC  NaN   LC
571      NaN  NaN  NaN  NaN  LR/lc  NaN   LC  NaN   LC  NaN  NaN   LC
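
For the part of the question about rebuilding the table each time a new JSON object arrives, the approaches above could be wrapped in a small function. This is only a sketch: the name codes_by_year is hypothetical, and it assumes each (key, year) pair appears at most once, since pivot cannot reshape duplicates.

```python
import pandas as pd


def codes_by_year(data):
    """Hypothetical helper: one row per key, one column per year, code as value."""
    rows = [
        {"taxonid": key, "year": entry["year"], "code": entry["code"]}
        for key, entries in data.items()
        for entry in entries  # category is simply not copied over
    ]
    long_df = pd.DataFrame(rows, columns=["taxonid", "year", "code"])
    # The year columns come from the data, so none are declared up front.
    return long_df.pivot(index="taxonid", columns="year", values="code")


# Small sample in the same shape as the question's JSON.
sample = {
    "219": [
        {"year": "2015", "code": "VU", "category": "Vulnerable"},
        {"year": "1996", "code": "VU", "category": "Vulnerable"},
    ],
    "571": [
        {"year": "2016", "code": "LC", "category": "Least Concern"},
        {"year": "1996", "code": "LR/lc", "category": "Lower Risk/least concern"},
    ],
}

wide = codes_by_year(sample)
print(wide)
```

Calling the function again on a later JSON object produces a fresh table whose columns are whatever years that object contains.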