Python Pandas: Convert a grouped df to a list of dicts, with two columns as key/value pairs
I have the following df:
YEAR MONTH VALUE
0 2010 january 1
1 2010 february 0
2 2010 march 2
3 2010 april 1
4 2010 may -2
5 2010 june -0
6 2010 july 1
7 2010 august 0
8 2010 september 1
9 2010 october 2
10 2010 november -0
11 2010 december 0
12 2011 january 1
13 2011 february 0
14 2011 march 0
15 2011 april -0
16 2011 may 0
17 2011 june -0
18 2011 july -0
19 2011 august -1
20 2011 september -1
21 2011 october 1
22 2011 november 0
23 2011 december 1
I need to convert it to the following format:
[{"id":0,"year":2010,"january":1,"february":1,"march":2,"april":1,"may":null,"june":null,"july":null,"august":null,"september":null,"october":null,"november":null,"december":null
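Basically, I have grouped the df by year. Now I want a separate dictionary for each group, with the months as keys and the corresponding values as values, plus two extra keys: the year and the group number (id=0).
PS: Ignore the null values in my desired format. They should all have the corresponding month values.

I store the dicts in a list, still using groupby + a for loop: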
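l = []
count = 0
# Iterate over the groups; x is the year, y is that year's sub-frame
for x, y in df.groupby('YEAR'):
    # Map each month to its value for this year
    d = y.set_index('MONTH').VALUE.to_dict()
    d['id'] = count
    d['year'] = x
    l.append(d)
    count = count + 1
l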
Out[821]:
[{'april': 1.56,
'august': 0.95,
'december': 0.83,
'february': 0.81,
'id': 0,
'january': 1.02,
'july': 1.32,
'june': -0.57,
'march': 2.66,
'may': -2.02,
'november': -0.53,
'october': 2.17,
'september': 1.79,
'year': 2010},
{'april': -0.17,
'august': -1.81,
'december': 1.36,
'february': 0.84,
'id': 1,
'january': 1.06,
'july': -0.04,
'june': -0.27,
'march': 0.11,
'may': 0.15,
'november': 0.75,
'october': 1.95,
'september': -1.55,
'year': 2011}]
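You can create a dictionary from a two-column array of values simply by calling dict(df.values); then you just need to chain the groups in the right way to build the list: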
out = []
for idx, (key, group) in enumerate(df.groupby('YEAR')):
    # Drop the YEAR column, then turn the remaining (MONTH, VALUE) rows into a dict
    year = dict(group.iloc[:, ~group.columns.isin(['YEAR'])].values)
    year.update({'id': idx})
    out.append(year)
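Or as a list comprehension:
groups = df.groupby('YEAR')
# update() returns None, so `or a` hands back the updated dict
dict_merge = lambda a, b: a.update(b) or a
out = [dict_merge(dict(group.iloc[:, 1:].values), {'id': idx}) for idx, (key, group) in enumerate(groups)]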
print(out)
[{'april': 1.56,
'august': 0.95,
'december': 0.83,
'february': 0.81,
'id': 0,
'january': 1.02,
'july': 1.32,
'june': -0.57,
'march': 2.66,
'may': -2.02,
'november': -0.53,
'october': 2.17,
'september': 1.79},
{'april': -0.17,
'august': -1.81,
'december': 1.36,
'february': 0.84,
'id': 1,
'january': 1.06,
'july': -0.04,
'june': -0.27,
'march': 0.11,
'may': 0.15,
'november': 0.75,
'october': 1.95,
'september': -1.55}]
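The dict(...) calls above work because the built-in dict constructor accepts any iterable of two-item key/value pairs, which is what a two-column array of values provides row by row. A minimal sketch, assuming a hypothetical list of [month, value] rows standing in for group.iloc[:, 1:].values:

```python
# Hypothetical stand-in for the two-column array group.iloc[:, 1:].values
rows = [["january", 1], ["february", 0], ["march", 2]]

# dict() consumes each two-item row as a (key, value) pair
months = dict(rows)

# Add the group number, as the loop above does with update()
months.update({"id": 0})
print(months)
```

With more than two columns per row, dict() would raise an error, which is why the YEAR column has to be dropped first.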
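For an O(n) solution, you can use collections.defaultdict. Then simply use the {**x, **y} syntax (Python 3.5+) inside a list comprehension to merge two dictionaries and add the id and year keys. Note that calling sorted on the dictionary items ensures the result is ordered by year.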
from collections import defaultdict
d = defaultdict(lambda: defaultdict(int))
# With itertuples(), row[0] is the index, so row[1..3] are YEAR, MONTH, VALUE
for row in df.itertuples():
    d[row[1]][row[2]] = row[3]
res = [{**{'id': i, 'year': k}, **v} for i, (k, v) in enumerate(sorted(d.items()))]
Result:
[{'april': 1,
'august': 0,
'december': 0,
'february': 0,
'id': 0,
'january': 1,
'july': 1,
'june': 0,
'march': 2,
'may': -2,
'november': 0,
'october': 2,
'september': 1,
'year': 2010},
{'april': 0,
'august': -1,
'december': 1,
'february': 0,
'id': 1,
'january': 1,
'july': 0,
'june': 0,
'march': 0,
'may': 0,
'november': 0,
'october': 1,
'september': -1,
'year': 2011}]
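To try the defaultdict approach without building a DataFrame, here is a small sketch under stated assumptions: the list rows of (year, month, value) tuples is hypothetical sample data standing in for df.itertuples(), and a plain defaultdict(dict) is used for the inner mapping instead of the nested defaultdict(int) above. It needs Python 3.5+ for the ** unpacking.

```python
from collections import defaultdict

# Hypothetical sample rows (year, month, value), deliberately out of year order
rows = [
    (2011, "january", 1),
    (2010, "january", 1),
    (2010, "february", 0),
    (2011, "february", 0),
]

# One pass over the rows: outer key is the year, inner key is the month
d = defaultdict(dict)
for year, month, value in rows:
    d[year][month] = value

# sorted() orders the groups by year before the ids are assigned
res = [{"id": i, "year": k, **v} for i, (k, v) in enumerate(sorted(d.items()))]
print(res)
```

Because the ids come from enumerate over the sorted items, id 0 always goes to the earliest year regardless of the row order in the input.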
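Did one of the solutions below help? If so, feel free to accept one (green check mark on the left); otherwise, feel free to ask for clarification.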