Python: summing two pandas groupby objects
Tags: python, python-3.x, pandas, indexing, pandas-groupby
I have two pandas groupby objects and I want to add their values together. I can't work out how to combine the two DataFrames so that the CALL_BLOCKS column contains all ten call blocks for that DOW, with the values summed. I have tried several approaches, such as resetting the index and merging the two DataFrames, but I still cannot get all ten call blocks in the CALL_BLOCKS column. I would appreciate your help. Thank you in advance.
Edited:
df1 = {('1-100019B', 'a_8:00AM to 9:00AM'): 0.6493506493506493,
('1-100019B', 'b_9:00AM to 10:00AM'): 0.7272727272727273,
('1-100019B', 'c_10:00AM to 11:00AM'): 0.16883116883116883,
('1-100019B', 'd_11:00AM to 12:00PM'): 0.025974025974025976,
('1-100019B', 'e_12:00PM to 1:00PM'): 0.38961038961038963,
('1-100019B', 'f_1:00PM to 2:00PM'): 0.14285714285714285,
('1-100019B', 'g_2:00PM to 3:00PM'): 0.0,
('1-100019B', 'h_3:00PM to 4:00PM'): 0.12987012987012986,
('1-100019B', 'i_4:00PM to 5:00PM'): 0.0,
('1-100019B', 'j_After 5PM'): 0.0}
df2 = {('1-100019B', 0, 'a_8:00AM to 9:00AM'): 0.5,
('1-100019B', 0, 'b_9:00AM to 10:00AM'): 0.6666666666666666,
('1-100019B', 0, 'c_10:00AM to 11:00AM'): 0.25,
('1-100019B', 0, 'e_12:00PM to 1:00PM'): 0.3333333333333333,
('1-100019B', 0, 'f_1:00PM to 2:00PM'): 0.0,
('1-100019B', 0, 'h_3:00PM to 4:00PM'): 1.0}
Expected output:
df =
CONTACT_ID  DOW  CALL_BLOCKS
1-100019B   0    a_8:00AM to 9:00AM      1.149
                 b_9:00AM to 10:00AM     1.380
                 c_10:00AM to 11:00AM    0.410
                 d_11:00AM to 12:00PM    0.026
                 e_12:00PM to 1:00PM     0.710
                 f_1:00PM to 2:00PM      0.140
                 g_2:00PM to 3:00PM      0.000
                 h_3:00PM to 4:00PM      1.120
                 i_4:00PM to 5:00PM      0.000
                 j_After 5PM             0.000
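Drop the unused MultiIndex level from the second DataFrame, then add the two with pd.Series.add, passing fill_value=0 so that call blocks missing from one side count as zero: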
df2.index = df2.index.droplevel(1)
res = df1.add(df2, fill_value=0)
print(res)
                                       0
idx1      idx3
1-100019B a_8:00AM to 9:00AM    1.149351
          b_9:00AM to 10:00AM   1.393939
          c_10:00AM to 11:00AM  0.418831
          d_11:00AM to 12:00PM  0.025974
          e_12:00PM to 1:00PM   0.722944
          f_1:00PM to 2:00PM    0.142857
          g_2:00PM to 3:00PM    0.000000
          h_3:00PM to 4:00PM    1.129870
          i_4:00PM to 5:00PM    0.000000
          j_After 5PM           0.000000
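The approach above can be sketched end to end as follows. This is a minimal illustration, not the answerer's exact code: it uses a three-row subset of the question's data, builds Series rather than single-column DataFrames, and assumes the level names CONTACT_ID, DOW and CALL_BLOCKS. It also uses the non-mutating Series.droplevel instead of reassigning the index.

```python
import pandas as pd

# Subset of the question's data; the level names are assumed for illustration.
s1 = pd.Series({
    ('1-100019B', 'a_8:00AM to 9:00AM'): 0.6493506493506493,
    ('1-100019B', 'd_11:00AM to 12:00PM'): 0.025974025974025976,
    ('1-100019B', 'h_3:00PM to 4:00PM'): 0.12987012987012986,
}).rename_axis(['CONTACT_ID', 'CALL_BLOCKS'])

s2 = pd.Series({
    ('1-100019B', 0, 'a_8:00AM to 9:00AM'): 0.5,
    ('1-100019B', 0, 'h_3:00PM to 4:00PM'): 1.0,
}).rename_axis(['CONTACT_ID', 'DOW', 'CALL_BLOCKS'])

# Drop the DOW level so both Series share the same two index levels.
s2_flat = s2.droplevel('DOW')

# Align on the index; a label present on only one side is treated as 0 on the other.
res = s1.add(s2_flat, fill_value=0)
print(res)
```

Without fill_value=0, the d_11:00AM row would come out as NaN, because that call block has no counterpart in the second Series.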
Setup
This is the code I used to get from your input dictionaries to multi-indexed data, which you can treat as the output of your groupby operations:
import pandas as pd
# Build a one-column frame from each dict, then split the tuple keys into index levels.
df1 = pd.DataFrame.from_dict(df1, orient='index').reset_index()
df1 = df1.join(pd.DataFrame(df1['index'].values.tolist(), columns=['idx1', 'idx3'])).drop(columns='index')
df1 = df1.set_index(['idx1', 'idx3'])
df2 = pd.DataFrame.from_dict(df2, orient='index').reset_index()
df2 = df2.join(pd.DataFrame(df2['index'].values.tolist(), columns=['idx1', 'idx2', 'idx3'])).drop(columns='index')
df2 = df2.set_index(['idx1', 'idx2', 'idx3'])
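A more compact way to reach the same shape is to turn the tuple keys into a MultiIndex directly. The sketch below is an alternative to the setup above, not part of the original answer; the names raw2, index2 and frame2 are hypothetical, and only two rows of the question's second dictionary are used.

```python
import pandas as pd

# Two rows from the question's second dictionary (hypothetical variable name).
raw2 = {
    ('1-100019B', 0, 'a_8:00AM to 9:00AM'): 0.5,
    ('1-100019B', 0, 'b_9:00AM to 10:00AM'): 0.6666666666666666,
}

# Turn the tuple keys directly into a three-level MultiIndex.
index2 = pd.MultiIndex.from_tuples(list(raw2), names=['idx1', 'idx2', 'idx3'])

# A single-column DataFrame (column label 0), mirroring the frames built above.
frame2 = pd.DataFrame(list(raw2.values()), index=index2)
print(frame2)
```

This avoids the reset_index / join / set_index round trip, at the cost of building the index and the values in two separate steps.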
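Using @jpp's setup (assuming the index levels are named CONTACTS_ID, DOW and CALL_BLOCKS rather than idx1, idx2 and idx3), you can also do an outer merge and sum across the columns:
df1.merge(df2.reset_index('DOW'), on=['CONTACTS_ID','CALL_BLOCKS'], how='outer')\
.set_index('DOW', append=True).sum(axis=1)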
Output:
CONTACTS_ID  CALL_BLOCKS           DOW
1-100019B    a_8:00AM to 9:00AM    0.0    1.149351
             b_9:00AM to 10:00AM   0.0    1.393939
             c_10:00AM to 11:00AM  0.0    0.418831
             d_11:00AM to 12:00PM  NaN    0.025974
             e_12:00PM to 1:00PM   0.0    0.722944
             f_1:00PM to 2:00PM    0.0    0.142857
             g_2:00PM to 3:00PM    NaN    0.000000
             h_3:00PM to 4:00PM    0.0    1.129870
             i_4:00PM to 5:00PM    NaN    0.000000
             j_After 5PM           NaN    0.000000
dtype: float64
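In the merge output above, DOW is NaN for every call block that is missing from the second frame. If the goal is a result where every call block carries its DOW, one possible approach is to repeat the first Series once per DOW value before adding. This is a sketch under assumptions, not something either answer proposes: it uses a subset of the question's data, assumes the level names CONTACT_ID, DOW and CALL_BLOCKS, and assumes every contact should be repeated for every DOW value that appears in the second Series.

```python
import pandas as pd

# Subset of the question's data; the level names are assumed for illustration.
s1 = pd.Series({
    ('1-100019B', 'a_8:00AM to 9:00AM'): 0.6493506493506493,
    ('1-100019B', 'd_11:00AM to 12:00PM'): 0.025974025974025976,
    ('1-100019B', 'h_3:00PM to 4:00PM'): 0.12987012987012986,
}).rename_axis(['CONTACT_ID', 'CALL_BLOCKS'])

s2 = pd.Series({
    ('1-100019B', 0, 'a_8:00AM to 9:00AM'): 0.5,
    ('1-100019B', 0, 'h_3:00PM to 4:00PM'): 1.0,
}).rename_axis(['CONTACT_ID', 'DOW', 'CALL_BLOCKS'])

# Repeat s1 once per DOW value found in s2, so both sides have three index levels.
dows = s2.index.get_level_values('DOW').unique()
s1_by_dow = (
    pd.concat({dow: s1 for dow in dows}, names=['DOW'])
    .reorder_levels(['CONTACT_ID', 'DOW', 'CALL_BLOCKS'])
)

# Align on the full index; call blocks missing from s2 contribute 0.
res_by_dow = s1_by_dow.add(s2, fill_value=0).sort_index()
print(res_by_dow)
```

With several contacts that have different DOW values, the repetition step would need to be done per contact instead of globally.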
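Comments:
Can you add df1.to_dict() and df2.to_dict() to the question?
Does this help?
Thanks for your answer. I can't drop level=1 (DOW), because I want values specific to the DOW column, as described under my expected output. Would it be simpler to call reset_index() on these groupby objects, convert them to pandas DataFrames and work from there, in which case the output would be a DataFrame in the format described?
That was helpful. Thank you.
@KrishnangKDalal I'm glad it helped. You're welcome. Happy coding!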