Pandas: combining two pandas DataFrames that have a MultiIndex
Below is the original DataFrame:
Week_No item_Number Inside__Outside
4 1.2014 3164018114707537 INSIDE
6 1.2014 50010EJ654990 INSIDE
19 1.2014 304400JE130142 INSIDE
29 1.2014 3164018114725810 INSIDE
31 1.2014 3164018114711298 INSIDE
35 1.2014 3164018114707546 OUTSIDE
36 1.2014 3164018114711299 OUTSIDE
41 1.2014 3164018114727381 INSIDE
54 1.2014 50010EJ655470 OUTSIDE
145 1.2014 304400TS135379 INSIDE
After this, I grouped it like this:
df = df.groupby(['Week_No','Inside__Outside']).agg(['count'])
which gives a grouped DataFrame:
                         item_Number
                               count
Week_No Inside__Outside
1.2014  INSIDE                    51
        OUTSIDE                    8
2.2014  INSIDE                    91
        OUTSIDE                   16
3.2014  INSIDE                    92
        OUTSIDE                    7
4.2014  INSIDE                    76
        OUTSIDE                    5
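For readers who want to reproduce the grouping step, here is a minimal sketch built only from the ten sample rows shown in the question. It assumes Week_No is stored as a string (the question does not say whether it is a string or a float), and the name raw is made up for illustration.

```python
import pandas as pd

# The ten sample rows from the question. Assumption: Week_No is a string.
raw = pd.DataFrame({
    "Week_No": ["1.2014"] * 10,
    "item_Number": [
        "3164018114707537", "50010EJ654990", "304400JE130142",
        "3164018114725810", "3164018114711298", "3164018114707546",
        "3164018114711299", "3164018114727381", "50010EJ655470",
        "304400TS135379",
    ],
    "Inside__Outside": [
        "INSIDE", "INSIDE", "INSIDE", "INSIDE", "INSIDE",
        "OUTSIDE", "OUTSIDE", "INSIDE", "OUTSIDE", "INSIDE",
    ],
})

# Same call as in the question: two grouping keys become a two-level row
# index, and agg(['count']) adds a second level to the columns.
grouped = raw.groupby(["Week_No", "Inside__Outside"]).agg(["count"])

print(list(grouped.index.names))   # the two row index levels
print(grouped.columns.tolist())    # the two-level column label
print(grouped)
```

With only these ten rows the counts are of course much smaller than the 51 and 8 shown above, which come from the full data set.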
I now have two DataFrames, df (the grouped frame shown above) and df1:
df1
                         item_Number
                               count
Week_No Inside__Outside
1.2015  INSIDE                    18
2.2015  INSIDE                    48
3.2015  INSIDE                    87
4.2015  INSIDE                    54
5.2015  INSIDE                    61
6.2015  INSIDE                    46
7.2015  INSIDE                    83
8.2015  INSIDE                    41
9.2015  INSIDE                    34
I now want to sum the counts on a weekly basis, i.e. the output from the two DataFrames should be:
Week_No total
1.2015 18
2.2015 48
3.2015 87
4.2015 54
5.2015 61
6.2015 46
7.2015 83
8.2015 41
9.2015 34
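I thought of selecting the data first and then adding it up manually, but that does not seem efficient. Also, because this is a MultiIndex, I cannot select data by week number. Please do not look at the absolute numbers in the count column; my question is about operations on MultiIndex DataFrames.

You have to remove the Inside__Outside level from the index, because you are not using it to join the two tables.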
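As a small illustration of both points (selecting by week on a MultiIndex, and dropping the Inside__Outside level), here is a hedged sketch. It uses a simplified frame with a single flat count column and string week labels; both are assumptions made for brevity, and the names counts, week_1 and by_week_only are illustrative only.

```python
import pandas as pd

# Simplified stand-in for the grouped frame (flat "count" column assumed).
idx = pd.MultiIndex.from_tuples(
    [("1.2014", "INSIDE"), ("1.2014", "OUTSIDE"),
     ("2.2014", "INSIDE"), ("2.2014", "OUTSIDE")],
    names=["Week_No", "Inside__Outside"],
)
counts = pd.DataFrame({"count": [51, 8, 91, 16]}, index=idx)

# Select all rows of one week by naming the index level to match on.
week_1 = counts.xs("1.2014", level="Week_No")

# Drop the Inside__Outside level so that only Week_No remains in the index.
by_week_only = counts.droplevel("Inside__Outside")

print(week_1)
print(by_week_only)
```

After droplevel the week labels repeat in the index, which is fine as an intermediate step before grouping or aligning on Week_No.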
Let's start with the two DataFrames you gave in your example:
data_1_df
Out[35]:
item_Number count
Week_No Inside__Outside
1.2015 INSIDE 18
2.2015 INSIDE 48
3.2015 INSIDE 87
4.2015 INSIDE 54
5.2015 INSIDE 61
6.2015 INSIDE 46
7.2015 INSIDE 83
8.2015 INSIDE 41
9.2015 INSIDE 34
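and data_2_df, which holds the OUTSIDE counts for the same weeks.
You can stack them one on top of the other, group on Week_No, and sum the item_Number count column: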
data_3_df = (
    pd.concat([data_1_df, data_2_df])     # stack the two frames
    .reset_index()                        # turn both index levels into columns
    .groupby('Week_No')
    .agg({'item_Number count': 'sum'})    # total count per week
)
This gives the sum of INSIDE and OUTSIDE for each week:
data_3_df
Out[52]:
item_Number count
Week_No
1.2015 26
2.2015 52
3.2015 94
4.2015 58
5.2015 62
6.2015 52
7.2015 91
8.2015 45
9.2015 37
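The steps above can be run end to end with the following sketch. The INSIDE counts are those of data_1_df; the OUTSIDE counts are taken from the df2 printed in the other answer, on the assumption that data_2_df holds the same values. The helper make_frame is hypothetical and exists only to build the test data.

```python
import pandas as pd

weeks = [f"{n}.2015" for n in range(1, 10)]  # "1.2015" ... "9.2015"

def make_frame(label, counts):
    """Hypothetical helper: one row per week, indexed by (Week_No, label)."""
    index = pd.MultiIndex.from_product(
        [weeks, [label]], names=["Week_No", "Inside__Outside"]
    )
    return pd.DataFrame({"item_Number count": counts}, index=index)

data_1_df = make_frame("INSIDE", [18, 48, 87, 54, 61, 46, 83, 41, 34])
data_2_df = make_frame("OUTSIDE", [8, 4, 7, 4, 1, 6, 8, 4, 3])

data_3_df = (
    pd.concat([data_1_df, data_2_df])     # stack the two frames
    .reset_index()                        # index levels become ordinary columns
    .groupby("Week_No")
    .agg({"item_Number count": "sum"})    # total count per week
)

print(data_3_df)
```

Resetting the index is what lets groupby refer to Week_No as a column; Inside__Outside is simply ignored by the aggregation because it is not listed in agg.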
Just append them together and group by the first index level:
In [118]: df1
Out[118]:
item_Number
count
Week_No Inside__Outside
1.2015 INSIDE 18
2.2015 INSIDE 48
3.2015 INSIDE 87
4.2015 INSIDE 54
5.2015 INSIDE 61
6.2015 INSIDE 46
7.2015 INSIDE 83
8.2015 INSIDE 41
9.2015 INSIDE 34
In [119]: df2
Out[119]:
item_Number
count
Week_No Inside__Outside
1.2015 OUTSIDE 8
2.2015 OUTSIDE 4
3.2015 OUTSIDE 7
4.2015 OUTSIDE 4
5.2015 OUTSIDE 1
6.2015 OUTSIDE 6
7.2015 OUTSIDE 8
8.2015 OUTSIDE 4
9.2015 OUTSIDE 3
In [120]: df1.append(df2).groupby(level=0).sum()
Out[120]:
item_Number
count
Week_No
1.2015 26
2.2015 52
3.2015 94
4.2015 58
5.2015 62
6.2015 52
7.2015 91
8.2015 45
9.2015 37
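Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the one-liner above fails on current versions. A minimal sketch of the same idea with pd.concat follows; the helper weekly is hypothetical, and string week labels are an assumption.

```python
import pandas as pd

weeks = [f"{n}.2015" for n in range(1, 10)]
columns = pd.MultiIndex.from_tuples([("item_Number", "count")])

def weekly(label, counts):
    """Hypothetical helper that rebuilds a frame shaped like df1 / df2."""
    index = pd.MultiIndex.from_product(
        [weeks, [label]], names=["Week_No", "Inside__Outside"]
    )
    return pd.DataFrame([[c] for c in counts], index=index, columns=columns)

df1 = weekly("INSIDE", [18, 48, 87, 54, 61, 46, 83, 41, 34])
df2 = weekly("OUTSIDE", [8, 4, 7, 4, 1, 6, 8, 4, 3])

# pd.concat replaces df1.append(df2); grouping on the named level is
# equivalent to groupby(level=0) here.
total = pd.concat([df1, df2]).groupby(level="Week_No").sum()

print(total)
```

Grouping directly on an index level avoids the reset_index step and keeps the two-level column labels intact.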
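Can you post code and raw input data to reproduce your dfs? Also, your df is not very informative: the total is the same as the count because you only have one value per week. You could just do df.groupby(level=0).sum()
Hi @EdChum, I have added the code, the original DataFrame and the output. Please ignore the absolute values in the column, as this is only an example. I want to know how operations on pandas DataFrames with a MultiIndex work. I have added that to the question as well.
Could you try df1.add(df2, level=0)?
Trying that, I get an error saying that a join on level between two MultiIndex objects is ambiguous.
If you add the two objects, the corresponding rows should be added together. So for 1.2015 in the output DataFrame there should be 26 rather than 18, and I think the result given in your question is incorrect. Could you check whether my answer is what you want?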
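Regarding the error reported for df1.add(df2, level=0): one possible way around it, sketched here rather than taken from the thread, is to drop the Inside__Outside level from both frames first, so that add aligns on Week_No alone. The frames below use a flat count column and include a week that is missing from one side; both are assumptions chosen to show what fill_value does.

```python
import pandas as pd

inside = pd.DataFrame(
    {"count": [18, 48, 87]},
    index=pd.MultiIndex.from_product(
        [["1.2015", "2.2015", "3.2015"], ["INSIDE"]],
        names=["Week_No", "Inside__Outside"],
    ),
)
outside = pd.DataFrame(
    {"count": [8, 7]},  # week 2.2015 deliberately absent
    index=pd.MultiIndex.from_product(
        [["1.2015", "3.2015"], ["OUTSIDE"]],
        names=["Week_No", "Inside__Outside"],
    ),
)

# With a single-level index on each side the alignment is unambiguous.
# fill_value=0 treats a week missing from one frame as a count of zero.
total = inside.droplevel("Inside__Outside").add(
    outside.droplevel("Inside__Outside"), fill_value=0
)

print(total)
```

Because alignment introduces missing values before they are filled, the resulting counts come back as floats; cast with astype(int) if integers are needed.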