Python: using groupby without creating a Series
I have two DataFrames. The first is train_family_sales:
       family  store_nbr        date  unit_sales
0   GROCERY I        1.0  2016-08-01         3.0
1   GROCERY I        1.0  2016-08-02        10.0
2   GROCERY I        1.0  2016-08-04         3.0
3  AUTOMOTIVE        1.0  2016-08-05         5.0
4  AUTOMOTIVE        1.0  2016-08-06         5.0
and the second is train_sales:
         date  store_nbr  item_nbr  unit_sales      family
0  2016-08-01        1.0    103520         3.0   GROCERY I
1  2016-08-02        1.0    103520         1.0   GROCERY I
2  2016-08-04        1.0    103520         6.0   GROCERY I
3  2016-08-05        1.0    103520         2.0  AUTOMOTIVE
4  2016-08-06        1.0    103520         2.0  AUTOMOTIVE
I want to merge them into the following:
         date  store_nbr  item_nbr  unit_sales      family  f_unit_sales
0  2016-08-01        1.0    103520         3.0   GROCERY I           3.0
1  2016-08-02        1.0    103520         1.0   GROCERY I          10.0
2  2016-08-04        1.0    103520         3.0   GROCERY I           3.0
3  2016-08-05        1.0    103520         2.0  AUTOMOTIVE           5.0
4  2016-08-06        1.0    103520         2.0  AUTOMOTIVE           6.0
I am trying to do this with the following:
both_sales = train_sales_with_family.join(train_family_sales,how='left', on=['store_nbr','family','date'], rsuffix='f_')
But I get an error:
ValueError: len(left_on) must equal the number of levels in the index of "right"
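For context, here is a minimal sketch of why a call like this can fail. It assumes the two frames are rebuilt by hand from the tables above, with dates kept as plain strings and train_sales standing in for train_sales_with_family (both are assumptions, not the original data). With on=..., join looks the listed columns up in the index of the other frame, and train_family_sales still has its default single-level index while three key columns are given.

```python
import pandas as pd

# Frames rebuilt by hand from the tables in the question (an assumption:
# the real data and dtypes may differ, e.g. date could be datetime64).
dates = ["2016-08-01", "2016-08-02", "2016-08-04", "2016-08-05", "2016-08-06"]
families = ["GROCERY I"] * 3 + ["AUTOMOTIVE"] * 2

train_family_sales = pd.DataFrame({
    "family": families,
    "store_nbr": [1.0] * 5,
    "date": dates,
    "unit_sales": [3.0, 10.0, 3.0, 5.0, 5.0],
})
train_sales = pd.DataFrame({
    "date": dates,
    "store_nbr": [1.0] * 5,
    "item_nbr": [103520] * 5,
    "unit_sales": [3.0, 1.0, 6.0, 2.0, 2.0],
    "family": families,
})

keys = ["store_nbr", "family", "date"]

# The right-hand frame has a plain RangeIndex: one level, but three keys.
right_levels = train_family_sales.index.nlevels

try:
    train_sales.join(train_family_sales, how="left", on=keys, rsuffix="f_")
    join_error = None
except ValueError as exc:
    join_error = exc

print("index levels on the right:", right_levels, "| keys in on:", len(keys))
print(type(join_error).__name__, join_error)
```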
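Any suggestions on how to do the merge?

I think you need to add set_index for join: the right-hand DataFrame needs a MultiIndex with the same levels as the columns passed in the on parameter: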
both_sales = train_sales.join(train_family_sales.set_index(['store_nbr','family','date']),
on=['store_nbr','family','date'],
rsuffix='_')
print (both_sales)
         date  store_nbr  item_nbr  unit_sales      family  unit_sales_
0  2016-08-01        1.0    103520         3.0   GROCERY I          3.0
1  2016-08-02        1.0    103520         1.0   GROCERY I         10.0
2  2016-08-04        1.0    103520         6.0   GROCERY I          3.0
3  2016-08-05        1.0    103520         2.0  AUTOMOTIVE          5.0
4  2016-08-06        1.0    103520         2.0  AUTOMOTIVE          5.0

Maybe we can use set_index and map here. This solution looks better.
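As alternatives to the join above, the same lookup can be sketched with set_index plus map, or with merge on the shared columns. This is only an illustration under stated assumptions: the frames are rebuilt by hand from the tables in the question with dates as plain strings, and the column name f_unit_sales is taken from the desired output rather than from the answer's code.

```python
import pandas as pd

# Frames rebuilt by hand from the question's tables (assumed dtypes).
dates = ["2016-08-01", "2016-08-02", "2016-08-04", "2016-08-05", "2016-08-06"]
families = ["GROCERY I"] * 3 + ["AUTOMOTIVE"] * 2

train_family_sales = pd.DataFrame({
    "family": families,
    "store_nbr": [1.0] * 5,
    "date": dates,
    "unit_sales": [3.0, 10.0, 3.0, 5.0, 5.0],
})
train_sales = pd.DataFrame({
    "date": dates,
    "store_nbr": [1.0] * 5,
    "item_nbr": [103520] * 5,
    "unit_sales": [3.0, 1.0, 6.0, 2.0, 2.0],
    "family": families,
})

keys = ["store_nbr", "family", "date"]

# Variant 1: set_index + map.
# Build a dict keyed by (store_nbr, family, date) tuples, then look up each
# row's key tuple; keys missing from the dict would map to None/NaN.
lookup = train_family_sales.set_index(keys)["unit_sales"].to_dict()
mapped = train_sales.set_index(keys).index.map(lookup.get)
both_sales = train_sales.assign(f_unit_sales=mapped.to_numpy())

# Variant 2: merge on the shared columns.
# Renaming the right-hand value column first avoids needing suffixes.
merged = train_sales.merge(
    train_family_sales.rename(columns={"unit_sales": "f_unit_sales"}),
    how="left",
    on=keys,
)

print(both_sales)
print(merged)
```

Both variants keep the rows of train_sales in their original order. merge is usually the more direct choice when several value columns have to come across at once, while map only carries over a single column.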