Python: merging a specific column from multiple DataFrames of different lengths
Suppose there are three DataFrames:

df1
  Color      date
0     A      2011
1     B    201411
2     C  20151231
3     A      2019

df2
  Color      date
0     A      2013
1     B  20151111
2     C    201101

df3
  Color      date
0     A      2011
1     B    201411
2     C  20151231
3     A      2019
4     Y  20070212

I want to create a new DataFrame by extracting only the date column from each of them.
The output I want:
New df
df1-date df2-date df3-date
0 2011 2013 2011
1 201411 20151111 201411
2 20151231 201101 20151231
3 2019 NaN 2019
4 NaN NaN 20070212
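I want the empty cells set to NaN, because the lengths differ.
I tried merge, but got an error.
Thanks for reading.

This involves two problems: (1) merging multiple DataFrames, and (2) merging on duplicate keys.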
import pandas as pd
from functools import reduce

def multikey(x):
    # groupby + cumcount numbers repeated Color values 0, 1, ... to create the additional key
    return x.assign(key=x.groupby('Color').cumcount())

# then use reduce to outer-merge the frames pairwise on Color and key
df = reduce(lambda left, right:
            pd.merge(left, right, on=['Color', 'key'], how='outer'),
            list(map(multikey, [df1, df2, df3])))
df
Color date_x key date_y date
0 A 2011.0 0 2013.0 2011
1 B 201411.0 0 20151111.0 201411
2 C 20151231.0 0 201101.0 20151231
3 A 2019.0 1 NaN 2019
4 Y NaN 0 NaN 20070212
Note the column names here (date_x, date_y, date); we can fix them with rename.
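A minimal sketch of the rename idea, assuming the sample frames from the question: renaming each date column before merging avoids the date_x / date_y suffixes altogether. The helper name with_key and the "<name>-date" labels are made up for illustration. The row order of an outer merge may differ from the output shown above, so rows are best looked up by Color and key rather than by position.

```python
from functools import reduce

import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"Color": list("ABCA"), "date": [2011, 201411, 20151231, 2019]})
df2 = pd.DataFrame({"Color": list("ABC"), "date": [2013, 20151111, 201101]})
df3 = pd.DataFrame({"Color": list("ABCAY"),
                    "date": [2011, 201411, 20151231, 2019, 20070212]})


def with_key(frame, name):
    # Hypothetical helper: add the cumcount key so (Color, key) is unique,
    # and give the date column a frame-specific name up front.
    return (frame
            .assign(key=frame.groupby("Color").cumcount())
            .rename(columns={"date": f"{name}-date"}))


keyed = [with_key(f, n) for n, f in [("df1", df1), ("df2", df2), ("df3", df3)]]
merged = reduce(
    lambda left, right: pd.merge(left, right, on=["Color", "key"], how="outer"),
    keyed,
)

# Keep only the date columns, as in the requested output
dates = merged.drop(columns=["Color", "key"])
print(dates)
```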
Method 2: use concat, which ignores the keys and merges on the index.
s=pd.concat([df1,df2,df3],keys=['df1','df2','df3'], axis=1)
s.columns=s.columns.map('_'.join)
s=s.filter(like='_date')
s
df1_date df2_date df3_date
0 2011.0 2013.0 2011
1 201411.0 20151111.0 201411
2 20151231.0 201101.0 20151231
3 2019.0 NaN 2019
4 NaN NaN 20070212
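Because concat with axis=1 aligns rows on the index, it only lines up by position when every frame has a clean 0..n-1 index. A sketch under the assumption that the real data has repeated or arbitrary index labels (the index values below are invented for illustration): resetting each index first makes the alignment purely positional.

```python
import pandas as pd

# Same values as in the question, but with invented, messy index labels
df1 = pd.DataFrame({"Color": list("ABCA"), "date": [2011, 201411, 20151231, 2019]},
                   index=[0, 1, 1, 2])
df2 = pd.DataFrame({"Color": list("ABC"), "date": [2013, 20151111, 201101]},
                   index=[5, 6, 7])
df3 = pd.DataFrame({"Color": list("ABCAY"),
                    "date": [2011, 201411, 20151231, 2019, 20070212]},
                   index=[9, 9, 3, 4, 8])

frames = {"df1": df1, "df2": df2, "df3": df3}

# Take only the date column and drop the old index so rows align by position;
# the dict keys become the column names of the result.
s = pd.concat(
    {name: f["date"].reset_index(drop=True) for name, f in frames.items()},
    axis=1,
)
s = s.add_suffix("_date")
print(s)
```

Whether a non-unique index is what caused a given shape error depends on the data, so it is worth checking each frame's index.is_unique before relying on this.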
Another approach:
df1.join(df2['date'],rsuffix='df2',how='outer').join(df3['date'],rsuffix='df3',how='outer')
Output:
Color date datedf2 datedf3
0 A 2011.0 2013.0 2011
1 B 201411.0 20151111.0 201411
2 C 20151231.0 201101.0 20151231
3 A 2019.0 NaN 2019
4 NaN NaN NaN 20070212
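If only the date columns are wanted, as in the requested output, the same join can be run on one-column frames. A sketch, assuming the sample data from the question; the "<name>-date" column labels follow the desired output, and the variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"Color": list("ABCA"), "date": [2011, 201411, 20151231, 2019]})
df2 = pd.DataFrame({"Color": list("ABC"), "date": [2013, 20151111, 201101]})
df3 = pd.DataFrame({"Color": list("ABCAY"),
                    "date": [2011, 201411, 20151231, 2019, 20070212]})

# One-column frames with distinct names, so no suffixes are needed
parts = [
    frame[["date"]].rename(columns={"date": f"{name}-date"})
    for name, frame in [("df1", df1), ("df2", df2), ("df3", df3)]
]

# Outer joins on the index; shorter frames are padded with NaN
new_df = parts[0].join(parts[1], how="outer").join(parts[2], how="outer")
print(new_df)
```

Like the original join, this lines rows up by index label, so it assumes each frame has a default 0..n-1 index.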
More intuitive, thanks.
With Method 2 I got an error on my real data: ValueError: Shape of passed values is (405, 12), indices imply (199, 12). But thanks for your comment!