Joining data frames in Python


Consider the following data frames in Python:

 import pandas as pd
 # rootDir, withinShotAggr and btwShotAggr are defined elsewhere in the original script
 Yref = pd.read_csv(f"{rootDir}data/trailerClassificationData/C2_withinShotAggr_{withinShotAggr}_btwshotAggr_{btwShotAggr}.csv", sep=',')
 Y = pd.read_csv(f"{rootDir}data/trailerClassificationData/C3_withinShotAggr_{withinShotAggr}_btwshotAggr_{btwShotAggr}.csv", sep=',')

where Y and Yref are target classification outputs:

Yref

   movieId Action Comedy Drama Horror
0  93797     1      0     1      0
1  25899     0      1     0      0
2  5673      0      1     1      0
3  86308     0      1     0      0
4  3577      0      0     1      0
5  3575      0      0     1      0
...
7100 rows × 5 columns

The same holds for Y:

Y

   movieId Action Comedy Drama Horror
0  93797     1      0     1      0
1  1222      0      0     1      0
2  5673      0      1     1      0
3  86308     0      1     0      0
4  3577      0      0     1      0
5  3575      0      0     1      0
...
7136 rows × 5 columns

As you can see, the two outputs have different numbers of rows. So the first question is: how do I join the two data frames with on='movieId' and how='inner'?

Yjoin = Yref.join(Y,how='inner',on='movieId')

gives me the error "columns overlap but no suffix specified". I solved the first problem as follows:

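  # inner merge keeps only movieIds present in both frames; shared columns get _x/_y suffixes
  Yjoin = Yref.merge(Y, on='movieId', how='inner')
  # keep movieId plus Yref's four label columns (.ix was removed in pandas 1.0, so use .iloc)
  Yjoin = Yjoin.iloc[:, 0:5]
  Yjoin = Yjoin.rename(columns={'Action_x': 'Action', 'Comedy_x': 'Comedy', 'Drama_x': 'Drama', 'Horror_x': 'Horror'})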
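For comparison, the original join call can also be made to work. DataFrame.join matches the caller's on= column against the other frame's index, and it raises the overlap error because both frames carry Action/Comedy/Drama/Horror columns while neither lsuffix nor rsuffix was given. A minimal sketch, using small hypothetical stand-in frames rather than the real CSV data:

```python
import pandas as pd

# Hypothetical toy stand-ins for Yref and Y (same columns as in the question)
Yref = pd.DataFrame({"movieId": [93797, 25899, 5673],
                     "Action": [1, 0, 0], "Comedy": [0, 1, 1],
                     "Drama": [1, 0, 1], "Horror": [0, 0, 0]})
Y = pd.DataFrame({"movieId": [93797, 1222, 5673, 86308],
                  "Action": [1, 0, 0, 0], "Comedy": [0, 0, 1, 1],
                  "Drama": [1, 1, 1, 0], "Horror": [0, 0, 0, 0]})

# join() matches Yref's 'movieId' column against Y's index, so move the key into Y's index.
# The label columns exist on both sides, so a suffix is required for the right-hand copies.
Yjoin = Yref.join(Y.set_index("movieId"), on="movieId", how="inner", rsuffix="_y")

# Keep only Yref's original columns, dropping the *_y duplicates
Yjoin = Yjoin[Yref.columns]
print(Yjoin)
```

Selecting Yref.columns at the end replaces the positional .iloc slice and the rename step, so the result does not depend on column order.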

Once that is done, X is a data frame similar to Y, with similar rows but without the key "movieId":

   test1     test2     test3          test4         test5
0  0.038039  0.212623  4.052835e-02   5.210721e-02  0.004591
1  0.054539  0.257145  0.000000e+00   0.000000e+00  0.115421
2  0.002842  0.209085  1.114923e-02   3.844100e-02  0.024544
3  0.136707  0.377181  0.000000e+00   0.000000e+00  0.055199
....
7136 rows × 5 columns

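I need to remove from X the rows that were dropped when building Yjoin, so that X ends up with the same shape, 7100 × 5. In the end, Y and X should have the same number of rows, 7100.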
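Because X has no movieId column, its rows can only be dropped by position. A minimal sketch, assuming (as the question implies) that row i of X describes the same movie as row i of Y; the frames below are hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data. ASSUMPTION: row i of X corresponds to row i of Y,
# because X carries no movieId column of its own.
Yref = pd.DataFrame({"movieId": [93797, 5673, 86308]})
Y = pd.DataFrame({"movieId": [93797, 1222, 5673, 86308],
                  "Drama": [1, 1, 1, 0]})
X = pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5),
                 columns=["test1", "test2", "test3", "test4", "test5"])

# True where Y's movieId also appears in Yref
keep = Y["movieId"].isin(Yref["movieId"])

# Apply the same mask to both frames; .to_numpy() makes it purely positional
# so X's index never has to line up with Y's
Y_kept = Y[keep.to_numpy()].reset_index(drop=True)
X_kept = X[keep.to_numpy()].reset_index(drop=True)

print(X_kept.shape)  # (3, 5)
```

The kept rows follow Y's order, whereas Yref.merge(Y, ...) follows Yref's order, so X_kept lines up row by row with Y_kept rather than necessarily with Yjoin.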

Thanks for any comments.
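Before choosing a join, it can help to see exactly which movieIds account for the difference in row counts. A sketch on hypothetical toy frames, using an outer merge with indicator=True, which adds a _merge column labelling each key as left_only, right_only or both:

```python
import pandas as pd

# Hypothetical toy key columns standing in for Yref and Y
Yref = pd.DataFrame({"movieId": [93797, 25899, 5673]})
Y = pd.DataFrame({"movieId": [93797, 1222, 5673]})

# Outer merge on the key; indicator=True records where each movieId came from
check = Yref.merge(Y, on="movieId", how="outer", indicator=True)

# movieIds that an inner merge would discard, from each side
only_in_Y = check.loc[check["_merge"] == "right_only", "movieId"].tolist()
only_in_Yref = check.loc[check["_merge"] == "left_only", "movieId"].tolist()
print(only_in_Y, only_in_Yref)
```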

Sorry, are you after a merge?

  res = Yref.merge(Y, how='inner', on='movieId')

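Thanks. That is very close, but I don't need to merge the two tables side by side on their columns, because the columns are the same. I just want to remove from Y the rows that are not present in Yref, using movieId as the key.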
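Since only rows need filtering, not columns, a boolean mask built with Series.isin does this directly and never produces _x/_y columns. A sketch on hypothetical toy frames; an inner merge against Yref[['movieId']] alone is shown as an equivalent alternative:

```python
import pandas as pd

# Hypothetical toy frames with the question's key column
Yref = pd.DataFrame({"movieId": [93797, 25899, 5673], "Drama": [1, 0, 1]})
Y = pd.DataFrame({"movieId": [93797, 1222, 5673], "Drama": [1, 1, 1]})

# Option 1: boolean filter; keeps Y's columns exactly as they are
Y_filtered = Y[Y["movieId"].isin(Yref["movieId"])]

# Option 2: inner merge against the key column only, so no label columns get duplicated
Y_merged = Y.merge(Yref[["movieId"]], on="movieId", how="inner")
```

One difference worth knowing: if movieId repeats in Yref, the merge duplicates the matching Y rows, while the isin filter keeps each Y row at most once.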

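Do you mean: ?

You should say what error it gives you; "an error" isn't very helpful.

Thanks for the comment, I've updated my question. I solved the first problem; now I need an answer to the second one.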