Python: joining two DataFrames with pandas

python pandas dataframe

I have two pandas DataFrames. The first has the following structure:

df1 : 

 id | age | sexe | language | country |
----|-----|------|----------|---------|
 1  | 35  | M    | FR       | FR      |
 2  | 20  | F    | EN       | EN      |
 3  | 60  | M    | IT       | IT      |

The second has the following structure:

df2 : 

 id | act| secs  | 
----|----|-------|
 1  | A  | 5     | 
 1  | B  | 10    | 
 1  | C  | 35    | 
 2  | A  | 1     | 
 2  | B  | 10    | 
 2  | C  | 100   | 
 2  | D  | 50    |
 3  | A  | 20    |
 3  | B  | 25    |
 3  | D  | 10    |

I want to sum secs for each user, grouped by id, and I would like to get the following DataFrame:

 id | age | sexe | language | country |secs |
----|-----|------|----------|---------|-----|     
 1  | 35  | M    | FR       | FR      | 50  |
 2  | 20  | F    | EN       | EN      | 161 |
 3  | 60  | M    | IT       | IT      | 55  |

IIUC, you can use groupby on df2 and sum the secs column, then use pd.concat to attach the result to the original DataFrame:

df3 = df2.groupby('id')['secs'].sum()
df4 = pd.concat([df1.set_index('id'), df3], axis=1).reset_index()


In [120]: df4
Out[120]:
   id  age sexe language country  secs
0   1   35    M       FR      FR    50
1   2   20    F       EN      EN   161
2   3   60    M       IT      IT    55
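
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The two frames are rebuilt from the tables in the question; the column dtypes (plain integers and strings) are an assumption, since the question only shows the printed tables.

```python
import pandas as pd

# Sample data copied from the tables in the question
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'age': [35, 20, 60],
    'sexe': ['M', 'F', 'M'],
    'language': ['FR', 'EN', 'IT'],
    'country': ['FR', 'EN', 'IT'],
})
df2 = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    'act':  ['A', 'B', 'C', 'A', 'B', 'C', 'D', 'A', 'B', 'D'],
    'secs': [5, 10, 35, 1, 10, 100, 50, 20, 25, 10],
})

# Total seconds per user: a Series indexed by id
df3 = df2.groupby('id')['secs'].sum()

# Align both objects on the id index, then turn id back into a column
df4 = pd.concat([df1.set_index('id'), df3], axis=1).reset_index()
print(df4)
```

Setting id as the index of df1 is what lets pd.concat line the rows up with the grouped Series, which is already indexed by id.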

As a one-liner:

pd.concat([df1.set_index('id'), df2.groupby('id')['secs'].sum()], axis=1).reset_index()

Timing:

In [122]: %timeit pd.concat([df1.set_index('id'), df2.groupby('id')['secs'].sum()], axis=1).reset_index()
100 loops, best of 3: 2.73 ms per loop

In [123]: %timeit pd.merge(df1, df2.groupby('id')['secs'].sum().reset_index(), on=['id'])
100 loops, best of 3: 3.44 ms per loop

In [124]: %timeit pd.merge(df1, df2.groupby('id', as_index=False)['secs'].sum(), on=['id'])
100 loops, best of 3: 3.73 ms per loop

In [125]: %timeit df1.set_index('id').join(df2.groupby('id')['secs'].sum()).reset_index()
100 loops, best of 3: 2.88 ms per loop
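
The four timed expressions are meant to be interchangeable for this data. Below is a rough sketch that checks this, assuming df1 and df2 are built from the question's tables (only the columns are taken from the question; the dictionary keys are just labels for this example). Note that timings on a three-row frame mostly measure fixed overhead, so the ranking may differ on larger data.

```python
import pandas as pd

# Sample data copied from the tables in the question
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'age': [35, 20, 60],
    'sexe': ['M', 'F', 'M'],
    'language': ['FR', 'EN', 'IT'],
    'country': ['FR', 'EN', 'IT'],
})
df2 = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    'act':  ['A', 'B', 'C', 'A', 'B', 'C', 'D', 'A', 'B', 'D'],
    'secs': [5, 10, 35, 1, 10, 100, 50, 20, 25, 10],
})

# The four variants from the timing section, keyed by an illustrative label
results = {
    'concat': pd.concat(
        [df1.set_index('id'), df2.groupby('id')['secs'].sum()], axis=1
    ).reset_index(),
    'merge + reset_index': pd.merge(
        df1, df2.groupby('id')['secs'].sum().reset_index(), on=['id']
    ),
    'merge + as_index=False': pd.merge(
        df1, df2.groupby('id', as_index=False)['secs'].sum(), on=['id']
    ),
    'join': df1.set_index('id').join(
        df2.groupby('id')['secs'].sum()
    ).reset_index(),
}

# Compare every variant against the concat result, cell by cell
baseline = results['concat']
for name, frame in results.items():
    same = frame.values.tolist() == baseline.values.tolist()
    print(name, 'matches concat:', same)
```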

You can use groupby with sum on df2 and then merge the result with df1:

print(df2.groupby('id')['secs'].sum().reset_index())
   id  secs
0   1    50
1   2   161
2   3    55

print(pd.merge(df1, df2.groupby('id')['secs'].sum().reset_index(), on=['id']))
   id  age sexe language country  secs
0   1   35    M       FR      FR    50
1   2   20    F       EN      EN   161
2   3   60    M       IT      IT    55
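
One thing to keep in mind: pd.merge defaults to an inner join, so a user with no rows in df2 would drop out of the result. The question's data has no such user, so the sketch below adds a hypothetical id 4 and uses trimmed-down frames purely for illustration. Passing how='left' keeps every row of df1.

```python
import pandas as pd

# Trimmed-down frames; id 4 is a hypothetical user with no activity rows
df1 = pd.DataFrame({'id': [1, 2, 3, 4], 'age': [35, 20, 60, 41]})
df2 = pd.DataFrame({'id': [1, 1, 2, 3], 'secs': [5, 10, 1, 20]})

totals = df2.groupby('id')['secs'].sum().reset_index()

# Default inner join: id 4 is missing from the result
inner = pd.merge(df1, totals, on=['id'])

# Left join: id 4 is kept, with NaN in secs until it is filled
left = pd.merge(df1, totals, on=['id'], how='left')
left['secs'] = left['secs'].fillna(0).astype(int)

print(inner)
print(left)
```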

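Alternatively, pass the parameter as_index=False to groupby, so that id stays a regular column and the reset_index call is not needed.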
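
The code for this variant did not survive in the post, so the following is a sketch based on the as_index=False expression that appears in the timing section, with df1 and df2 rebuilt from the question's tables.

```python
import pandas as pd

# Sample data copied from the tables in the question
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'age': [35, 20, 60],
    'sexe': ['M', 'F', 'M'],
    'language': ['FR', 'EN', 'IT'],
    'country': ['FR', 'EN', 'IT'],
})
df2 = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    'act':  ['A', 'B', 'C', 'A', 'B', 'C', 'D', 'A', 'B', 'D'],
    'secs': [5, 10, 35, 1, 10, 100, 50, 20, 25, 10],
})

# as_index=False keeps id as a column, so the result is already a DataFrame
totals = df2.groupby('id', as_index=False)['secs'].sum()
print(totals)

# Merge on the shared id column
result = pd.merge(df1, totals, on=['id'])
print(result)
```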

Can you add a test with the join solution?

@jezrael Added. Almost the same speed as pd.concat, but not quite :)

If any of the provided answers solved your problem, please accept it to close the question!