Using VLOOKUP and merge in Python (pandas)

python pandas

I have a pandas DataFrame with nearly 540,000 rows:

df1.head()

    username  hour    totalCount
0   lowi      00:00   12
1   klark     00:00   0
2   sturi     00:00   2
3   nukr      00:00   10
4   irore     00:00   2

I also have another pandas DataFrame with nearly 52,000 rows, some of which are duplicates:

df2.head()

   username   community
0    klark       0
1    irore       2
2    sturi       2
3    sturi       2
4    sturi       2

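I want to merge the "community" column of df2 into df1, matching on username so that each value lands in the corresponding row. I used the following code: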

df_merge = df_hu.merge(df_comm, on='username')
df_merge

But I got the following DataFrame, which has nearly 1,205,880 rows and contains duplicated rows:

    username    hour    totalCount  community
0   lowi        00:00   12          2
1   lowi        00:00   12          2
2   lowi        00:00   12          2
3   lowi        01:00   9           2
4   lowi        01:00   9           2

The expected output is:

df_merge.head()

    username  hour    totalCount community
0   lowi      00:00   12         2
1   klark     00:00   0          0
2   sturi     00:00   2          2
3   nukr      00:00   10         1 (not shown in the example)
4   irore     00:00   2          1 (not shown in the example)

Using `map`, the output is:

  username   hour  totalCount  community
0     lowi  00:00          12        NaN
1    klark  00:00           0        0.0
2    sturi  00:00           2        2.0
3     nukr  00:00          10        NaN
4    irore  00:00           2        2.0

Note that `lowi` and `nukr` are not in the example `df2`, hence the `NaN` values.
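A minimal sketch of a `map`-based lookup that reproduces this output on the five sample rows. This is an assumed reconstruction from the comments, which mention `map`, and not necessarily the answerer's exact code; the variable name `lookup` is illustrative.

```python
import pandas as pd

# Small stand-ins for the frames in the question (values copied from the head() output).
df1 = pd.DataFrame({
    "username": ["lowi", "klark", "sturi", "nukr", "irore"],
    "hour": ["00:00"] * 5,
    "totalCount": [12, 0, 2, 10, 2],
})
df2 = pd.DataFrame({
    "username": ["klark", "irore", "sturi", "sturi", "sturi"],
    "community": [0, 2, 2, 2, 2],
})

# Build a username -> community lookup Series. The index must be unique,
# so duplicate usernames are dropped first (keeping the first occurrence).
lookup = df2.drop_duplicates("username").set_index("username")["community"]

# map() looks up each username; names missing from the lookup become NaN,
# which is also why the column ends up as float.
df1["community"] = df1["username"].map(lookup)
print(df1)
```

Because `map` returns exactly one value per row of `df1`, the row count cannot grow, which is the VLOOKUP-like behaviour the question asks for.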

Assuming each `username` has only one `community`:

df_hu.merge(df_comm.drop_duplicates(), on='username', how='left')
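To see why the row count grows in the question and how the one-liner above avoids it, here is a small sketch on the sample rows. It assumes `df_hu` and `df_comm` correspond to `df1` and `df2` from the question. The `validate="many_to_one"` argument is an optional extra guard added for illustration and is not part of the answer.

```python
import pandas as pd

# Assumed stand-ins for df_hu (df1) and df_comm (df2) from the question.
df_hu = pd.DataFrame({
    "username": ["lowi", "klark", "sturi", "nukr", "irore"],
    "hour": ["00:00"] * 5,
    "totalCount": [12, 0, 2, 10, 2],
})
df_comm = pd.DataFrame({
    "username": ["klark", "irore", "sturi", "sturi", "sturi"],
    "community": [0, 2, 2, 2, 2],
})

# A plain merge is an inner join: each left row is repeated once per matching
# right row. sturi appears three times in df_comm, so its single row becomes
# three, while lowi and nukr (no match) are dropped.
plain = df_hu.merge(df_comm, on="username")
print(len(plain))

# Dropping duplicate rows first and using a left join keeps one row per left row.
# validate="many_to_one" raises MergeError if a username is still repeated on
# the right, e.g. if one user were listed under two different communities.
merged = df_hu.merge(
    df_comm.drop_duplicates(),
    on="username",
    how="left",
    validate="many_to_one",
)
print(merged)
```

If a user can appear with more than one community, `drop_duplicates()` alone will not make the keys unique, and `drop_duplicates("username")` or an explicit aggregation would be needed instead.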

May I know why you did not use `merge` instead of `map`? I think `merge` is more efficient than `map`.

@MohamedThasinah I used `map` because it runs about 1.5 times faster than `merge` in my environment.

Yes, for a use case like this, `map` is faster than `merge`. :) @MohamedThasinah
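The 1.5x figure is specific to the commenter's environment. A rough way to compare the two approaches on your own machine is sketched below. The data is synthetic, and the sizes, names, and repeat count are assumptions chosen only to keep the run short.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 200,000 left rows drawn from 5,000 distinct usernames.
rng = np.random.default_rng(0)
users = np.array([f"user{i}" for i in range(5_000)])
left = pd.DataFrame({"username": rng.choice(users, size=200_000)})
right = pd.DataFrame({
    "username": users,
    "community": rng.integers(0, 10, size=users.size),
})

# Lookup Series with a unique index, as required by map().
lookup = right.set_index("username")["community"]


def with_map():
    return left["username"].map(lookup)


def with_merge():
    return left.merge(right, on="username", how="left")["community"]


# Both approaches should produce the same community values in the same order.
assert np.array_equal(with_map().to_numpy(), with_merge().to_numpy())

t_map = timeit.timeit(with_map, number=3)
t_merge = timeit.timeit(with_merge, number=3)
print(f"map:   {t_map:.3f}s for 3 runs")
print(f"merge: {t_merge:.3f}s for 3 runs")
```

Timings depend on the pandas version, key dtype, and data size, so the ratio you observe may differ from the one quoted in the comments.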