Python: update specific values in a DataFrame from another DataFrame

python pandas

I have a DataFrame (df1) made up of chat transcripts:

id     time        author          text
a1    06:15:19     system        aaaaa
a1    13:57:50     Agent(Human)  ssfsd
a1    14:00:05     customer      ddg
a1    14:06:08     Agent(Human)  sdfg
a1    14:08:54     customer      sdfg
a1    15:58:48     Agent(Human)  jfghdfg
a1    16:18:41     customer      urtr
a1    16:51:38     Agent(Human)  erweg

I also have a second DataFrame of agents, which records the time at which each agent joined the chat. For example, df2:

id    agent_id    agent_time
a1     D01        13:57:50
a1     D02        15:58:48

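Now I want to replace the values in the 'author' column with the matching 'agent_id' at those specific times, and also fill the later rows whose author is 'Agent(Human)' with the ID of the agent who was active at that point.

Desired final output: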

id     time        author          text
a1    06:15:19     system        aaaaa
a1    13:57:50     D01           ssfsd
a1    14:00:05     customer      ddg
a1    14:06:08     D01           sdfg
a1    14:08:54     customer      sdfg
a1    15:58:48     D02           jfghdfg
a1    16:18:41     customer      urtr
a1    16:51:38     D02           erweg

I tried to do this with .map():

df1['author'] = df1['time'].map(df2.set_index('agent_time')['agent_id'])

But I got the wrong output:

id     time        author          text
a1    06:15:19     NaN           aaaaa
a1    13:57:50     D01           ssfsd
a1    14:00:05     NaN           ddg
a1    14:06:08     NaN           sdfg
a1    14:08:54     NaN           sdfg
a1    15:58:48     D02           jfghdfg
a1    16:18:41     NaN           urtr
a1    16:51:38     NaN           erweg

I also tried using .loc, without success.

Can anyone guide me on how to get the expected output? Any leads would be very helpful.

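I think your solution needs two additions: forward-fill the missing values per id, and restore the original author values for the rows that are not Agent(Human):
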
m = df1['author'].eq('Agent(Human)')

df1['author'] = (df1['time'].map(df2.set_index('agent_time')['agent_id'])
                            .groupby(df1['id'])
                            .ffill()
                            .where(m, df1['author']))

print(df1)
   id      time    author     text
0  a1  06:15:19    system    aaaaa
1  a1  13:57:50       D01    ssfsd
2  a1  14:00:05  customer      ddg
3  a1  14:06:08       D01     sdfg
4  a1  14:08:54  customer     sdfg
5  a1  15:58:48       D02  jfghdfg
6  a1  16:18:41  customer     urtr
7  a1  16:51:38       D02    erweg
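
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach, split into its three steps. The sample data is copied from the question. The intermediate names (mapped, filled, is_agent) are only illustrative. It assumes every agent_time in df2 is unique, because map() looks values up by that index alone. If the same start time could occur under different ids, this lookup would not be enough as written.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id": ["a1"] * 8,
    "time": ["06:15:19", "13:57:50", "14:00:05", "14:06:08",
             "14:08:54", "15:58:48", "16:18:41", "16:51:38"],
    "author": ["system", "Agent(Human)", "customer", "Agent(Human)",
               "customer", "Agent(Human)", "customer", "Agent(Human)"],
    "text": ["aaaaa", "ssfsd", "ddg", "sdfg",
             "sdfg", "jfghdfg", "urtr", "erweg"],
})
df2 = pd.DataFrame({
    "id": ["a1", "a1"],
    "agent_id": ["D01", "D02"],
    "agent_time": ["13:57:50", "15:58:48"],
})

# Step 1: exact lookup. Only rows whose time equals an agent_time
# get an agent_id; every other row becomes NaN.
mapped = df1["time"].map(df2.set_index("agent_time")["agent_id"])

# Step 2: carry each agent_id forward within the same chat id,
# so later rows inherit the most recent agent.
filled = mapped.groupby(df1["id"]).ffill()

# Step 3: keep the filled value only on Agent(Human) rows;
# system and customer rows keep their original author.
is_agent = df1["author"].eq("Agent(Human)")
df1["author"] = filled.where(is_agent, df1["author"])

print(df1)
```

Step 1 on its own reproduces the NaN-filled output from the question; steps 2 and 3 are what turn it into the desired result.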

What is print(df1['time'].dtype, df2['agent_time'].dtype)?

@jezrael Both are object dtype, strings.

Except for the matches, all the others get NaN, including system and customer.

Answer was edited.

@jezrael You are the best :)