Python: Using two DataFrames and an IP column with a date range to fill df1 with data from df2 (python, pandas)


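I am working with 2 DataFrames. The first has incomplete information. The second DataFrame has the information together with a time range given by firstseen and lastseen. I am trying to use the sourceaddress and the time range from df2 to fill in sourcehostname and sourceusername wherever the datetime from df1 falls within that time range.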

df1
        sourceaddress   sourcehostname  sourceusername  endtime         datetime
0       10.0.0.59       computer1       NaN             1564666638000   2019-08-01 09:37:18
1       10.0.0.59       NaN             NaN             1564666640000   2019-08-01 09:37:20
2       10.0.0.59       NaN             NaN             1564666642000   2019-08-01 09:37:22
3       10.0.0.59       NaN             NaN             1564666643000   2019-08-01 09:37:23
4       10.0.0.59       NaN             NaN             1564666643000   2019-08-01 09:37:23
5       10.0.0.59       NaN             NaN             1564666645000   2019-08-01 09:37:25
6       10.0.0.59       computer1       NaN             1564666646000   2019-08-01 09:37:26
7       10.0.0.59       NaN             NaN             1564666646000   2019-08-01 09:37:26
8       10.0.0.59       computer1       NaN             1564666649000   2019-08-01 09:37:29
9       10.0.0.59       computer1       NaN             1564666650000   2019-08-01 09:37:30
10      10.0.0.59       NaN             NaN             1564666850000   2019-08-01 09:40:50
...
43196   10.0.0.187      computer2       NaN             1564718395000   2019-08-01 23:59:55
43197   10.0.0.187      computer2       user1           1564718397000   2019-08-01 23:59:57
43198   10.0.0.187      computer2       NaN             1564718397000   2019-08-01 23:59:57
43199   10.0.0.187      computer2       user1           1564718398000   2019-08-01 23:59:58
43200   10.0.0.187      NaN             NaN             1564718398000   2019-08-01 23:59:58
43201   10.0.0.187      computer2       user1           1564718398000   2019-08-01 23:59:58

df2
        sourceaddress   sourcehostname  sourceusername  firstseen             lastseen
0       10.0.0.59       computer1       user1           2019-08-01 09:37:59   2019-08-01 09:46:08
1       10.0.0.187      computer2       user1           2019-08-01 00:00:03   2019-08-01 23:59:58
Expected result:

df3
        sourceaddress   sourcehostname  sourceusername  endtime         datetime
0       10.0.0.59       computer1       NaN             1564666638000   2019-08-01 09:37:18
1       10.0.0.59       NaN             NaN             1564666640000   2019-08-01 09:37:20
2       10.0.0.59       NaN             NaN             1564666642000   2019-08-01 09:37:22
3       10.0.0.59       NaN             NaN             1564666643000   2019-08-01 09:37:23
4       10.0.0.59       NaN             NaN             1564666643000   2019-08-01 09:37:23
5       10.0.0.59       NaN             NaN             1564666645000   2019-08-01 09:37:25
6       10.0.0.59       computer1       NaN             1564666646000   2019-08-01 09:37:26
7       10.0.0.59       NaN             NaN             1564666646000   2019-08-01 09:37:26
8       10.0.0.59       computer1       NaN             1564666649000   2019-08-01 09:37:29
9       10.0.0.59       computer1       NaN             1564666650000   2019-08-01 09:37:30
10      10.0.0.59       computer1       user1           1564666850000   2019-08-01 09:40:50
...
43196   10.0.0.187      computer2       user1           1564718395000   2019-08-01 23:59:55
43197   10.0.0.187      computer2       user1           1564718397000   2019-08-01 23:59:57
43198   10.0.0.187      computer2       user1           1564718397000   2019-08-01 23:59:57
43199   10.0.0.187      computer2       user1           1564718398000   2019-08-01 23:59:58
43200   10.0.0.187      computer2       user1           1564718398000   2019-08-01 23:59:58
43201   10.0.0.187      computer2       user1           1564718398000   2019-08-01 23:59:58
Here is an example of the result:

df3[-5:]
        sourceaddress   sourcehostname  sourceusername  endtime          datetime               firstseen              lastseen
43197   10.99.0.187     computer2       user1           1564718397000    2019-08-01 23:59:57    2019-08-01 00:00:03    2019-08-01 23:59:58
43198   10.99.0.187     computer2       NaN             1564718397000    2019-08-01 23:59:57    2019-08-01 00:00:03    2019-08-01 23:59:58
43199   10.99.0.187     computer2       NaN             1564718398000    2019-08-01 23:59:58    2019-08-01 00:00:03    2019-08-01 23:59:58
43200   10.99.0.187     computer2       user1           1564718398000    2019-08-01 23:59:58    2019-08-01 00:00:03    2019-08-01 23:59:58
43201   10.99.0.187     computer2       user1           1564718398000    2019-08-01 23:59:58    2019-08-01 00:00:03    2019-08-01 23:59:58

This looks like a merge problem:

df3 = df1.merge(df2,
                on='sourceaddress', how='left',
                suffixes=['','_df2']
               )
# mark the valid time:
mask = df3['datetime'].ge(df3['firstseen']) & df3['datetime'].lt(df3['lastseen'])

# update the info
df3.loc[mask, 'sourcehostname'] = df3.loc[mask, 'sourcehostname_df2']
df3.loc[mask, 'sourceusername'] = df3.loc[mask, 'sourceusername_df2']

Then you can drop the sourcehostname_df2 and sourceusername_df2 columns.
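For reference, the merge-and-mask approach can be sketched end to end on a tiny, made-up sample. The four-row df1 and the two-row df2 below are hypothetical stand-ins modeled on the rows in the question, not the real data, and the upper bound uses le rather than lt so that rows stamped exactly at lastseen are also filled (the adjustment discussed in the comments). Treat it as a minimal sketch under those assumptions.

```python
import pandas as pd

# Hypothetical miniature df1, modeled on the question's rows (not the real data).
df1 = pd.DataFrame({
    "sourceaddress": ["10.0.0.59", "10.0.0.59", "10.0.0.187", "10.0.0.187"],
    "sourcehostname": ["computer1", None, "computer2", None],
    "sourceusername": [None, None, None, None],
    "datetime": pd.to_datetime([
        "2019-08-01 09:37:18",  # before firstseen: left untouched
        "2019-08-01 09:40:50",  # inside the window: filled
        "2019-08-01 23:59:57",  # inside the window: filled
        "2019-08-01 23:59:58",  # equal to lastseen: filled because of le
    ]),
})

# Hypothetical miniature df2 with one time window per address.
df2 = pd.DataFrame({
    "sourceaddress": ["10.0.0.59", "10.0.0.187"],
    "sourcehostname": ["computer1", "computer2"],
    "sourceusername": ["user1", "user1"],
    "firstseen": pd.to_datetime(["2019-08-01 09:37:59", "2019-08-01 00:00:03"]),
    "lastseen": pd.to_datetime(["2019-08-01 09:46:08", "2019-08-01 23:59:58"]),
})

# Bring the df2 columns alongside each df1 row that shares the same address.
df3 = df1.merge(df2, on="sourceaddress", how="left", suffixes=("", "_df2"))

# Rows whose datetime falls inside [firstseen, lastseen], both ends inclusive.
mask = df3["datetime"].ge(df3["firstseen"]) & df3["datetime"].le(df3["lastseen"])

# Overwrite hostname and username only for the rows inside the window.
df3.loc[mask, "sourcehostname"] = df3.loc[mask, "sourcehostname_df2"]
df3.loc[mask, "sourceusername"] = df3.loc[mask, "sourceusername_df2"]

# Drop the helper columns that came from df2.
df3 = df3.drop(columns=["sourcehostname_df2", "sourceusername_df2",
                        "firstseen", "lastseen"])
print(df3)
```

One thing to keep in mind: if df2 held several windows for the same sourceaddress, the left merge would produce one output row per matching window, so the result could have more rows than df1.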

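Comments:

How long are your df1 and df2?

df1 has about 80 million rows, with a few thousand users and computers. df2 has about a few thousand rows.

In your example, df2.sourceaddress and df1.sourceaddress do not match. Is that intentional?

Sorry for the confusion. I fixed it. @QuangHoang

I followed your example and posted the result at the end of my question. After that I even dropped the two fields ending in _df2. I still do not get the username set to user1 in the column. It should be there, because the row falls within the time range.

I can see that one row is caused by lt(df3['lastseen']); change it to le(df3['lastseen']).

Yes, changing it from less than to less than or equal fixed it. Thank you for helping me, @Quang Hoang.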
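The boundary behaviour behind that last exchange can be checked in isolation. The sketch below uses two made-up timestamps, one just before lastseen and one exactly equal to it, and compares the strict and inclusive upper bounds. It also shows Series.between, which includes both endpoints by default and so should give the same mask as the ge/le pair.

```python
import pandas as pd

# Two illustrative timestamps: one second before lastseen, and exactly lastseen.
ts = pd.Series(pd.to_datetime(["2019-08-01 23:59:57", "2019-08-01 23:59:58"]))
first = pd.Timestamp("2019-08-01 00:00:03")
last = pd.Timestamp("2019-08-01 23:59:58")

# Strict upper bound: the row equal to lastseen is excluded.
strict = ts.ge(first) & ts.lt(last)

# Inclusive upper bound: the row equal to lastseen is kept.
inclusive = ts.ge(first) & ts.le(last)

# Series.between includes both endpoints by default.
same_as_between = ts.between(first, last)

print(strict.tolist(), inclusive.tolist(), same_as_between.tolist())
```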