R: Generating artificial non-events based on real event data


Hi there, my question is: I want to generate artificial non-events based on real event data:

> events
                 TIME_ LATITUDE LONGITUDE
1  2013-10-15 00:12:32    9.880  124.1167
2  2013-10-25 17:10:19   37.156  144.6611
3  2014-04-11 07:07:23   -6.586  155.0485
4  2014-04-12 20:14:39  -11.270  162.1481
5  2014-04-19 13:28:00   -6.755  155.0241
6  2014-11-15 02:31:41    1.893  126.5217
7  2015-02-27 13:45:05   -7.297  122.5348
8  2015-03-29 23:48:31   -4.729  152.5623
9  2015-05-05 01:44:06   -5.462  151.8751
10 2015-05-07 07:10:19   -7.218  154.5567
11 2015-05-30 11:23:02   27.839  140.4931
12 2015-07-18 02:27:33  -10.401  165.1409
13 2015-07-27 21:41:21   -2.629  138.5277
The artificial non-events must satisfy:

1.  Date between 2013/10 and 2015/10, LATITUDE between -26.0 and 43.5 degrees, LONGITUDE between 118.0 and 175.0 degrees.
2.  Date cannot be within plus or minus 30 days of any real event.
3.  LATITUDE cannot be within plus or minus 5 degrees of any real event.
4.  LONGITUDE cannot be within plus or minus 5 degrees of any real event.
I can only implement this with three nested loops (date: 2013/10/01:1:2015/10/01; latitude: -26.0:0.1:43.5; longitude: 118.0:0.1:175.0), but that is inefficient.
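If conditions 2-4 are read independently (each coordinate checked on its own, one of the two interpretations discussed in the answer), the three nested loops are not needed: every axis of the grid can be filtered once, and any combination of allowed values is then a valid non-event. The following is a minimal base R sketch under that assumption; treating the timestamps as UTC is also an assumption, and `far_from_all` is a hypothetical helper, not part of any package.

```r
# Events from the question's dput(); tz = "UTC" is an assumption
events <- data.frame(
  TIME_ = as.POSIXct(c(1381795952.05, 1382721019.71, 1397200043.13, 1397333679.3,
                       1397914080.81, 1416018701.72, 1425044705.37, 1427672911.01,
                       1430790246.38, 1430982619.59, 1432984982.11, 1437186453.82,
                       1438033281.71), origin = "1970-01-01", tz = "UTC"),
  LATITUDE  = c(9.88, 37.156, -6.586, -11.27, -6.755, 1.893, -7.297, -4.729,
                -5.462, -7.218, 27.839, -10.401, -2.629),
  LONGITUDE = c(124.1167, 144.6611, 155.0485, 162.1481, 155.0241, 126.5217,
                122.5348, 152.5623, 151.8751, 154.5567, 140.4931, 165.1409,
                138.5277)
)

# TRUE for every grid value that is more than `radius` away from all centres
far_from_all <- function(grid, centres, radius) {
  # outer() gives a length(grid) x length(centres) matrix of differences
  rowSums(abs(outer(grid, centres, "-")) <= radius) == 0
}

# The OP's three grids (daily steps for the date)
dates <- seq(as.Date("2013-10-01"), as.Date("2015-10-01"), by = "day")
lats  <- seq(-26.0, 43.5, by = 0.1)
lons  <- seq(118.0, 175.0, by = 0.1)

# One 1-D filter per axis instead of three nested loops
ok_dates <- dates[far_from_all(as.numeric(dates),
                               as.numeric(as.Date(events$TIME_)), 30)]
ok_lats  <- lats[far_from_all(lats, events$LATITUDE, 5)]
ok_lons  <- lons[far_from_all(lons, events$LONGITUDE, 5)]

# Draw artificial non-events: under this reading every combination is allowed
set.seed(42)
n <- 5
non_events <- data.frame(
  DATE      = sample(ok_dates, n, replace = TRUE),
  LATITUDE  = sample(ok_lats,  n, replace = TRUE),
  LONGITUDE = sample(ok_lons,  n, replace = TRUE)
)
non_events
```

Each filter costs one pass over a single axis, so the work grows with the sum of the grid sizes rather than their product. This shortcut does not apply to the joint reading, where the three coordinates have to be checked together.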

An example of an artificial non-event could look like this:

                 TIME_ LATITUDE LONGITUDE
1  2014-10-15 00:12:32     19.8     130.0
So, can you suggest an efficient solution?

> dput(events)
structure(list(TIME_ = structure(c(1381795952.05, 1382721019.71, 
1397200043.13, 1397333679.3, 1397914080.81, 1416018701.72, 1425044705.37, 
1427672911.01, 1430790246.38, 1430982619.59, 1432984982.11, 1437186453.82, 
1438033281.71), class = c("POSIXct", "POSIXt")), LATITUDE = c(9.88, 
37.156, -6.586, -11.27, -6.755, 1.893, -7.297, -4.729, -5.462, 
-7.218, 27.839, -10.401, -2.629), LONGITUDE = c(124.1167, 144.6611, 
155.0485, 162.1481, 155.0241, 126.5217, 122.5348, 152.5623, 151.8751, 
154.5567, 140.4931, 165.1409, 138.5277)), .Names = c("TIME_", 
"LATITUDE", "LONGITUDE"), row.names = c(NA, -13L), class = "data.frame")

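The OP has asked to create an artificial dataset within given ranges of time, latitude, and longitude, which must keep a certain distance from the actual observations.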

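The idea is to create random samples within the given ranges and to remove those samples which are too close to the actual observations.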
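The sampling step can be sketched in base R as follows. This assumes uniform sampling, UTC timestamps, and 1 M points; `make_dummy` is a hypothetical helper, and the `rn` column holds the row numbers that the non-equi joins below return and remove.

```r
# Hypothetical helper: n uniform random points in the OP's
# time / latitude / longitude window
make_dummy <- function(n, seed = 1L) {
  set.seed(seed)
  t0 <- as.POSIXct("2013-10-01", tz = "UTC")
  t1 <- as.POSIXct("2015-10-01", tz = "UTC")
  span <- as.numeric(difftime(t1, t0, units = "secs"))
  data.frame(
    rn        = seq_len(n),                  # row number, used by the joins
    TIME_     = t0 + runif(n, 0, span),      # POSIXct + seconds
    LATITUDE  = runif(n, -26.0, 43.5),
    LONGITUDE = runif(n, 118.0, 175.0)
  )
}

dummy <- make_dummy(1e6)
head(dummy)
```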

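Creation of random samples

Removal of samples which are too close to actual observations

The OP has specified that the artificial events in `dummy` must satisfy:

  • the date cannot be within plus or minus 30 days of any real event,
  • the latitude cannot be within plus or minus 5 degrees of any real event,
  • the longitude cannot be within plus or minus 5 degrees of any real event.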
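Therefore, we have to remove any dummy data lying within the above "no-go zones". However, it is not completely clear whether

  • the exclusion criteria have to be applied jointly, i.e., only the dummy points inside the box around a real event in all three dimensions (time, latitude, and longitude) are removed, or
  • the exclusion criteria have to be applied independently, i.e., every dummy point violating any single criterion is removed.

For the removal, non-equi joins with a helper table are used: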

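    library(data.table)
    library(lubridate)  # floor_date(), ceiling_date(), days()
    # helper table: one exclusion box per real event,
    # +/- 30 days in time, +/- 5 degrees in latitude and longitude
    mDT <- setDT(events)[, .(TIM1 = floor_date(TIME_ - days(30), "day"), 
                      TIM2 = ceiling_date(TIME_ + days(30), "day"),
                      LAT1 = LATITUDE - 5, LAT2 = LATITUDE + 5,
                      LON1 = LONGITUDE - 5, LON2 = LONGITUDE + 5)]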
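
For readers without lubridate, the helper table can be approximated in base R. This sketch assumes UTC timestamps, so that day boundaries are multiples of 86400 s, and uses only the first two events from the question; `to_time` is a hypothetical helper. Results may differ from lubridate for timestamps falling exactly on midnight.

```r
day <- 86400                                   # seconds per day
events <- data.frame(
  TIME_     = as.POSIXct(c(1381795952.05, 1382721019.71),
                         origin = "1970-01-01", tz = "UTC"),
  LATITUDE  = c(9.88, 37.156),
  LONGITUDE = c(124.1167, 144.6611)
)

secs <- as.numeric(events$TIME_)
to_time <- function(s) as.POSIXct(s, origin = "1970-01-01", tz = "UTC")

boxes <- data.frame(
  # -30 days, rounded down to midnight (like floor_date(..., "day"))
  TIM1 = to_time(floor((secs - 30 * day) / day) * day),
  # +30 days, rounded up to midnight (like ceiling_date(..., "day"))
  TIM2 = to_time(ceiling((secs + 30 * day) / day) * day),
  LAT1 = events$LATITUDE - 5,  LAT2 = events$LATITUDE + 5,
  LON1 = events$LONGITUDE - 5, LON2 = events$LONGITUDE + 5
)
boxes
```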
    
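Variant 1: remove all rows where the conditions apply jointly:

    setDT(dummy)
    # assumes dummy has a column rn holding its row numbers;
    # drop rows inside the box of any event in time, latitude AND longitude
    dummy[!mDT[dummy, on = .(TIM1 < TIME_, TIM2 > TIME_, 
                             LAT1 < LATITUDE, LAT2 > LATITUDE, 
                             LON1 < LONGITUDE, LON2 > LONGITUDE), 
               nomatch = 0L, rn, by = .EACHI][, rn]]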
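To make the joint criterion concrete without data.table, here is a small base R sketch on made-up numbers: time is measured in days, the single box roughly mirrors the first event's latitude and longitude, and the three points are hypothetical. A point is dropped only if it lies strictly inside an event's box in all three dimensions, matching the strict inequalities of the join above.

```r
# Hypothetical toy data: time in days, one exclusion box
boxes <- data.frame(TIM1 = 0, TIM2 = 60,
                    LAT1 = 4.88, LAT2 = 14.88,
                    LON1 = 119.1167, LON2 = 129.1167)
pts <- data.frame(TIME_     = c(30, 30, 90),
                  LATITUDE  = c(10, 10, 10),
                  LONGITUDE = c(125, 140, 125))

# TRUE if a point lies strictly inside at least one box in ALL three dimensions
in_any_box <- function(pts, boxes) {
  hit <- logical(nrow(pts))            # starts as all FALSE
  for (k in seq_len(nrow(boxes))) {    # loop over events, vectorised over points
    hit <- hit | (boxes$TIM1[k] < pts$TIME_     & boxes$TIM2[k] > pts$TIME_ &
                  boxes$LAT1[k] < pts$LATITUDE  & boxes$LAT2[k] > pts$LATITUDE &
                  boxes$LON1[k] < pts$LONGITUDE & boxes$LON2[k] > pts$LONGITUDE)
  }
  hit
}

keep <- pts[!in_any_box(pts, boxes), ]
keep   # rows 2 and 3 survive: each is outside the box in at least one dimension
```

The loop runs over the events (13 in the question) rather than over the dummy points, so each pass is a vectorised comparison on the whole sample.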
    
Variant 2: remove all rows where the conditions apply independently:

    setDT(dummy)
    dummy[!c(mDT[dummy, on = .(TIM1 < TIME_, TIM2 > TIME_), nomatch = 0L, rn, by = .EACHI][, rn],
             mDT[dummy, on = .(LAT1 < LATITUDE, LAT2 > LATITUDE), nomatch = 0L, rn, by = .EACHI][, rn],
             mDT[dummy, on = .(LON1 < LONGITUDE, LON2 > LONGITUDE), nomatch = 0L, rn, by = .EACHI][, rn])]
    
    
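Variant 2 returns an empty table. So, all dummy data points violate one condition or another. That the request cannot be fulfilled even with 1 M randomly sampled points indicates that the conditions may be too strict.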
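A small base R sketch on made-up numbers (time in days, one hypothetical box and three hypothetical points) shows why the independent criteria remove so much more: a point is dropped as soon as any single coordinate falls inside any event's interval, even if the other two coordinates are far away. `in_any_interval` and `violates_any` are hypothetical helpers.

```r
# Hypothetical toy data: time in days, one exclusion box
boxes <- data.frame(TIM1 = 0, TIM2 = 60,
                    LAT1 = 4.88, LAT2 = 14.88,
                    LON1 = 119.1167, LON2 = 129.1167)
pts <- data.frame(TIME_     = c(30, 30, 90),
                  LATITUDE  = c(10, 10, 10),
                  LONGITUDE = c(125, 140, 125))

# TRUE where x lies strictly inside at least one (lo, hi) interval
in_any_interval <- function(x, lo, hi) {
  rowSums(outer(x, lo, ">") & outer(x, hi, "<")) > 0
}

# TRUE if ANY single coordinate falls inside ANY event's interval
violates_any <- function(pts, boxes) {
  in_any_interval(pts$TIME_,     boxes$TIM1, boxes$TIM2) |
    in_any_interval(pts$LATITUDE,  boxes$LAT1, boxes$LAT2) |
    in_any_interval(pts$LONGITUDE, boxes$LON1, boxes$LON2)
}

keep <- pts[!violates_any(pts, boxes), ]
nrow(keep)   # 0: every point shares at least one coordinate range with the event
```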

Each of the following expressions returns the row numbers to be excluded:

    # rows which violate condition on time / date
    mDT[setDT(dummy), on = .(TIM1 < TIME_, TIM2 > TIME_), nomatch = 0L, rn, by = .EACHI][, rn]
    
    
    # rows which violate condition on latitude
    mDT[setDT(dummy), on = .(LAT1 < LATITUDE, LAT2 > LATITUDE), nomatch = 0L, rn, by = .EACHI][, rn]
    
    # rows which violate condition on longitude
    mDT[setDT(dummy), on = .(LON1 < LONGITUDE, LON2 > LONGITUDE), nomatch = 0L, rn, by = .EACHI][, rn]
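A base R sketch of combining such row-number vectors and dropping them, with hypothetical vectors standing in for the three join results. It also shows why matching with `%in%` is safer than negative indexing: when the combined vector is empty, `-integer(0)` selects no rows at all instead of keeping every row.

```r
# Hypothetical row-number vectors, standing in for the three join results
bad_time <- c(2L, 5L)
bad_lat  <- c(5L, 7L)
bad_lon  <- integer(0)              # no longitude violations

dummy <- data.frame(rn = 1:8, LATITUDE = seq(0, 35, by = 5))

# Combine (duplicates are harmless) and keep the rows not listed
bad  <- unique(c(bad_time, bad_lat, bad_lon))
kept <- dummy[!(dummy$rn %in% bad), ]
kept$rn

# Pitfall: negative indexing with an empty vector drops every row
none <- dummy[-integer(0), ]
nrow(none)
```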
    
    

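The row numbers are combined and removed from `dummy`.

Comments: Do you want to generate events, or select events from the data set? What do conditions 2-4 mean? What does "cannot set" mean? What is "each real event"? Could you elaborate? @Emmanuel Lin generate events, thanks. @minem generate artificial events, but cannot set… @Pan could you add a simple example?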