R: Generate artificial non-events based on real event data
Hi there, my problem is: I want to generate artificial non-events based on real event data.
> events
TIME_ LATITUDE LONGITUDE
1 2013-10-15 00:12:32 9.880 124.1167
2 2013-10-25 17:10:19 37.156 144.6611
3 2014-04-11 07:07:23 -6.586 155.0485
4 2014-04-12 20:14:39 -11.270 162.1481
5 2014-04-19 13:28:00 -6.755 155.0241
6 2014-11-15 02:31:41 1.893 126.5217
7 2015-02-27 13:45:05 -7.297 122.5348
8 2015-03-29 23:48:31 -4.729 152.5623
9 2015-05-05 01:44:06 -5.462 151.8751
10 2015-05-07 07:10:19 -7.218 154.5567
11 2015-05-30 11:23:02 27.839 140.4931
12 2015-07-18 02:27:33 -10.401 165.1409
13 2015-07-27 21:41:21 -2.629 138.5277
The artificial non-events must satisfy:
1. Date between 2013/10 and 2015/10, LATITUDE between -26.0 and 43.5 degrees, LONGITUDE between 118.0 and 175.0 degrees.
2. Date cannot set to the value plus or minus 30 days for each real events.
3. LATITUDE cannot set to the value plus or minus 5 degrees for each real events.
4. LONGITUDE cannot set to the value plus or minus 5 degrees for each real events.
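I can only implement this with three loops (date: 2013/10/01:1:2015/10/01; latitude: -26.0:0.1:43.5; longitude: 118.0:0.1:175.0), but that is inefficient.
An artificial non-event might look like this: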
TIME_ LATITUDE LONGITUDE
1 2014-10-15 00:12:32 19.8 130.0
So, can you suggest an efficient solution?
> dput(events)
structure(list(TIME_ = structure(c(1381795952.05, 1382721019.71,
1397200043.13, 1397333679.3, 1397914080.81, 1416018701.72, 1425044705.37,
1427672911.01, 1430790246.38, 1430982619.59, 1432984982.11, 1437186453.82,
1438033281.71), class = c("POSIXct", "POSIXt")), LATITUDE = c(9.88,
37.156, -6.586, -11.27, -6.755, 1.893, -7.297, -4.729, -5.462,
-7.218, 27.839, -10.401, -2.629), LONGITUDE = c(124.1167, 144.6611,
155.0485, 162.1481, 155.0241, 126.5217, 122.5348, 152.5623, 151.8751,
154.5567, 140.4931, 165.1409, 138.5277)), .Names = c("TIME_",
"LATITUDE", "LONGITUDE"), row.names = c(NA, -13L), class = "data.frame")
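The OP has asked to create an artificial data set within a given range of time, latitude, and longitude that keeps a certain distance from the real observations.
The idea is to create random samples within the given range and to remove those that are too close to the real observations.
Creating the random samples
The samples are stored in a table dummy with the columns rn (row number), TIME_, LATITUDE, and LONGITUDE, which the code below relies on.
Removing the samples that are too close to the real observations
The OP has specified that the artificial events in dummy must meet these conditions:
- Date cannot set to the value plus or minus 30 days for each real events.
- LATITUDE cannot set to the value plus or minus 5 degrees for each real events.
- LONGITUDE cannot set to the value plus or minus 5 degrees for each real events.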
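The joins in this answer expect dummy to exist already. One possible way to build it is sketched below in base R. This is only a sketch under assumptions: uniform sampling within the OP's ranges, and a sample size n, seed, and UTC time zone that are all arbitrary choices, not values taken from the answer.

```r
# Hypothetical sketch: n, the seed, and the UTC time zone are assumptions.
set.seed(1234)
n <- 1000L

# Overall ranges taken from condition 1 of the question
t_min <- as.POSIXct("2013-10-01", tz = "UTC")
t_max <- as.POSIXct("2015-10-01", tz = "UTC")
span_secs <- as.numeric(t_max) - as.numeric(t_min)

dummy <- data.frame(
  rn        = seq_len(n),                    # row number used by the joins
  TIME_     = t_min + runif(n, 0, span_secs), # POSIXct + seconds
  LATITUDE  = runif(n, -26.0, 43.5),
  LONGITUDE = runif(n, 118.0, 175.0)
)

str(dummy)
```

Because setDT(dummy) is called later, a plain data.frame is sufficient at this point.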
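library(data.table)
library(lubridate)  # provides floor_date(), ceiling_date(), and days()
# exclusion box around each real event: +/- 30 days, +/- 5 degrees
mDT <- setDT(events)[, .(TIM1 = floor_date(TIME_ - days(30), "day"),
                         TIM2 = ceiling_date(TIME_ + days(30), "day"),
                         LAT1 = LATITUDE - 5, LAT2 = LATITUDE + 5,
                         LON1 = LONGITUDE - 5, LON2 = LONGITUDE + 5)]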
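Variant 1
Remove all rows that violate all three conditions simultaneously:
setDT(dummy)
dummy[!mDT[dummy, on = .(TIM1 < TIME_, TIM2 > TIME_,
                         LAT1 < LATITUDE, LAT2 > LATITUDE,
                         LON1 < LONGITUDE, LON2 > LONGITUDE),
           nomatch = 0L, rn, by = .EACHI][, rn]]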
Variant 2
Remove all rows that violate any one of the conditions, with each condition applied independently:
setDT(dummy)
dummy[!c(mDT[dummy, on = .(TIM1 < TIME_, TIM2 > TIME_), nomatch = 0L, rn, by = .EACHI][, rn],
mDT[dummy, on = .(LAT1 < LATITUDE, LAT2 > LATITUDE), nomatch = 0L, rn, by = .EACHI][, rn],
mDT[dummy, on = .(LON1 < LONGITUDE, LON2 > LONGITUDE), nomatch = 0L, rn, by = .EACHI][, rn])]
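This returns an empty table, so every dummy data point violates one condition or another. Even with 1 M randomly sampled points none fulfils the requirements, which suggests that the conditions may be too strict.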
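The difference between applying the conditions jointly and applying them independently can be seen on a toy example. The sketch below uses base R only, leaves out the time dimension for brevity, and all data and names in it (ev, cand, near, hit_all, hit_any) are hypothetical.

```r
# Toy data (hypothetical): two real events and three candidate points
ev   <- data.frame(LATITUDE = c(0, 20), LONGITUDE = c(130, 150))
cand <- data.frame(LATITUDE = c(1, 1, 40), LONGITUDE = c(131, 160, 165))

# near(x, y, tol)[i, j] is TRUE if candidate i lies within tol of event j
near <- function(x, y, tol) abs(outer(x, y, "-")) < tol

lat_hit <- near(cand$LATITUDE,  ev$LATITUDE,  5)
lon_hit <- near(cand$LONGITUDE, ev$LONGITUDE, 5)

# Joint reading: the candidate lies inside the full box of at least one event
hit_all <- rowSums(lat_hit & lon_hit) > 0

# Independent reading: the candidate violates at least one condition
# for some event, not necessarily the same event each time
hit_any <- rowSums(lat_hit) > 0 | rowSums(lon_hit) > 0

cand[!hit_all, ]  # candidates 2 and 3 remain
cand[!hit_any, ]  # only candidate 3 remains
```

Candidate 2 is close to an event in latitude but not in longitude, so it survives the joint reading and is dropped by the independent one. This is why the independent reading removes far more points.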
Each of the following expressions returns the row numbers to be excluded:
# rows which violate condition on time / date
mDT[setDT(dummy), on = .(TIM1 < TIME_, TIM2 > TIME_), nomatch = 0L, rn, by = .EACHI][, rn]
# rows which violate condition on latitude
mDT[setDT(dummy), on = .(LAT1 < LATITUDE, LAT2 > LATITUDE), nomatch = 0L, rn, by = .EACHI][, rn]
# rows which violate condition on longitude
mDT[setDT(dummy), on = .(LON1 < LONGITUDE, LON2 > LONGITUDE), nomatch = 0L, rn, by = .EACHI][, rn]
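The row numbers are combined and removed from dummy.
Do you want to generate events, or select events from your dataset?
What do conditions 2-4 mean? What does "cannot set" mean? What is "each real event"? Can you elaborate?
@Emmanuel Lin Generate events, thanks.
@minem Generate artificial events, but cannot set…
@Pan Could you add a simple example?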