R: Identifying duplicate data using a threshold

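I am working with Bluetooth sensor data and need to identify possible duplicate readings for each unique ID. The Bluetooth sensors scan every five seconds, so a device that is not moving quickly (for example, one sitting in traffic) may be picked up again on subsequent readings. If the same device makes a round trip there may be multiple readings, but those should be several minutes apart. I can't work out how to eliminate the duplicates. I need to calculate a time-difference column for rows where the macid matches.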

The data is in this format:

          macid   time
00:03:7A:4D:F3:59  82333
00:03:7A:EF:58:6F 223556
00:03:7A:EF:58:6F 223601
00:03:7A:EF:58:6F 232731
00:03:7A:EF:58:6F 232736
00:05:4F:0B:45:F7 164141
I need to create:

            macid   time timediff
00:03:7A:4D:F3:59  82333 NA
00:03:7A:EF:58:6F 223556 NA
00:03:7A:EF:58:6F 223601 45
00:03:7A:EF:58:6F 232731 9130
00:03:7A:EF:58:6F 232736 5
00:05:4F:0B:45:F7 164141 NA
My first attempt is very slow and not really usable:

dedupeIDs <- function (zz) {
  #Order by macid and then time
  zz <- zz[order(zz$macid, zz$time) ,]

  zz$timediff <- c(999999, diff(zz$time))

  # Reset the difference wherever the macid differs from the previous row
  for (i in seq_len(nrow(zz))[-1]) {
   if (zz[i, "macid"] != zz[i - 1, "macid"]) {
    zz[i, "timediff"] <- 999999
   }
  }
  return(zz)
}
How about:

x <- structure(list(macid= structure(c(1L, 2L, 2L, 2L, 2L, 3L),
 .Label = c("00:03:7A:4D:F3:59", "00:03:7A:EF:58:6F", "00:05:4F:0B:45:F7"),
 class = "factor"), time = c(82333, 223556, 223601, 232731, 232736, 164141)),
.Names = c("macid", "time"), row.names = c(NA, -6L), class = "data.frame")
# ensure 'x' is ordered properly
x <- x[order(x$macid,x$time),]
# add timediff column by macid
x$timediff <- ave(x$time, x$macid, FUN=function(x) c(NA,diff(x)))

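x

Perfect, I had forgotten about ave. I was putting something together with rle along similar lines, but this is more direct and to the point. Thanks very much.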
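
The ave approach gives the per-macid difference, but the question also asks about dropping the duplicates themselves. A minimal sketch of that step follows, under two assumptions that are not confirmed in the question: that time is encoded as HHMMSS (the sample values are consistent with five-second scans if read that way), and that a threshold of 60 seconds separates a repeat scan from a genuine return trip. The names hhmmss_to_secs, dup_threshold_secs, secs, is_dup and deduped are hypothetical.

```r
# Sample data from the question (macid kept as character for simplicity)
x <- data.frame(
  macid = c("00:03:7A:4D:F3:59", "00:03:7A:EF:58:6F", "00:03:7A:EF:58:6F",
            "00:03:7A:EF:58:6F", "00:03:7A:EF:58:6F", "00:05:4F:0B:45:F7"),
  time  = c(82333, 223556, 223601, 232731, 232736, 164141),
  stringsAsFactors = FALSE
)

# ASSUMPTION: 'time' encodes HHMMSS, so convert it to seconds since midnight
hhmmss_to_secs <- function(t) {
  (t %/% 10000) * 3600 + ((t %/% 100) %% 100) * 60 + (t %% 100)
}

# ASSUMPTION: hypothetical threshold; closer readings are treated as duplicates
dup_threshold_secs <- 60

# Order by macid and time, then take the difference within each macid
x <- x[order(x$macid, x$time), ]
x$secs <- hhmmss_to_secs(x$time)
x$timediff <- ave(x$secs, x$macid, FUN = function(s) c(NA, diff(s)))

# The first reading of each macid has an NA timediff and is never a duplicate
x$is_dup <- !is.na(x$timediff) & x$timediff < dup_threshold_secs

# Keep only the readings that are not flagged as duplicates
deduped <- x[!x$is_dup, ]
```

Because each reading is compared with the one immediately before it, a device that sits in range for a long run of consecutive scans keeps only its first reading. The sketch ignores readings that straddle midnight; handling those would need a date column, which the sample data does not have.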