How to correctly detect consecutive event changes in an R data frame
I'm a beginner in R, and I find all of its functions great for analysing data. I want to filter a data frame by detecting event changes. For example, take the following data:
testcase date event
1 TESTCASE1 2013-06-12 18:12:09 EVENT1
2 TESTCASE1 2013-06-12 18:12:12 EVENT1
3 TESTCASE1 2013-06-12 18:12:15 EVENT2
4 TESTCASE1 2013-06-12 18:12:16 EVENT2
5 TESTCASE1 2013-06-12 18:12:25 EVENT1
6 TESTCASE2 2013-06-12 18:12:10 EVENT4
7 TESTCASE2 2013-06-12 18:12:16 EVENT4
8 TESTCASE2 2013-06-12 18:12:17 EVENT2
9 TESTCASE2 2013-06-12 18:12:26 EVENT2
10 TESTCASE2 2013-06-12 18:12:30 EVENT1
I only want to return the rows where an event change occurs. For this example, that gives:
testcase date event
2 TESTCASE1 2013-06-12 18:12:12 EVENT1
3 TESTCASE1 2013-06-12 18:12:15 EVENT2
4 TESTCASE1 2013-06-12 18:12:16 EVENT2
5 TESTCASE1 2013-06-12 18:12:25 EVENT1
7 TESTCASE2 2013-06-12 18:12:16 EVENT4
8 TESTCASE2 2013-06-12 18:12:17 EVENT2
9 TESTCASE2 2013-06-12 18:12:26 EVENT2
10 TESTCASE2 2013-06-12 18:12:30 EVENT1
The only way I have found is to use a loop, which gives the following code:
result <- data.frame( testcase =
c("TESTCASE1","TESTCASE1","TESTCASE1","TESTCASE1","TESTCASE1","TESTCASE2","TESTCASE2","TESTCASE2","TESTCASE2","TESTCASE2"),
date = c("2013-06-12 18:12:09","2013-06-12 18:12:12","2013-06-12 18:12:15","2013-06-12 18:12:16","2013-06-12 18:12:25","2013-06-12 18:12:10","2013-06-12 18:12:16","2013-06-12 18:12:17","2013-06-12 18:12:26","2013-06-12 18:12:30"),
event = c("EVENT1","EVENT1","EVENT2","EVENT2","EVENT1","EVENT4","EVENT4","EVENT2","EVENT2", "EVENT1"))
tc <- result[1, "testcase"]
currentDate <- result[1, "date"]
currentEvent <- result[1, "event"]
# index into the output vector
j <- 1
output <- c()
for (i in 2:nrow(result)) {
  if (tc != result[i, "testcase"]) {
    # new test case: reset the reference event
    tc <- result[i, "testcase"]
    currentEvent <- result[i, "event"]
  } else {
    # detect a handover (event change)
    if (result[i, "event"] != currentEvent) {
      output[j] <- i - 1
      output[j + 1] <- i
      j <- j + 2
      currentEvent <- result[i, "event"]
    }
  }
}
output_data <- result[unique(output), ]
Here is a vectorized approach:
change.idx <- with(result, which(head(testcase, -1) == tail(testcase, -1) &
head(event, -1) != tail(event, -1)))
# [1] 2 4 7 9
keep.idx <- unique(sort(c(change.idx, change.idx + 1)))
# [1] 2 3 4 5 7 8 9 10
result[keep.idx, ]
# testcase date event
# 2 TESTCASE1 2013-06-12 18:12:12 EVENT1
# 3 TESTCASE1 2013-06-12 18:12:15 EVENT2
# 4 TESTCASE1 2013-06-12 18:12:16 EVENT2
# 5 TESTCASE1 2013-06-12 18:12:25 EVENT1
# 7 TESTCASE2 2013-06-12 18:12:16 EVENT4
# 8 TESTCASE2 2013-06-12 18:12:17 EVENT2
# 9 TESTCASE2 2013-06-12 18:12:26 EVENT2
# 10 TESTCASE2 2013-06-12 18:12:30 EVENT1
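To see why comparing head(x, -1) with tail(x, -1) finds changes, it may help to trace the idea on a single toy vector. This is only a sketch: the vector and the names current, following, change_idx and keep_idx are made up for illustration, and the test-case check from the answer is left out.

```r
# Toy vector of events (hypothetical data, not from the question)
event <- c("A", "A", "B", "B", "A")

# head(event, -1) drops the last element; tail(event, -1) drops the first,
# so position i of the comparison pairs event[i] with event[i + 1]
current   <- head(event, -1)
following <- tail(event, -1)

# TRUE at position i means the event changes between element i and i + 1
change_idx <- which(current != following)

# Keep both sides of every change, without duplicates
keep_idx <- sort(unique(c(change_idx, change_idx + 1)))

print(change_idx)
print(keep_idx)
print(event[keep_idx])
```

Adding the extra condition on testcase, as the answer does, simply discards the changes that straddle two test cases.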
Here is another vectorized approach, using diff:
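# diff() needs numeric input, so convert both columns to integer codes first
ev_code <- as.integer(factor(result$event))
tc_code <- as.integer(factor(result$testcase))
differs_from_next <- c(diff(ev_code), 0) != 0 & c(diff(tc_code), 0) == 0
differs_from_previous <- c(0, diff(ev_code)) != 0 & c(0, diff(tc_code)) == 0
result[differs_from_previous | differs_from_next, ]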
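The padding of diff() with a leading or trailing 0 can be traced on a toy vector. The sketch below assumes the labels are first mapped to integer codes, since diff() works on numbers; the vector and the names code, step, before_change and after_change are illustrative only.

```r
# Toy vector of events (hypothetical data, not from the question)
event <- c("A", "A", "B", "B", "A")

# Map each label to an integer code so that diff() can subtract them
code <- as.integer(factor(event))

# step[i] is code[i + 1] - code[i]; a non-zero value marks a change
step <- diff(code)

# A trailing 0 aligns each step with the row before the change,
# a leading 0 aligns it with the row after the change
before_change <- c(step, 0) != 0
after_change  <- c(0, step) != 0

keep <- before_change | after_change
print(event[keep])
```

Combining the two logical vectors with | keeps both rows on either side of every change.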
Another option, different from the previous ones:
f <- function(d) d[with(d, { y <- head(event,-1)!=tail(event,-1); c(FALSE, y) | c(y, FALSE)}),]
Reduce(rbind, by(result, result$testcase, f))
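Thanks for your answer, but I don't understand what the Reduce function is doing. Could you explain it to me?
@JérômeB, Reduce binds the pieces that by() split off back together. This can be sped up by using the data.table package, or even plyr.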
testcase date event
2 TESTCASE1 2013-06-12 18:12:12 EVENT1
3 TESTCASE1 2013-06-12 18:12:15 EVENT2
4 TESTCASE1 2013-06-12 18:12:16 EVENT2
5 TESTCASE1 2013-06-12 18:12:25 EVENT1
7 TESTCASE2 2013-06-12 18:12:16 EVENT4
8 TESTCASE2 2013-06-12 18:12:17 EVENT2
9 TESTCASE2 2013-06-12 18:12:26 EVENT2
10 TESTCASE2 2013-06-12 18:12:30 EVENT1
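Regarding the question about Reduce in the comments above, a small sketch may make the rebinding step clearer. The list pieces below is a hypothetical stand-in for what by() returns (one data frame per group), not data from the question.

```r
# Three small data frames standing in for the per-group pieces
# (hypothetical data, not from the question)
pieces <- list(
  data.frame(id = 1:2, value = c("a", "b")),
  data.frame(id = 3:4, value = c("c", "d")),
  data.frame(id = 5L,  value = "e")
)

# Reduce(rbind, pieces) folds the list pairwise:
# rbind(rbind(pieces[[1]], pieces[[2]]), pieces[[3]])
combined <- Reduce(rbind, pieces)

# do.call(rbind, pieces) hands all pieces to a single rbind() call instead
combined_once <- do.call(rbind, pieces)

print(combined)
```

Both calls give the same rows here. With many groups, the single do.call(rbind, ...) call is usually the cheaper of the two, since Reduce rebuilds the growing data frame at every step. As far as I know, when the list is named (as the by() result is), do.call(rbind, ...) prefixes the row names with the group names, which may be one reason to prefer Reduce here.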