How to correctly detect consecutive event changes in an R data frame


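I'm a beginner in R, and I find all these functions great for analyzing data. I want to filter a data frame by detecting event changes. For example, take the following data: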

    testcase                date  event
1  TESTCASE1 2013-06-12 18:12:09 EVENT1
2  TESTCASE1 2013-06-12 18:12:12 EVENT1
3  TESTCASE1 2013-06-12 18:12:15 EVENT2
4  TESTCASE1 2013-06-12 18:12:16 EVENT2
5  TESTCASE1 2013-06-12 18:12:25 EVENT1
6  TESTCASE2 2013-06-12 18:12:10 EVENT4
7  TESTCASE2 2013-06-12 18:12:16 EVENT4
8  TESTCASE2 2013-06-12 18:12:17 EVENT2
9  TESTCASE2 2013-06-12 18:12:26 EVENT2
10 TESTCASE2 2013-06-12 18:12:30 EVENT1
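I only want to return the rows around each event change, that is, the row just before and the row just after a change within the same test case. For this example, that gives: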

    testcase                date  event
2  TESTCASE1 2013-06-12 18:12:12 EVENT1
3  TESTCASE1 2013-06-12 18:12:15 EVENT2
4  TESTCASE1 2013-06-12 18:12:16 EVENT2
5  TESTCASE1 2013-06-12 18:12:25 EVENT1
7  TESTCASE2 2013-06-12 18:12:16 EVENT4
8  TESTCASE2 2013-06-12 18:12:17 EVENT2
9  TESTCASE2 2013-06-12 18:12:26 EVENT2
10 TESTCASE2 2013-06-12 18:12:30 EVENT1
The only way I found is a loop, which gives the following code:

result <- data.frame(
    testcase = c("TESTCASE1","TESTCASE1","TESTCASE1","TESTCASE1","TESTCASE1",
                 "TESTCASE2","TESTCASE2","TESTCASE2","TESTCASE2","TESTCASE2"),
    date = c("2013-06-12 18:12:09","2013-06-12 18:12:12","2013-06-12 18:12:15","2013-06-12 18:12:16","2013-06-12 18:12:25",
             "2013-06-12 18:12:10","2013-06-12 18:12:16","2013-06-12 18:12:17","2013-06-12 18:12:26","2013-06-12 18:12:30"),
    event = c("EVENT1","EVENT1","EVENT2","EVENT2","EVENT1",
              "EVENT4","EVENT4","EVENT2","EVENT2","EVENT1"))

tc <- result[1, "testcase"]
currentEvent <- result[1, "event"]
# next free position in the output index vector
j <- 1
output <- c()

for (i in 2:nrow(result)) {
    if (tc != result[i, "testcase"]) {
        # new test case: reset the reference event
        tc <- result[i, "testcase"]
        currentEvent <- result[i, "event"]
    } else {
        # detect a handover (event change): keep the rows before and after it
        if (result[i, "event"] != currentEvent) {
            output[j] <- i - 1
            output[j + 1] <- i
            j <- j + 2
            currentEvent <- result[i, "event"]
        }
    }
}

output_data <- result[unique(output), ]
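A minimal sketch of the same loop, assuming the rows are already grouped by test case as in the example: instead of growing an index vector, it preallocates one logical flag per row and compares each row with the one just before it. The date column is left out for brevity, and `keep` is an illustrative name, not from the question.

```r
result <- data.frame(
  testcase = rep(c("TESTCASE1", "TESTCASE2"), each = 5),
  event    = c("EVENT1", "EVENT1", "EVENT2", "EVENT2", "EVENT1",
               "EVENT4", "EVENT4", "EVENT2", "EVENT2", "EVENT1"),
  stringsAsFactors = FALSE
)

# One logical flag per row instead of a growing vector of indices
keep <- logical(nrow(result))

# Rows 2..n; empty if the data frame has a single row
for (i in seq_len(nrow(result))[-1]) {
  same_case    <- result$testcase[i] == result$testcase[i - 1]
  event_change <- result$event[i]    != result$event[i - 1]
  if (same_case && event_change) {
    # flag both the row before the change and the row after it
    keep[i - 1] <- TRUE
    keep[i]     <- TRUE
  }
}

which(keep)
# [1]  2  3  4  5  7  8  9 10
```

Comparing with the previous row is equivalent to tracking currentEvent, because within a test case the tracked event is always the previous row's event.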

Here is a vectorized approach:

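# i such that rows i and i+1 share a test case but have different events
change.idx <- with(result, which(head(testcase, -1) == tail(testcase, -1) &
                                 head(event,    -1) != tail(event,    -1)))
# [1] 2 4 7 9

# keep the rows on both sides of each change
keep.idx <- unique(sort(c(change.idx, change.idx + 1)))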
# [1]  2  3  4  5  7  8  9 10

result[keep.idx, ]
#     testcase                date  event
# 2  TESTCASE1 2013-06-12 18:12:12 EVENT1
# 3  TESTCASE1 2013-06-12 18:12:15 EVENT2
# 4  TESTCASE1 2013-06-12 18:12:16 EVENT2
# 5  TESTCASE1 2013-06-12 18:12:25 EVENT1
# 7  TESTCASE2 2013-06-12 18:12:16 EVENT4
# 8  TESTCASE2 2013-06-12 18:12:17 EVENT2
# 9  TESTCASE2 2013-06-12 18:12:26 EVENT2
# 10 TESTCASE2 2013-06-12 18:12:30 EVENT1
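The same idea wrapped in a reusable function, as a sketch: `change_rows` and its `group`/`value` arguments are hypothetical names chosen for illustration, and the data is assumed to be sorted by group as in the question. `x[-n]` and `x[-1]` play the roles of `head(x, -1)` and `tail(x, -1)`.

```r
# Hypothetical helper: rows on either side of a change in `value`,
# ignoring transitions that cross a boundary between groups in `group`
change_rows <- function(df, group = "testcase", value = "event") {
  n <- nrow(df)
  if (n < 2) return(df[0, , drop = FALSE])
  g <- df[[group]]
  v <- df[[value]]
  # i where rows i and i+1 share a group but differ in value
  idx <- which(g[-n] == g[-1] & v[-n] != v[-1])
  df[sort(unique(c(idx, idx + 1))), , drop = FALSE]
}

result <- data.frame(
  testcase = rep(c("TESTCASE1", "TESTCASE2"), each = 5),
  event    = c("EVENT1", "EVENT1", "EVENT2", "EVENT2", "EVENT1",
               "EVENT4", "EVENT4", "EVENT2", "EVENT2", "EVENT1"),
  stringsAsFactors = FALSE
)

change_rows(result)
```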

Here is another vectorized approach using diff:

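# diff() needs numbers, so work on the integer codes of each column
ev_code <- as.integer(factor(result$event))
tc_code <- as.integer(factor(result$testcase))
# row i differs from row i+1 within the same test case
differs_from_next <- c(diff(ev_code), 0) != 0 & c(diff(tc_code), 0) == 0
# row i differs from row i-1 within the same test case
differs_from_previous <- c(0, diff(ev_code)) != 0 & c(0, diff(tc_code)) == 0
result[differs_from_previous | differs_from_next, ]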
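To see why the zero padding works, here are the first test case's events on their own (a toy illustration, not part of the answer): padding at the end lines diff up with "the next row", padding at the front lines it up with "the previous row".

```r
# Integer codes for the first test case's events (EVENT1 = 1, EVENT2 = 2)
ev <- as.integer(factor(c("EVENT1", "EVENT1", "EVENT2", "EVENT2", "EVENT1")))
ev
# [1] 1 1 2 2 1

d <- diff(ev)       # d[i] = ev[i + 1] - ev[i], length n - 1
d
# [1]  0  1  0 -1

# Padding at the end aligns d[i] with row i: "row i differs from row i + 1"
c(d, 0) != 0
# [1] FALSE  TRUE FALSE  TRUE FALSE

# Padding at the front aligns d[i - 1] with row i: "row i differs from row i - 1"
c(0, d) != 0
# [1] FALSE FALSE  TRUE FALSE  TRUE
```

OR-ing the two masks selects rows 2 to 5, matching the expected output for TESTCASE1.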
Another option:

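# for one test case, keep the rows on either side of an event change
f <- function(d) d[with(d, {
    y <- head(event, -1) != tail(event, -1)  # y[i]: rows i and i+1 differ
    c(FALSE, y) | c(y, FALSE)                # flag both rows of each change
}), ]

# apply f to each test case, then bind the pieces back together
Reduce(rbind, by(result, result$testcase, f))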
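A sketch of the same split-apply-combine idea written with split(), lapply() and do.call() instead of by() and Reduce(); `pieces` and `out` are illustrative names. unname() is there because rbind would otherwise prefix the row names with the test case names (e.g. "TESTCASE1.2").

```r
result <- data.frame(
  testcase = rep(c("TESTCASE1", "TESTCASE2"), each = 5),
  event    = c("EVENT1", "EVENT1", "EVENT2", "EVENT2", "EVENT1",
               "EVENT4", "EVENT4", "EVENT2", "EVENT2", "EVENT1"),
  stringsAsFactors = FALSE
)

# f restated from the answer: rows next to an event change within one group
f <- function(d) {
  y <- head(d$event, -1) != tail(d$event, -1)
  d[c(FALSE, y) | c(y, FALSE), , drop = FALSE]
}

# split() gives one data frame per test case; lapply() filters each one
pieces <- lapply(split(result, result$testcase), f)

# do.call() passes all pieces to a single rbind() call
out <- do.call(rbind, unname(pieces))
out
```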

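Thanks for your answer, but I don't understand what the Reduce function is doing. Could you explain it to me?

@JérômeB, Reduce binds the rows that by split apart back together with rbind. It could be made faster with the data.table package, or even plyr.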
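To illustrate the explanation of Reduce with made-up data frames (a, b and d are toy inputs): Reduce(f, list(x1, x2, x3)) evaluates f(f(x1, x2), x3), so Reduce(rbind, ...) stacks the pieces one pair at a time, while do.call(rbind, ...) does it in a single call. accumulate = TRUE shows the intermediate results.

```r
a <- data.frame(x = 1:2)
b <- data.frame(x = 3L)
d <- data.frame(x = 4:5)

# Reduce(rbind, list(a, b, d)) is rbind(rbind(a, b), d)
combined <- Reduce(rbind, list(a, b, d))
combined$x
# [1] 1 2 3 4 5

# A single rbind() call over all pieces gives the same column
do.call(rbind, list(a, b, d))$x
# [1] 1 2 3 4 5

# accumulate = TRUE keeps every intermediate result: 1, 1+2, 1+2+3, ...
Reduce(`+`, 1:4, accumulate = TRUE)
# [1]  1  3  6 10
```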