Regex: identifying rows in data.frames based on complex rules
In two previous questions, I asked how to identify and extract substrings based on complex rules. The current question is about how to achieve the same thing in a data.frame structure. Suppose you have a data.frame like the following:
dat <- data.frame(time = 1:10,
                  event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
                  actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra", "Sara", "John", "Eliza", "Alex"))
time event actor
1 FA John
2 EX Alex
3 I1 John
4 FA Alex
5 FA Tim
6 I3 Sandra
7 EX Sara
8 EX John
9 EX Eliza
10 I3 Alex
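Now I want to move from time 1 to 10 and group together all rows leading up to each I3. That is, I want to return a list containing two data.frames (rows 1-6 and rows 7-10 should each form a separate data.frame, and both should be placed in a common list). How can I do this?

You can use split: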
split(dat, c(0, cumsum(dat$event=="I3"))[-(nrow(dat)+1)])
$`0`
time event actor
1 1 FA John
2 2 EX Alex
3 3 I1 John
4 4 FA Alex
5 5 FA Tim
6 6 I3 Sandra
$`1`
time event actor
7 7 EX Sara
8 8 EX John
9 9 EX Eliza
10 10 I3 Alex
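To see why the `split` call produces these two blocks, it can help to look at the grouping vector on its own. The following is a minimal sketch, assuming the data.frame from the question has been stored under the name `dat` (the name `hits` and `grp` are introduced here only for illustration):

```r
# Assumption: `dat` is the data.frame shown in the question.
dat <- data.frame(
  time  = 1:10,
  event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
  actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra",
            "Sara", "John", "Eliza", "Alex")
)

# Running count of "I3" rows seen so far, including the current row.
hits <- cumsum(dat$event == "I3")
# 0 0 0 0 0 1 1 1 1 2

# Prepend a 0 and drop the last element: this shifts the counter down by
# one row, so each "I3" row closes its own group instead of opening the next.
grp <- c(0, hits)[-(nrow(dat) + 1)]
# 0 0 0 0 0 0 1 1 1 1

# One data.frame per distinct value of grp, collected in a list.
res <- split(dat, grp)
```

Without the shift, the "I3" row at time 6 would land in the second group together with rows 7-9, and the final "I3" row would form a group of its own.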
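This also works:
# Row numbers of the "I3" events; each one ends a block.
i3.index = which(dat$event == "I3")
# Each block starts at row 1 or right after the previous "I3".
i3.start = c(1, i3.index[-length(i3.index)]+1)
indexMatrix = cbind(from = i3.start, end = i3.index)
# Extract rows from:end for every block; the result is a list of data.frames.
apply(indexMatrix, 1, function(x){dat[x[1]:x[2],]})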
# [[1]]
# time event actor
# 1 1 FA John
# 2 2 EX Alex
# 3 3 I1 John
# 4 4 FA Alex
# 5 5 FA Tim
# 6 6 I3 Sandra
#
# [[2]]
# time event actor
# 7 7 EX Sara
# 8 8 EX John
# 9 9 EX Eliza
# 10 10 I3 Alex
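The same start/end indices can also be passed to `Map()`, which always returns a list regardless of what the function returns. This is only a sketch of an alternative, again assuming the question's data.frame is stored as `dat`; the names `i3_index` and `i3_start` are illustrative:

```r
# Assumption: `dat` is the data.frame shown in the question.
dat <- data.frame(
  time  = 1:10,
  event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
  actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra",
            "Sara", "John", "Eliza", "Alex")
)

# Rows where a block ends, and the rows where each block starts.
i3_index <- which(dat$event == "I3")
i3_start <- c(1, head(i3_index, -1) + 1)

# Walk over the (start, end) pairs and slice the rows for each block.
res <- Map(function(from, to) dat[from:to, ], i3_start, i3_index)
```

Note that with this index-based approach, any rows after the last "I3" would not be included in the result, since every block is defined by the "I3" row that ends it. In the example data the last row is an "I3", so nothing is lost.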
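This will also help:
library(dplyr)
dat %>%
  arrange(time %>% desc) %>%
  # Counting "I3" rows from the bottom up puts each "I3" in the same group as the rows before it
  mutate(group = cumsum(event == "I3")) %>%
  arrange(time) %>%
  # Rows 1-6 end up in group 2, rows 7-10 in group 1
  group_by(group)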
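The dplyr pipeline returns one grouped data frame rather than the list of data.frames the question asks for. A possible way to get a list is sketched below, assuming a dplyr version that provides `group_split()` (0.8.0 or later) and that the data is stored as `dat`. It also swaps the double `arrange()` for `lag()`, which is a different technique from the one in the answer:

```r
library(dplyr)

# Assumption: `dat` is the data.frame shown in the question.
dat <- data.frame(
  time  = 1:10,
  event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
  actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra",
            "Sara", "John", "Eliza", "Alex")
)

res <- dat %>%
  # lag() shifts the "I3" flag down by one row, so the counter only
  # increases on the row *after* each "I3".
  mutate(group = cumsum(lag(event == "I3", default = FALSE))) %>%
  group_by(group) %>%
  # Split the grouped data into a list with one tibble per group.
  group_split()
```

Here the groups are numbered in time order (0 for rows 1-6, 1 for rows 7-10), and each list element keeps the `group` column.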
@MrFlick: A list of data.frames. Thanks, the question has been updated.