Regex: Identifying rows in data.frames based on complex rules


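In two previous questions, I asked how to identify and extract substrings based on complex rules.

The current question is about how to achieve the same thing in a data.frame structure. Suppose you have a data.frame like this: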

dat <- data.frame(time = 1:10, 
event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"), 
actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra", "Sara", "John", "Eliza", "Alex"))

time event actor
1    FA    John
2    EX    Alex
3    I1    John
4    FA    Alex
5    FA    Tim
6    I3    Sandra
7    EX    Sara
8    EX    John
9    EX    Eliza
10   I3    Alex

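Now I want to move from time 1 to 10 and group all rows up to and including each I3. That is, I want to return a list containing two data.frames (rows 1-6 and rows 7-10 should each form a separate data.frame, placed in a common list). How can I achieve this?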

You can use split:

split(dat, c(0, cumsum(dat$event=="I3"))[-(nrow(dat)+1)])

$`0`
  time event  actor
1    1    FA   John
2    2    EX   Alex
3    3    I1   John
4    4    FA   Alex
5    5    FA    Tim
6    6    I3 Sandra

$`1`
   time event actor
7     7    EX  Sara
8     8    EX  John
9     9    EX Eliza
10   10    I3  Alex
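
The grouping vector in the split call is compact, so it may help to see it built step by step. The following is a minimal sketch, assuming the sample data from the question is stored in dat; the intermediate names is_i3, running, grp and groups are introduced here for illustration only.

```r
# Sample data from the question, assigned to `dat`
dat <- data.frame(
  time  = 1:10,
  event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
  actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra",
            "Sara", "John", "Eliza", "Alex")
)

# TRUE on every row that should close a group
is_i3 <- dat$event == "I3"

# Running count of I3 rows; the count increases ON the I3 row itself
running <- cumsum(is_i3)
# 0 0 0 0 0 1 1 1 1 2

# Shift the counts down by one row so each I3 stays with the rows before it:
# prepend a 0, then drop the surplus last element
grp <- c(0, running)[-(nrow(dat) + 1)]
# 0 0 0 0 0 0 1 1 1 1

# One data.frame per distinct value of grp, returned as a named list
groups <- split(dat, grp)
names(groups)
```

Without the shift, row 6 would already carry the label 1 and would be grouped with rows 7-9 instead of closing the first block.
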
This also works:

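# Row numbers where each group ends (the I3 rows)
i3.index = which(dat$event == "I3")
# Each group starts at row 1 or on the row right after the previous I3
i3.start = c(1, i3.index[-length(i3.index)]+1)

indexMatrix = cbind(from = i3.start, end = i3.index)

# Extract one data.frame per from/end pair
apply(indexMatrix, 1, function(x){dat[x[1]:x[2],]})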

# [[1]]
# time event  actor
# 1    1    FA   John
# 2    2    EX   Alex
# 3    3    I1   John
# 4    4    FA   Alex
# 5    5    FA    Tim
# 6    6    I3 Sandra
# 
# [[2]]
# time event actor
# 7     7    EX  Sara
# 8     8    EX  John
# 9     9    EX Eliza
# 10   10    I3  Alex
This will also help:

library(dplyr)

# Count I3 rows from the last row upward, then restore the original order
dat %>%
  arrange(desc(time)) %>%
  mutate(group = cumsum(event == "I3")) %>%
  arrange(time) %>%
  group_by(group)
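
The pipeline above labels the rows but returns a single grouped table, whereas the question asks for a list of data.frames. As a rough sketch of the same reverse-counting idea in base R, the labels can be computed directly and passed to split. This assumes the sample data is stored in dat; the names from_end, grp and blocks are illustrative and not part of the original answer.

```r
# Sample data from the question, assigned to `dat`
dat <- data.frame(
  time  = 1:10,
  event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
  actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra",
            "Sara", "John", "Eliza", "Alex")
)

is_i3 <- dat$event == "I3"

# Count I3 rows from the bottom up, mirroring the descending sort + cumsum;
# every row gets the number of I3 rows at or below it
from_end <- rev(cumsum(rev(is_i3)))
# 2 2 2 2 2 2 1 1 1 1

# Flip the labels so the earliest block gets the smallest label
grp <- max(from_end) - from_end + 1
# 1 1 1 1 1 1 2 2 2 2

# List of data.frames, in time order
blocks <- split(dat, grp)
```

Counting from the end places each I3 row in the same block as the rows before it without any shifting. With dplyr 0.8.0 or later, appending group_split() to the pipeline should be another way to obtain a list, although the group labels there run in reverse order.
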

@MrFlick: A list of data.frames. Thanks, the question has been updated.