R: calculating time based on a string variable in another column


r dataframe time


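I have a large dataset with a time column and a column that identifies each row as either a saccade or a fixation of the eye (saccade = fast eye movement, fixation = relatively stable eye position). I want to calculate how long each fixation and each saccade lasts, for example the time from the first "f" until the first "s". So if there are 3 consecutive rows with "s", I want the time of the first "s" and the time of the last "s" before the next "f", both taken from the time column. By subtracting these two, I know the duration of each fixation and saccade.

The time scale is not continuous, because blinks in the data sometimes remove rows.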

example.df <- data.frame(time = 1:100,
                         saccade = sample(letters[c(6, 19)], 100, replace = TRUE))

We can use rle() to create a run index, then group_by() that index and subtract the first time in each run from the last:
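library(tidyverse)

example.df <- data.frame(time = 1:100,
                         saccade = sample(letters[c(6, 19)], 100, replace = TRUE))

# Run-length encode the state: each run is a stretch of consecutive "s" or non-"s" rows
test <- rle(example.df$saccade == "s")

# Label every row with the number of the run it belongs to
example.df$indexer <- rep(seq_along(test$lengths), test$lengths)

# Within each run, period = last time - first time (0 for single-row runs)
example.df <- example.df %>%
  group_by(indexer) %>%
  mutate(period = time[n()] - time[1])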

# A tibble: 100 x 4
# Groups:   indexer [53]
    time saccade indexer period
   <int>  <fctr>   <int>  <int>
 1     1       s       1      1
 2     2       s       1      1
 3     3       f       2      0
 4     4       s       3      0
 5     5       f       4      3
 6     6       f       4      3
 7     7       f       4      3
 8     8       f       4      3
 9     9       s       5      1
10    10       s       5      1
# ... with 90 more rows

# drop indexer column
example.df <- example.df[setdiff(names(example.df),"indexer")]
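
The indexing step above may be easier to follow on a tiny, deterministic vector. The following is only an illustrative sketch: the `state` values are made up for the example and are not the asker's data.

```r
# Hypothetical state column (not taken from the real dataset)
state <- c("s", "s", "f", "s", "f", "f", "f")

# rle() collapses consecutive equal values into runs
runs <- rle(state)
runs$lengths  # 2 1 1 3
runs$values   # "s" "f" "s" "f"

# Repeating each run number by its run length labels every row with its run
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id        # 1 1 2 3 4 4 4
```

Grouping by `run_id` then lets any per-run summary, such as last time minus first time, be computed separately for each stretch of identical states.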

Result as a data.frame:
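example.df <- data.frame(time = 1:100,
                         saccade = sample(letters[c(6, 19)], 100, replace = TRUE),
                         stringsAsFactors = FALSE)
# Run-length encode the state column (must be character, not factor)
run_len_encoding <- rle(example.df$saccade)
length_of_runs <- run_len_encoding$lengths
# Row index of the last row of each run
index_of_changes <- cumsum(length_of_runs)
# Differences between consecutive run ends, i.e. rows per run; because the
# vector starts at 1, the first run's value is its row count minus one
duration <- diff(c(1, index_of_changes), 1)
result.df <- data.frame(duration, state = run_len_encoding$values)
result.df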

   duration   state
1         1       s
2         2       f
3         1       s
4         4       f
5         1       s
6         3       f
7         3       s
8         2       f
9         3       s
10        1       f
11        2       s
12        1       f
13        1       s
14        2       f
15        4       s
16        1       f
17        2       s
18        1       f
19        1       s
20        1       f
21        1       s
22        1       f
23        2       s
24        1       f
25        2       s
26        3       f
27        1       s
28        1       f
29        2       s
30        1       f
31        1       s
32        1       f
33        6       s
34        1       f
35        3       s
36        3       f
37        1       s
38        2       f
39        2       s
40        4       f
41        1       s
42        1       f
43        1       s
44        1       f
45        1       s
46        2       f
47        1       s
48        3       f
49        2       s
50        1       f
51        4       s
52        1       f
53        1       s
54        1       f
55        2       s
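
Because the real time column has gaps, counting rows is not the same as measuring elapsed time. One possible variation, sketched here in base R, looks up the actual timestamps at the first and last row of each run. The first four timestamps are the ones quoted in the comments; the remaining values and the name `eye` are assumptions made up for illustration.

```r
# Irregular timestamps in ms; the last three rows are hypothetical
eye <- data.frame(
  time    = c(275563, 275566, 275571, 275573, 275580, 275584, 275590),
  saccade = c("s", "s", "s", "s", "f", "f", "s"),
  stringsAsFactors = FALSE
)

runs   <- rle(eye$saccade)
ends   <- cumsum(runs$lengths)       # row index of the last row of each run
starts <- ends - runs$lengths + 1    # row index of the first row of each run

# One row per run: state, first and last timestamp, and their difference
result <- data.frame(
  state    = runs$values,
  start    = eye$time[starts],
  end      = eye$time[ends],
  duration = eye$time[ends] - eye$time[starts]
)
result
```

With these values the first run of "s" gets a duration of 10 ms, and a run that consists of a single row gets 0.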

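Hi @LAP, this almost works. However, the real timeline looks like this: 275563 275566 275571 275573. I don't want the sum of these times, but the time between the first and the last occurrence of "s". In this example, 10 ms. Do you know how to do that?

@BartR I have edited the code, is this what you need? Or do you need anything other than 0 for states that last only one row?

I get the following error: Error in mutate_impl(.data, dots): Evaluation error: argument ".data" is missing, with no default. This happens when running the last part. 0 is fine when only one row has the state! So I believe you have my solution.

Oh, I messed up the code by copy/pasting. Please try again.

Wow, you are great. This is exactly what I wanted. This is a very helpful community. Kudos to you and Kelazi!! I still don't fully understand how it works, but I will figure it out now.

I added stringsAsFactors = FALSE to the creation of the example df, to prevent saccade from becoming a factor. If you prefer saccade as a factor in the df, please let me know and I will handle it another way in my answer.

Hi Kelazi, thanks for your help! Somehow I get an error when running rle(example.df$scacade): 'x' must be a vector of an atomic type. I didn't have that before. What I want is for the script to subtract the first occurrence from the last occurrence to calculate the duration. I am trying to solve my other problem at the moment.

You must have used the code from before my last edit, which added stringsAsFactors = FALSE. Please try again, and let me know what format you would like for the output, e.g. a vector of durations, or a data.frame with the durations next to s/f?

OK, that helps. However, in diff(c(1, i), 1), where do you assign "i"? The object i is not found. As output, I would like your last suggestion! Thanks again for your time!

That i should have been index_of_changes; it is now corrected.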
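
The "'x' must be a vector of an atomic type" error discussed in the comments can be reproduced by passing a factor to rle(). A minimal sketch, assuming the state column has ended up as a factor and using made-up values, is to convert it with as.character() first:

```r
# Hypothetical state column stored as a factor
saccade <- factor(c("s", "s", "f", "s"))

# rle() rejects factors; capture the error message instead of stopping
err <- tryCatch(rle(saccade), error = function(e) conditionMessage(e))

# Converting to character first makes the same call work
runs <- rle(as.character(saccade))
runs$lengths  # 2 1 1
runs$values   # "s" "f" "s"
```

This is an alternative to setting stringsAsFactors = FALSE when the data frame is created, which is the approach taken in the answer.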