R: count occurrences of a letter before a specific letter


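Within each Id, I want to count the occurrences of "I" before the first "C" appears. I tried the code below, but it counts every "I" in the column. The code I tried: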

library(plyr)
Impres = ddply(df, .(Id), summarize, No_of_I_before_First_C = length(which(Character == "I")))
Sample data

Id  Character
1     I
1     I
1     C
1     I
2     I
2     C
The output should look like this

Id  Count_Of_I_before_First_C
1     2
2     1
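
To run the snippets in this thread, the sample data first has to exist as a data frame. A minimal sketch: the name df matches what the question's code expects, while stringsAsFactors = FALSE is an assumption made here so that Character stays a plain character vector.

```r
# Sample data from the question, stored in a data frame named df
df <- data.frame(
  Id = c(1, 1, 1, 1, 2, 2),
  Character = c("I", "I", "C", "I", "I", "C"),
  stringsAsFactors = FALSE  # assumption: keep Character as character, not factor
)
```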
Here is an idea

first1 <- function(x, letter) {
  which(x == letter)[1] - 1
}

aggregate(Character ~ Id, df, first1, 'C')
#  Id Character
#1  1         2
#2  2         1
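
Note that first1 returns the position of the first match minus one, so it counts every row before the first "C" and gives NA when a group has no "C". On the sample data that equals the number of "I" values. If other letters can appear, or a group may lack a "C", a variant along these lines might be safer. This is only a sketch: count_before_first and its arguments are hypothetical names introduced here, not part of the original answer.

```r
# Hypothetical helper (not from the thread): count `target` values that occur
# before the first `stopper`; if there is no `stopper`, count all of them.
count_before_first <- function(x, target = "I", stopper = "C") {
  stop_pos <- match(stopper, x)                    # first stopper, NA if absent
  if (is.na(stop_pos)) stop_pos <- length(x) + 1   # no stopper: use whole vector
  sum(head(x, stop_pos - 1) == target)
}

# Sample data from the question
df <- data.frame(
  Id = c(1, 1, 1, 1, 2, 2),
  Character = c("I", "I", "C", "I", "I", "C"),
  stringsAsFactors = FALSE
)

res <- aggregate(Character ~ Id, df, count_before_first)
names(res)[2] <- "Count_Of_I_before_First_C"
res
```

A group that starts with "C" yields 0 here rather than being dropped or returning NA.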

Here is a data.table solution:

library(data.table)
dt <- data.table(Id = c(1,1,1,1,2,2), Character = c('I', 'I', 'C', 'I', 'I', 'C'))
dt[, cnt.c := cumsum(Character == "C"), by = Id]
res <- dt[cnt.c == 0, .(Count_Of_I_before_First_C = length(Character)), by = Id]
Maybe:

library(dplyr)

rlei <- function(x) {
  r <- rle(x)
  I <- which(r$values=="I")
  C <- which(r$values=="C")
  r$lengths[which(I<C)][1]
}

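# Pass the column by name so rlei() sees each group's values;
# .$Character would hand it the whole ungrouped column.
group_by(df, Id) %>% 
  summarise(Count_Of_I_before_First_C = rlei(Character))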
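
The rle approach relies on run-length encoding: consecutive equal values are collapsed into runs, so a leading run of "I" is exactly the stretch before the first "C" when the column holds only "I" and "C" (an assumption taken from the sample data). A small base R sketch of what rle returns for the Id 1 group; lead_I is a hypothetical name used only for illustration.

```r
# Run-length encoding of the Id 1 group from the sample data
x <- c("I", "I", "C", "I")
r <- rle(x)

r$values   # "I" "C" "I"  (value of each run, in order)
r$lengths  # 2 1 1        (length of each run)

# Length of the leading run, counted only when that run is "I";
# a group that starts with "C" gives 0.
lead_I <- if (r$values[1] == "I") r$lengths[1] else 0L
lead_I     # 2
```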
This will be rather slow on a large dataset.

@Bulat I just followed the question's aggregate tag (i.e. no packages). I know dplyr and data.table both have more efficient approaches.
df %>% 
  group_by(Id) %>% 
  summarise(Count_Of_I_before_First_C = foo(Character))
# A tibble: 2 × 2
     Id Count_Of_I_before_First_C
  <dbl>                     <int>
1     1                         2
2     2                         1