Running count for a character variable when conditions match in R
I have a dataframe in which the first two columns are the options that can be chosen and the third column records the selection. I am trying to add a running count for when what was previously selected matches the option in the first column.

Example data frame:
df<-data.frame(box.1=c("A","A","B","C","A","B","A"),
box.2=c("B","B","A","A","C","C","C"),
selection=c("A","B","B","A","C","B","A"))
Desired result:
resulting_df<-data.frame(box.1=c("A","A","B","C","A","B","A"),
box.2=c("B","B","A","A","C","C","C"),
selection=c("A","B","B","A","C","B","A"),
running.count.box.1=c(0,1,0,0,1,1,1))
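My attempt does not return an actual running count. Changing the group_by variable to selection, or to a combination of the two columns, did not give the expected result either.
Summarising the data is not an option, because this data frame will be merged with other data frames that go through similar operations, so it has to keep the same shape.
Is there a way to add a running count in this situation using dplyr?
Thanks.
Edit: fixed a typo.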
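library(dplyr)

# Assumes df has columns box.a and box.b (named box.1 and box.2 in the question)
df %>%
  group_by(box.a) %>%
  # count the earlier rows in the same box.a group where box.a was selected
  mutate(count = pmax(0, lag(cumsum(selection == box.a)), na.rm = TRUE)) %>%
  ungroup()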
## A tibble: 7 x 4
#  box.a box.b selection count
#  <fct> <fct> <fct>     <dbl>
#1 A     B     A             0
#2 A     B     B             1
#3 B     A     B             0
#4 C     A     A             0
#5 A     C     C             1
#6 B     C     B             1
#7 A     C     A             1
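For comparison, the same grouped count can be sketched in base R with ave(), which returns one value per row and so avoids summarising. This is only an illustration, not part of the answer above: it assumes the question's original column names (box.1, box.2), and the helper vector hit is a name introduced here.

```r
# Example data from the question (column names as posted: box.1, box.2)
df <- data.frame(
  box.1     = c("A", "A", "B", "C", "A", "B", "A"),
  box.2     = c("B", "B", "A", "A", "C", "C", "C"),
  selection = c("A", "B", "B", "A", "C", "B", "A"),
  stringsAsFactors = FALSE
)

# 1 where the box.1 option was the one selected in that row, otherwise 0
# (hit is a helper name assumed for this sketch)
hit <- as.integer(df$selection == df$box.1)

# Within each box.1 value, cumulative hits minus the current row's hit
# leaves the number of hits in earlier rows only
df$running.count.box.1 <- ave(hit, df$box.1, FUN = function(h) cumsum(h) - h)

df
```

On the example data this gives 0, 1, 0, 0, 1, 1, 1, the same column as the desired result in the question.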
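I'm still unclear on the logic behind your running count column. Could you try to clarify how the running count is calculated? If we group by box.a, I would expect to see a value of 2 in your running count based on what you describe, but that is not the case in your expected result. What is the logic?

^ I'm not clear on it either. Could you add more explanation of what "when what was previously selected matches the first column option" means?

@Mako212 The idea is that the running count is the number of times the option in box.1 has previously been selected. In the first row, box.1 has option A and A is selected; the running count is 0 because there is no previous data. In the second row, box.1 has option A and it is not selected in that row, so the running count in this case is 1. I will fix the box.a / box.1 typo, sorry about that.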
# Flags (0/1) whether the row's box.a value already appeared in an earlier row of box.a
transform(df, run = c(0, sapply(2:nrow(df), function(x) box.a[x] %in% box.a[1:(x - 1)])))
  box.a box.b selection run
1     A     B         A   0
2     A     B         B   1
3     B     A         B   0
4     C     A         A   0
5     A     C         C   1
6     B     C         B   1
7     A     C         A   1
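On the example data, the %in% flag and a grouped running count happen to give the same column. A rough sketch of where they would diverge, using one hypothetical extra row (row 8, which is not in the original post) and names made up for this illustration (df8, seen_before, hit, prior_hits):

```r
# Question data plus one HYPOTHETICAL eighth row (not in the original post)
df8 <- data.frame(
  box.a     = c("A", "A", "B", "C", "A", "B", "A", "A"),
  box.b     = c("B", "B", "A", "A", "C", "C", "C", "B"),
  selection = c("A", "B", "B", "A", "C", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Flag approach: has this box.a value appeared in any earlier row?
seen_before <- c(0, sapply(2:nrow(df8), function(x) {
  df8$box.a[x] %in% df8$box.a[1:(x - 1)]
}))

# Count approach: earlier rows in the same box.a group where box.a was selected
hit <- as.integer(df8$selection == df8$box.a)
prior_hits <- ave(hit, df8$box.a, FUN = function(h) cumsum(h) - h)

cbind(df8, seen_before, prior_hits)
```

In row 8 the flag stays at 1, while the grouped count reaches 2 because A was selected in two earlier rows where box.a was A.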