Keeping group_by intact when applying a filter inside mutate in dplyr
I am trying to apply a filter inside a mutate, but I have not found the right way to apply the filter while keeping the data frame's grouping intact. Here is a simple reproducible example:
library(dplyr)

# Sample data
my_dates = seq(as.Date("2020/1/1"), by = "month", length.out = 6)
grp = c(rep("A",3), rep("B", 3))
x = c(2,4,6,8,10,12)
my_df <- data.frame(my_dates, grp, x)
my_dates grp x
1 2020-01-01 A 2
2 2020-02-01 A 4
3 2020-03-01 A 6
4 2020-04-01 B 8
5 2020-05-01 B 10
6 2020-06-01 B 12
# Pick a max date for which the data will be filtered
max_date <- "2020-05-01"
# Try to get the average by group, after filtering out the max date included
filt_data <- my_df %>%
  group_by(grp) %>%
  mutate(included_data = my_dates < max_date,
         my_mean = mean(filter(., my_dates < max_date)$x)
  )
This returns the same mean (5) in every row, regardless of group:
# A tibble: 6 x 5
# Groups: grp [2]
my_dates grp x included_data my_mean
<date> <fct> <dbl> <lgl> <dbl>
1 2020-01-01 A 2 TRUE 5
2 2020-02-01 A 4 TRUE 5
3 2020-03-01 A 6 TRUE 5
4 2020-04-01 B 8 TRUE 5
5 2020-05-01 B 10 FALSE 5
6 2020-06-01 B 12 FALSE 5
The output I want has the mean computed within each group:
my_dates grp x included_data my_mean
<date> <fct> <dbl> <lgl> <dbl>
1 2020-01-01 A 2 TRUE 4
2 2020-02-01 A 4 TRUE 4
3 2020-03-01 A 6 TRUE 4
4 2020-04-01 B 8 TRUE 8
5 2020-05-01 B 10 FALSE 8
6 2020-06-01 B 12 FALSE 8
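For context on why the attempt returns 5 everywhere: with the magrittr pipe, `.` stands for the whole object on the left-hand side of the pipe, not the current group, so the inner filter sees all six rows. The sketch below reproduces both numbers outside of mutate. The names `whole_frame_mean` and `per_group_mean` are illustrative only, and `max_date` is converted to a Date here for clarity.

```r
library(dplyr)

my_df <- data.frame(
  my_dates = seq(as.Date("2020-01-01"), by = "month", length.out = 6),
  grp = rep(c("A", "B"), each = 3),
  x = c(2, 4, 6, 8, 10, 12)
)
max_date <- as.Date("2020-05-01")

# What `.` contributes inside the original mutate(): every row of the piped
# data frame that passes the filter, regardless of group.
whole_frame_mean <- mean(filter(my_df, my_dates < max_date)$x)

# What was wanted: the same filter, with the mean taken separately per group.
keep <- my_df$my_dates < max_date
per_group_mean <- tapply(my_df$x[keep], my_df$grp[keep], mean)
```

Here `whole_frame_mean` is the mean of 2, 4, 6 and 8, while `per_group_mean` holds one value per group.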
Here, it would be better to subset the 'x' column using the index in 'included_data' rather than performing another filter:
library(dplyr)
my_df %>%
  group_by(grp) %>%
  mutate(included_data = my_dates < max_date,
         my_mean = mean(x[included_data])) %>%
  ungroup()
Ah, thank you! One small note: you can remove one of your "mean" calls: my_mean = mean(x[included_data])
# A tibble: 6 x 5
# my_dates grp x included_data my_mean
# <date> <chr> <dbl> <lgl> <dbl>
#1 2020-01-01 A 2 TRUE 4
#2 2020-02-01 A 4 TRUE 4
#3 2020-03-01 A 6 TRUE 4
#4 2020-04-01 B 8 TRUE 8
#5 2020-05-01 B 10 FALSE 8
#6 2020-06-01 B 12 FALSE 8
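One edge case worth keeping in mind with the subsetting approach: if no row in a group passes the condition, `x[included_data]` is empty and `mean()` of an empty numeric vector is NaN. The sketch below is one possible guard, not part of the original answer. The cutoff `early_cutoff` is a hypothetical value chosen so that group B has no included rows.

```r
library(dplyr)

my_df <- data.frame(
  my_dates = seq(as.Date("2020-01-01"), by = "month", length.out = 6),
  grp = rep(c("A", "B"), each = 3),
  x = c(2, 4, 6, 8, 10, 12)
)

# Hypothetical earlier cutoff: every row of group B falls on or after it.
early_cutoff <- as.Date("2020-03-01")

res_guarded <- my_df %>%
  group_by(grp) %>%
  mutate(included_data = my_dates < early_cutoff,
         # Return NA instead of NaN when the group has no included rows.
         my_mean = if (any(included_data)) mean(x[included_data]) else NA_real_) %>%
  ungroup()
```

Whether NA, NaN, or dropping the group entirely is the right behaviour depends on how the result is used downstream.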
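Alternatively, keep the filter but apply it to cur_data(), which is the data for the current group, instead of `.`:
my_df %>%
  group_by(grp) %>%
  mutate(included_data = my_dates < max_date,
         my_mean = mean(filter(cur_data(), my_dates < max_date)$x)) %>%
  ungroup()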
# A tibble: 6 x 5
# my_dates grp x included_data my_mean
# <date> <chr> <dbl> <lgl> <dbl>
#1 2020-01-01 A 2 TRUE 4
#2 2020-02-01 A 4 TRUE 4
#3 2020-03-01 A 6 TRUE 4
#4 2020-04-01 B 8 TRUE 8
#5 2020-05-01 B 10 FALSE 8
#6 2020-06-01 B 12 FALSE 8
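A note for newer dplyr versions: as far as I know, `cur_data()` is deprecated from dplyr 1.1.0 onward in favour of `pick()`. The sketch below shows the same idea with `pick()`, assuming dplyr 1.1.0 or later is installed. The result name `res_pick` is illustrative.

```r
library(dplyr)

my_df <- data.frame(
  my_dates = seq(as.Date("2020-01-01"), by = "month", length.out = 6),
  grp = rep(c("A", "B"), each = 3),
  x = c(2, 4, 6, 8, 10, 12)
)
max_date <- as.Date("2020-05-01")

res_pick <- my_df %>%
  group_by(grp) %>%
  mutate(included_data = my_dates < max_date,
         # pick() returns the selected columns for the current group only,
         # so the inner filter never sees rows from other groups.
         my_mean = mean(filter(pick(my_dates, x), my_dates < max_date)$x)) %>%
  ungroup()
```

The plain subsetting form, `mean(x[included_data])`, still seems the simpler choice when the condition is already stored in a column.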