How to remove groups if the frequency of NAs in the data is above a certain threshold in dplyr?
# A tibble: 6 x 2
group_var psbl_NAs
<chr> <dbl>
1 a 1
2 a NA
3 a NA
4 b 1
5 b 1
6 b NA
We can use group_by, then mutate, and then filter:
d %>%
  group_by(group_var) %>%
  # calculate % of NA values by group
  mutate(pct_na = mean(is.na(psbl_NAs))) %>%
  # only keep where % of NA values < 0.5
  filter(pct_na < 0.5) %>%
  select(-pct_na) # remove % NA column
# group_var psbl_NAs
# <chr> <dbl>
# 1 b 1
# 2 b 1
# 3 b NA
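As a variation on the pipeline above, the helper column can probably be skipped by putting the per-group NA share straight into filter. This is only a minimal sketch: it rebuilds the example data as d, and the name threshold is an assumption introduced here for illustration, not something from the original answer.

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet runs on its own
d <- tibble(
  group_var = c(rep("a", 3), rep("b", 3)),
  psbl_NAs  = c(1, NA, NA, 1, 1, NA)
)

# Hypothetical cutoff: drop groups whose share of NAs is at or above this value
threshold <- 0.5

res <- d %>%
  group_by(group_var) %>%
  # mean(is.na(x)) is one value per group, so the whole group is kept or dropped
  filter(mean(is.na(psbl_NAs)) < threshold) %>%
  ungroup()

res  # only the three rows of group "b" remain
```

Because the condition evaluates to a single TRUE or FALSE per group, no temporary pct_na column needs to be created and removed afterwards.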
tibble(
  group_var = c(rep("a",3), rep("b",3)),
  psbl_NAs = c(1, NA, NA, 1, 1, NA)
) %>%
  group_by(group_var) %>%
  ??????
d %>%
  group_by(group_var) %>%
  # calculate % of NA values by group
  mutate(pct_na = mean(is.na(psbl_NAs)))
# group_var psbl_NAs pct_na
# <chr> <dbl> <dbl>
# 1 a 1 0.667
# 2 a NA 0.667
# 3 a NA 0.667
# 4 b 1 0.333
# 5 b 1 0.333
# 6 b NA 0.333
# base R alternative: ave() returns the per-group share of NAs for every row
d[with(d, ave(psbl_NAs, group_var, FUN = function(x) mean(is.na(x)))) < 0.5, ]
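The base R one-liner above can also be unpacked into separate steps, which may make it easier to inspect the per-group NA share before dropping anything. The following is a sketch under assumptions: the names df, threshold, na_share, keep, and df_kept are introduced here for illustration, and tapply is swapped in for ave.

```r
# Example data from the question as a plain data.frame
df <- data.frame(
  group_var = c(rep("a", 3), rep("b", 3)),
  psbl_NAs  = c(1, NA, NA, 1, 1, NA)
)

# Hypothetical cutoff for the allowed share of NAs per group
threshold <- 0.5

# Share of NA values per group (one named value per group)
na_share <- tapply(is.na(df$psbl_NAs), df$group_var, mean)

# Names of the groups that stay below the cutoff
keep <- names(na_share)[na_share < threshold]

# Keep only the rows that belong to those groups
df_kept <- df[df$group_var %in% keep, ]
df_kept
```

Printing na_share first shows which groups are close to the cutoff, which can help when choosing the threshold.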