Repeat a value within each group using dplyr
I want to repeat a value within each group (year), where the value is the one belonging to the first category, "A". For example, my data frame is:
data = expand.grid(
category = LETTERS[1:3],
year = 2000:2005)
data$value = runif(nrow(data))
I tried the following, but it does not repeat the value three times within each year:
test<-data %>% group_by(year) %>% mutate(value2 =value[category == "A"])
test
# A tibble: 18 x 4
# Groups: year [6]
category year value value2
<fct> <int> <dbl> <dbl>
1 A 2000 0.783 0.783
2 B 2000 0.351 0.467
3 C 2000 0.296 0.895
4 A 2001 0.467 0.102
5 B 2001 0.168 0.546
6 C 2001 0.459 0.447
7 A 2002 0.895 0.783
Edit: following the comments about a possible package conflict, here is the list of packages I load beforehand:
# install packages if not installed already
list.of.packages <- c("stringr", "timeDate", "bizdays",
"lubridate", "readxl", "dplyr","plyr",
"rootSolve", "RODBC", "glue",
"ggplot2","gridExtra","bdscale", "gtools", "scales", "shiny", "leaflet", "data.table", "plotly")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
#========== Libraries to be loaded ===============
lapply(list.of.packages, require, character.only = TRUE)
#------
Here is a slightly hacky approach:
> data %>% group_by(year) %>%
+ mutate(value_tmp = if_else(category == "A", value, NA_real_),
+ value2 = mean(value_tmp, na.rm = TRUE))
# A tibble: 18 x 5
# Groups: year [6]
category year value value_tmp value2
<fct> <int> <dbl> <dbl> <dbl>
1 A 2000 0.01818495 0.01818495 0.01818495
2 B 2000 0.5649932 NA 0.01818495
3 C 2000 0.5483291 NA 0.01818495
4 A 2001 0.9175864 0.9175864 0.9175864
5 B 2001 0.2415837 NA 0.9175864
6 C 2001 0.2250608 NA 0.9175864
7 A 2002 0.6037224 0.6037224 0.6037224
8 B 2002 0.8712926 NA 0.6037224
9 C 2002 0.6293625 NA 0.6037224
10 A 2003 0.8126948 0.8126948 0.8126948
11 B 2003 0.7540445 NA 0.8126948
12 C 2003 0.02220114 NA 0.8126948
13 A 2004 0.3961279 0.3961279 0.3961279
14 B 2004 0.3638186 NA 0.3961279
15 C 2004 0.8682010 NA 0.3961279
16 A 2005 0.04196315 0.04196315 0.04196315
17 B 2005 0.4879482 NA 0.04196315
18 C 2005 0.8605212 NA 0.04196315
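The `mean(value_tmp, na.rm = TRUE)` step above works because each year has exactly one non-NA entry in `value_tmp`, so the group mean equals that single entry. As a minimal sketch of the same idea without the temporary column, the "A" value can be indexed directly inside the grouped `mutate`. The `set.seed()` call, the `result` name, and the `[1]` guard are assumptions added for illustration; they are not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
data <- expand.grid(category = LETTERS[1:3], year = 2000:2005)
data$value <- runif(nrow(data))

result <- data %>%
  dplyr::group_by(year) %>%
  # Within one year, value[category == "A"] is a single number;
  # [1] keeps it length 1 even if a year had several "A" rows,
  # and mutate() recycles that one value across the group.
  dplyr::mutate(value2 = value[category == "A"][1]) %>%
  dplyr::ungroup()

result
```

Writing the verbs as `dplyr::group_by` and `dplyr::mutate` is deliberate here: it keeps the result independent of whichever other packages happen to be attached.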
By slightly modifying Noobie's answer and using fill from the tidyverse, I obtained the expected result:
test <- data %>% group_by(year) %>%
mutate(value_tmp = if_else(category == "A", value, NA_real_))%>%
fill(value_tmp)
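For reference, `fill()` is exported by tidyr rather than dplyr, and by default it fills downward only, so the snippet above relies on the "A" row coming first within each year. The following is a sketch of a self-contained version under those observations. The `set.seed()` call and the `.direction = "downup"` argument are assumptions added here, so that the fill also works if "A" is not the first row of a group.

```r
library(dplyr)
library(tidyr)  # fill() lives in tidyr

set.seed(1)  # assumption: fixed seed so the example is reproducible
data <- expand.grid(category = LETTERS[1:3], year = 2000:2005)
data$value <- runif(nrow(data))

test <- data %>%
  dplyr::group_by(year) %>%
  # Keep the value only on the "A" row, NA everywhere else
  dplyr::mutate(value_tmp = dplyr::if_else(category == "A", value, NA_real_)) %>%
  # On a grouped data frame, fill() works within each group;
  # "downup" fills below the "A" row first, then above it
  tidyr::fill(value_tmp, .direction = "downup") %>%
  dplyr::ungroup()

test
```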
It does repeat the value for me. Please check whether you have loaded plyr as well as dplyr.
Even after loading both packages, I still do not get the expected behavior.
I tried your code, but instead of the value repeated by year I get NA values, and value2 is a column of all NAs.
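One way to test the plyr suggestion: in the question's output, `value2` cycles through 0.783, 0.467, 0.895, which are the "A" values of 2000, 2001 and 2002. That pattern is what one would expect if the grouping were being ignored, for example because a `mutate` from another package masks dplyr's. In the package list, "plyr" comes after "dplyr", so it is attached later. The sketch below checks which package the bare name `mutate` currently resolves to and then calls dplyr's verbs by explicit namespace. It is a diagnostic under that assumption, not a confirmed diagnosis of the asker's session.

```r
library(dplyr)

data <- expand.grid(category = LETTERS[1:3], year = 2000:2005)
data$value <- runif(nrow(data))

# Which package does the bare name `mutate` come from right now?
# "plyr" here would mean plyr was attached after dplyr and masks it.
environmentName(environment(mutate))

# Namespace-qualified calls are unaffected by masking.
test <- data %>%
  dplyr::group_by(year) %>%
  dplyr::mutate(value2 = value[category == "A"])

test
```

If the check does report "plyr", attaching plyr before dplyr, or not attaching it at all, is another way to avoid the conflict.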