dplyr lag function with multiply nested data
I want to create a lag variable for a value that is nested in three groups. For example:
df <- data.frame(wave = c(1,1,1,1,1,1,2,2,2,2,2,2),
party = rep(c("A", "A", "A", "B", "B", "B"), 2),
inc = rep(c(1,2,3), 4),
value = c(1, 10, 100, 3, 30, 300, 6, 60, 600, 7, 70, 700))
What I need is:
wave party inc value lag
1 1 A 1 1 NA
2 1 A 2 10 NA
3 1 A 3 100 NA
4 1 B 1 3 NA
5 1 B 2 30 NA
6 1 B 3 300 NA
7 2 A 1 6 1
8 2 A 2 60 10
9 2 A 3 600 100
10 2 B 1 7 3
11 2 B 2 70 30
12 2 B 3 700 300
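That is, respondents in wave 2, party A, income group (inc) 1 should get the value from wave 1, party A, income group 1 as their lag, and so on.
I have tried: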
df %>% group_by(wave) %>% mutate(lag = lag(value))
df %>% group_by(party, wave) %>% mutate(lag = lag(value))
df %>% group_by(party, wave, inc) %>% mutate(lag = lag(value))
These give me, respectively:
wave party inc value lag
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 A 1 1 NA
2 1 A 2 10 1
3 1 A 3 100 10
4 1 B 1 3 100
5 1 B 2 30 3
6 1 B 3 300 30
7 2 A 1 6 NA
8 2 A 2 60 6
9 2 A 3 600 60
10 2 B 1 7 600
11 2 B 2 70 7
12 2 B 3 700 70
wave party inc value lag
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 A 1 1 NA
2 1 A 2 10 1
3 1 A 3 100 10
4 1 B 1 3 NA
5 1 B 2 30 3
6 1 B 3 300 30
7 2 A 1 6 NA
8 2 A 2 60 6
9 2 A 3 600 60
10 2 B 1 7 NA
11 2 B 2 70 7
12 2 B 3 700 70
wave party inc value lag
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 A 1 1 NA
2 1 A 2 10 NA
3 1 A 3 100 NA
4 1 B 1 3 NA
5 1 B 2 30 NA
6 1 B 3 300 NA
7 2 A 1 6 NA
8 2 A 2 60 NA
9 2 A 3 600 NA
10 2 B 1 7 NA
11 2 B 2 70 NA
12 2 B 3 700 NA
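I could go on like this. I tried different versions using df %>% arrange() and the order_by argument of lag(), but for some reason I cannot figure out how to get the correct lag variable.

You can get the desired result by grouping by party and inc only. Within each party/inc group the rows are then the successive waves, so lag() picks up the value from the previous wave: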
library(dplyr)
df %>%
  group_by(party, inc) %>%
  mutate(lag = lag(value)) %>%
  ungroup()
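#> # A tibble: 12 x 5
#>     wave party   inc value   lag
#>    <dbl> <chr> <dbl> <dbl> <dbl>
#>  1     1 A         1     1    NA
#>  2     1 A         2    10    NA
#>  3     1 A         3   100    NA
#>  4     1 B         1     3    NA
#>  5     1 B         2    30    NA
#>  6     1 B         3   300    NA
#>  7     2 A         1     6     1
#>  8     2 A         2    60    10
#>  9     2 A         3   600   100
#> 10     2 B         1     7     3
#> 11     2 B         2    70    30
#> 12     2 B         3   700   300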
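
Grouping by party and inc relies on the rows already being sorted by wave within each group. As a minimal sketch for the case where that ordering is not guaranteed, the order_by argument of dplyr's lag() can be pointed at wave. The reversed data frame and the names reversed and result below are assumptions made for illustration only; they are not part of the original question.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  wave  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  party = rep(c("A", "A", "A", "B", "B", "B"), 2),
  inc   = rep(c(1, 2, 3), 4),
  value = c(1, 10, 100, 3, 30, 300, 6, 60, 600, 7, 70, 700)
)

# Hypothetical: reverse the rows so that wave 2 comes before wave 1
reversed <- df[rev(seq_len(nrow(df))), ]

result <- reversed %>%
  group_by(party, inc) %>%
  # order_by = wave makes lag() follow wave order, not row order
  mutate(lag = dplyr::lag(value, order_by = wave)) %>%
  ungroup() %>%
  # sort again only to make the result easy to compare
  arrange(wave, party, inc)

print(result)
```

With order_by = wave, the lag is taken along the wave sequence inside each party/inc group, so the result should not depend on how the rows happen to be sorted. Without it, the reversed data would put the wave 2 values into the wave 1 rows.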