Getting a new slope column with dplyr using group_by
I'm looking for a more straightforward way, using dplyr, to get a column named slope from my data. The dataset is grouped by season and stat type. My current code is:
library(tidyverse); library(broom)
full_table_raw <- structure(list(playerID = c("abreujo02", "abreujo02",
"abreujo02", "abreujo02", "abreujo02", "abreujo02", "abreujo02",
"abreujo02", "abreujo02", "abreujo02", "abreujo02", "abreujo02",
"arenano01", "arenano01", "arenano01", "arenano01", "arenano01",
"arenano01", "arenano01", "arenano01", "arenano01", "arenano01",
"arenano01", "arenano01", "blackch02", "blackch02", "blackch02",
"blackch02", "blackch02", "blackch02", "blackch02", "blackch02",
"blackch02", "blackch02", "blackch02", "blackch02"), season = c(2014L,
2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L,
2016L, 2014L, 2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L,
2016L, 2016L, 2016L, 2016L, 2014L, 2014L, 2014L, 2014L, 2015L,
2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 2016L), stat = c("HR",
"R", "RBI", "SB", "HR", "R", "RBI", "SB", "HR", "R", "RBI", "SB",
"HR", "R", "RBI", "SB", "HR", "R", "RBI", "SB", "HR", "R", "RBI",
"SB", "HR", "R", "RBI", "SB", "HR", "R", "RBI", "SB", "HR", "R",
"RBI", "SB"), points = c(3, 2, 3, 2, 2, 1, 2, 1, 1, 1, 2, 1,
1, 1, 1, 1, 3, 3, 3, 2, 3, 3, 3, 2, 2, 3, 2, 3, 1, 2, 1, 3, 2,
2, 1, 3), ranks = c(1, 2, 1, 2, 2, 3, 2, 3, 3, 3, 2, 3, 3, 3,
3, 3, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 3, 2, 3, 1, 2, 2, 3,
1), value = c(36, 80, 107, 3, 30, 88, 101, 0, 25, 67, 100, 0,
18, 58, 61, 2, 42, 97, 130, 2, 41, 116, 133, 2, 19, 82, 72, 28,
17, 93, 58, 43, 29, 111, 82, 17)), class = "data.frame", row.names = c(NA,
-36L))
sgp_table <- full_table_raw %>%
  group_by(season, stat) %>%
  do(tidy(lm(value ~ points, data = .))) %>%
  filter(term == "points") %>%
  select(season, stat, estimate) %>%
  rename(slope = estimate)
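I'm looking for a more concise way to create the slope column from the current data.

Comments:
Can you provide a sample of your data?
Just added it to the original post.
You could change the select to select(season, stat, slope = estimate) and drop the rename line, but this already looks fairly efficient.
I agree with Mako212... what do you mean by more direct?
Thanks. Why does lm output two rows, intercept and points?

Here is an option using nest/unnest. Not sure you'd consider this cleaner than what you have, but by calling nest directly (as in the second snippet) you don't need group_by.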
library(tidyverse)
library(broom)
full_table_raw %>%
  group_by(season, stat) %>%
  nest() %>%
  # fit one model per group and keep only the slope on points
  mutate(modelout = map(data, ~ lm(value ~ points, data = .x) %>%
                          tidy() %>%
                          filter(term == "points") %>%
                          select(slope = estimate))) %>%
  select(-data) %>%
  unnest(modelout)
# A tibble: 12 x 3
# season stat slope
# <int> <chr> <dbl>
# 1 2014 HR 9.
# 2 2014 R 12
# 3 2014 RBI 23.
# 4 2014 SB 13.0
# 5 2015 HR 12.5
# 6 2015 R 4.50
# 7 2015 RBI 36
# 8 2015 SB 21.5
# 9 2016 HR 8.00
#10 2016 R 24.5
#11 2016 RBI 25.5
#12 2016 SB 8.5
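On the question from the comments about why lm yields two rows: a simple regression of value on points estimates two coefficients, the intercept and the slope on points, and broom's tidy returns one row per coefficient. That is why the code filters on term == "points". A minimal sketch, using only the three 2014 HR rows from the posted data (the name hr_2014 is just for illustration):

```r
library(broom)

# The three 2014 HR rows from the posted data (one per player)
hr_2014 <- data.frame(points = c(3, 1, 2), value = c(36, 18, 19))

fit <- lm(value ~ points, data = hr_2014)

# tidy() returns one row per estimated coefficient:
# "(Intercept)" and "points"
coefs <- tidy(fit)
coefs$term

# The slope is the coefficient on points; pulling it with coef()
# avoids the filter/select steps entirely
slope <- coef(fit)[["points"]]
slope
```

The slope here should agree with the 2014 HR row in the output above.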
sgp_table <- full_table_raw %>%
  nest(data = -c(season, stat)) %>%
  # map_dbl returns a plain numeric column rather than a list column
  mutate(slope = map_dbl(data, ~ coef(lm(value ~ points, data = .x))[["points"]])) %>%
  select(-data)
> sgp_table
season stat slope
1 2014 HR 9
2 2014 R 12
3 2014 RBI 23
4 2014 SB 13
5 2015 HR 12.5
6 2015 R 4.5
7 2015 RBI 36
8 2015 SB 21.5
9 2016 HR 8
10 2016 R 24.5
11 2016 RBI 25.5
12 2016 SB 8.5
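If the goal is simply one slope per group, another possibility (not taken from the answers above, so treat it as a sketch) is to skip nesting and broom altogether and call lm inside summarise. This assumes dplyr 1.0.0 or later for the .groups argument. To keep the example runnable on its own it uses only the 2014 HR and R rows from the posted data, stored under the illustrative name demo; substituting full_table_raw should give all 12 groups.

```r
library(dplyr)

# Subset of the posted data (2014 HR and R rows) so the sketch runs on its own
demo <- data.frame(
  season = 2014L,
  stat   = rep(c("HR", "R"), each = 3),
  points = c(3, 1, 2, 2, 1, 3),
  value  = c(36, 18, 19, 80, 58, 82)
)

slopes <- demo %>%
  group_by(season, stat) %>%
  # within summarise, value and points refer to the current group's rows
  summarise(slope = coef(lm(value ~ points))[["points"]], .groups = "drop")

slopes
```

The trade-off is that this keeps only the slope; if standard errors or p-values are needed later, the tidy-based versions above retain more of the model output.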