R - Summarizing course enrollment by relative term order
Applied problem
I'd like to come up with code that summarizes the course-taking patterns and success rates of a cohort of students across n courses and n terms.
Example
In the following cohort of students, how many took course "B" after taking course "A", and how many of those were successful:
data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
term = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
course = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
stringsAsFactors = FALSE)
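I then tried to write a function that returns a summary from a source course to a target course, with the ultimate goal of using map to apply this function over a list of all unique permutations of source and target. (The function expects df to carry a term_dense column and a list-column schedule whose elements hold course and success; neither exists in the raw data above.)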
attempt_summary <- function(df, source, target){
temp_df <- df %>%
filter(map_lgl(schedule, ~any(.x$course == source)))%>%
select(student, source_term_dense = term_dense)
df <- df %>%
left_join(temp_df, by = "student")%>%
filter(term_dense >= source_term_dense)
df %>%
group_by(term_dense) %>%
summarise(completed_source = sum(map_int(schedule, ~any(.x$course == source & .x$success == 1))),
attempted_target = sum(map_int(schedule, ~any(.x$course == target))),
completed_target = sum(map_int(schedule, ~any(.x$course == target & .x$success == 1))))
}
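The function above filters on `schedule` and `term_dense`, but the step that creates them is not shown. Below is a minimal sketch of one way such an input could be built. It assumes that `term_dense` is each student's own term order and that `schedule` holds the courses and outcomes of one term; the name `nested` is hypothetical.

```r
library(dplyr)

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)

# Assumption: term_dense ranks terms within each student (1 = first enrolled term),
# and schedule is a list-column with one small data frame per student and term.
nested <- data %>%
  group_by(student) %>%
  mutate(term_dense = dense_rank(term)) %>%
  group_by(student, term_dense) %>%
  summarise(schedule = list(data.frame(course = course, success = success,
                                       stringsAsFactors = FALSE))) %>%
  ungroup()
```

With one row per student and relative term, `map_lgl(schedule, ...)` and the comparison on `term_dense` inside the function have columns to work on.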
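Stack Overflow posts
Besides plenty of other material on purrr, I consulted these posts while looking for a solution, but none of them was quite what I was after.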
Output of the sessionInfo() call:
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] purrr_0.3.2 tidyr_0.8.3 dplyr_0.8.0.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 fansi_0.4.0 utf8_1.1.4 crayon_1.3.4 assertthat_0.2.1 R6_2.4.0
[7] magrittr_1.5 pillar_1.3.1 cli_1.1.0 rlang_0.3.4 rstudioapi_0.10 tools_3.5.3
[13] glue_1.3.1 compiler_3.5.3 pkgconfig_2.0.2 tidyselect_0.2.5 tibble_2.1.1
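Here is an approach to the kind of statement you are after, along the lines of "X% of the students who succeeded in course 'A' subsequently took course 'B', with a success rate of Z%". For each earlier course, split by whether the student passed or failed it, this computes the success rate in every later course.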
library(tidyverse)
data2 <- data %>%
left_join(data, by = c("student")) %>% # add future course results to each result that has any
filter(term.y > term.x) %>% # includes all future courses; could limit to just next one?
count(course.x, success.x, course.y, success.y) %>%
spread(success.y, n, fill = 0) %>%
mutate(success_rate = `1`/ (`0` + `1`)) %>%
select(course.x:course.y, success_rate) %>%
spread(course.y, success_rate)
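Result: each "prior event" (an earlier course and its outcome) is a row, and each column gives the success rate in a later course. This shows that the students who took A passed all their subsequent courses, regardless of how they did in A. The students who took B had a 50-50 pass rate in C.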
> data2
# A tibble: 3 x 5
course.x success.x A B C
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 0 1 NA 1
2 A 1 NA 1 1
3 B 1 NA NA 0.5
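The comment in the code above asks whether the join could be limited to just the next course. A minimal sketch of one way to do that follows, assuming "next" means the nearest later term for the same student; the name `next_only` is hypothetical. It uses `mean(success.y)` instead of the two `spread()` steps and keeps the result in long format.

```r
library(dplyr)

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)

next_only <- data %>%
  # Pair every course result with the same student's other results.
  # Newer dplyr versions may warn about a many-to-many join here.
  inner_join(data, by = "student") %>%
  filter(term.y > term.x) %>%
  # For each earlier course, keep only the nearest later term.
  group_by(student, term.x, course.x) %>%
  filter(term.y == min(term.y)) %>%
  # success is coded 0/1, so its mean is the success rate.
  group_by(course.x, success.x, course.y) %>%
  summarise(n = n(), success_rate = mean(success.y)) %>%
  ungroup()
```

Keeping the count `n` next to each rate makes it easier to see when a rate rests on only one or two students.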
Output of the attempt_summary() call from the question:
attempt_summary(data, "A", "B")
# A tibble: 3 x 4
term_dense completed_source attempted_target completed_target
<int> <int> <int> <int>
1 1 2 0 0
2 2 2 2 2
3 3 0 0 0
# DO NOT RUN - DOESN'T WORK
# map(data, attempt_summary, source = src_list, target = trgt_list)
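The commented-out `map()` call iterates over the columns of `data` rather than over course pairs. A minimal sketch of one way to apply the function to every ordered source/target pair is to build the pairs first and walk them with `map2()`. The preparation step and the names `nested`, `pairs`, `results` and `all_pairs` are assumptions, not part of the original post.

```r
library(dplyr)
library(purrr)

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)

# Assumed preparation: one row per student and relative term, with a list-column.
nested <- data %>%
  group_by(student) %>%
  mutate(term_dense = dense_rank(term)) %>%
  group_by(student, term_dense) %>%
  summarise(schedule = list(data.frame(course = course, success = success,
                                       stringsAsFactors = FALSE))) %>%
  ungroup()

# The function from the question, unchanged.
attempt_summary <- function(df, source, target){
  temp_df <- df %>%
    filter(map_lgl(schedule, ~any(.x$course == source))) %>%
    select(student, source_term_dense = term_dense)
  df <- df %>%
    left_join(temp_df, by = "student") %>%
    filter(term_dense >= source_term_dense)
  df %>%
    group_by(term_dense) %>%
    summarise(completed_source = sum(map_int(schedule, ~any(.x$course == source & .x$success == 1))),
              attempted_target = sum(map_int(schedule, ~any(.x$course == target))),
              completed_target = sum(map_int(schedule, ~any(.x$course == target & .x$success == 1))))
}

# All ordered pairs of distinct courses.
courses <- sort(unique(data$course))
pairs <- expand.grid(source = courses, target = courses, stringsAsFactors = FALSE) %>%
  filter(source != target)

# Walk source and target in parallel; the data frame is passed as a fixed argument.
results <- map2(pairs$source, pairs$target, ~ attempt_summary(nested, .x, .y))
names(results) <- paste(pairs$source, pairs$target, sep = "->")

# Optionally stack everything into one table, labelled by pair.
all_pairs <- bind_rows(results, .id = "pair")
```

A single summary is then available by name, for example `results[["A->B"]]`.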