R: Summarising course enrollment by relative term order


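I want to write code that summarises course-taking patterns and success rates for a group of students across n courses and n terms.

Example: in the following group of students, how many went on to take course "B" after taking course "A", and how many of those succeeded?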

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term    = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course  = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)
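I then tried to write a function that returns a summary from a source course to a target course, with the end goal of using map to apply this function to a list of all unique permutations of source and target: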

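library(dplyr)
library(purrr)

# Expects df to have one row per student and term, with a relative term
# index (term_dense) and a list-column (schedule) holding that term's
# course and success values. The data frame above does not have these
# columns yet.
attempt_summary <- function(df, source, target){

  # Terms in which each student took the source course
  temp_df <- df %>%
                filter(map_lgl(schedule, ~any(.x$course == source)))%>%
                select(student, source_term_dense = term_dense)

  # Keep only terms at or after the source-course term
  df <- df %>%
        left_join(temp_df, by = "student")%>%
        filter(term_dense >= source_term_dense)

  # Count completions and attempts per relative term
  df %>%
    group_by(term_dense) %>%
    summarise(completed_source = sum(map_int(schedule, ~any(.x$course == source & .x$success == 1))),
              attempted_target = sum(map_int(schedule, ~any(.x$course == target))),
              completed_target = sum(map_int(schedule, ~any(.x$course == target & .x$success == 1))))

}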
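The function above reads term_dense and schedule columns that the example data frame does not have. As a rough sketch, one way such a nested frame could be built is shown below. It assumes term_dense is the rank of each term within a student's own history and that schedule holds one small data frame per student and term; neither assumption is confirmed by the question, and the name nested is only illustrative.

```r
library(dplyr)

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term    = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course  = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)

# ASSUMPTION: term_dense is each student's own term order (1 = first term taken)
nested <- data %>%
  group_by(student) %>%
  mutate(term_dense = dense_rank(term)) %>%
  # ASSUMPTION: schedule is a list-column with one data frame per student and term
  group_by(student, term_dense) %>%
  summarise(schedule = list(data.frame(course = course,
                                       success = success,
                                       stringsAsFactors = FALSE))) %>%
  ungroup()

nested
```

Building the list-column with summarise() rather than tidyr::nest() sidesteps the change in nest()'s interface between tidyr versions.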
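Stack Overflow posts: among many other things about purrr, I consulted these posts while looking for a solution, but none of them were quite what I was after.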

Session info: below is the output of my sessionInfo() call:

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] purrr_0.3.2   tidyr_0.8.3   dplyr_0.8.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       fansi_0.4.0      utf8_1.1.4       crayon_1.3.4     assertthat_0.2.1 R6_2.4.0        
 [7] magrittr_1.5     pillar_1.3.1     cli_1.1.0        rlang_0.3.4      rstudioapi_0.10  tools_3.5.3     
[13] glue_1.3.1       compiler_3.5.3   pkgconfig_2.0.2  tidyselect_0.2.5 tibble_2.1.1    

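Here is an approach that gets you a middle piece of the way there, toward statements like "of the students who succeeded (or did not) in course A and later took course B, Y% succeeded in B." It computes that Y% for every combination of earlier course, outcome in that course (success or not), and later course.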

library(tidyverse)
data2 <- data %>%
  left_join(data, by = c("student")) %>%   # add future course results to each result that has any
  filter(term.y > term.x) %>%  # includes all future courses; could limit to just next one?
  count(course.x, success.x, course.y, success.y) %>%
  spread(success.y, n, fill = 0) %>%
  mutate(success_rate = `1`/ (`0` + `1`)) %>%
  select(course.x:course.y, success_rate) %>%
  spread(course.y, success_rate)
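Result: each combination of earlier course and outcome is a row, and each column gives the success rate in a later course. This shows that the people who took A passed every subsequent course, regardless of how they did in A. The people who took B had a 50-50 pass rate in C.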

> data2
# A tibble: 3 x 5
  course.x success.x     A     B     C
  <chr>        <dbl> <dbl> <dbl> <dbl>
1 A                0     1    NA   1  
2 A                1    NA     1   1  
3 B                1    NA    NA   0.5
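The comment in the code above asks whether the join could be limited to just the next term. A minimal sketch of that variant follows, assuming "next" means the earliest later term in which the same student took anything; the names next_only and n are illustrative, and recent dplyr versions may warn about the many-to-many self-join.

```r
library(dplyr)

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term    = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course  = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)

next_only <- data %>%
  inner_join(data, by = "student") %>%       # pair every result with the same student's results
  filter(term.y > term.x) %>%                # keep strictly later terms
  group_by(student, term.x, course.x) %>%
  filter(term.y == min(term.y)) %>%          # ASSUMPTION: "next" = earliest later term
  group_by(course.x, success.x, course.y) %>%
  summarise(n = n(),                         # number of earlier/later result pairs
            success_rate = mean(success.y)) %>%
  ungroup()

next_only
```

With this restriction, student 2's A result pairs only with B (term 2) and no longer with C (term 3), so the counts differ from the all-future-terms version.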
Example call from the question, with the output shown there:

attempt_summary(data, "A", "B")

# A tibble: 3 x 4
  term_dense completed_source attempted_target completed_target
       <int>            <int>            <int>            <int>
1          1                2                0                0
2          2                2                2                2
3          3                0                0                0

The attempted map call from the question:

# DO NOT RUN - DOESN'T WORK
# map(data, attempt_summary, source = src_list, target = trgt_list)
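The map call above iterates over the columns of data rather than over source/target pairs. As a hedged sketch of the stated goal, the code below applies a simplified stand-in function to every ordered pair of distinct courses with purrr::map2_dfr(). The helper pair_summary() and its column names are hypothetical, it works on the flat data frame rather than the nested one, and it treats "after" as any term later than the student's first attempt at the source course.

```r
library(dplyr)
library(purrr)

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term    = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course  = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)

# Hypothetical helper (not the question's attempt_summary)
pair_summary <- function(df, src_course, tgt_course) {
  # First term in which each student took the source course
  src <- df %>%
    filter(course == src_course) %>%
    group_by(student) %>%
    summarise(source_term = min(term))

  # Target-course attempts made in a later term by those students
  hits <- df %>%
    filter(course == tgt_course) %>%
    inner_join(src, by = "student") %>%
    filter(term > source_term)

  data.frame(source = src_course,
             target = tgt_course,
             took_source = nrow(src),
             attempted_target = n_distinct(hits$student),
             completed_target = n_distinct(hits$student[hits$success == 1]),
             stringsAsFactors = FALSE)
}

# All ordered pairs of distinct courses
courses <- sort(unique(data$course))
pairs <- expand.grid(source = courses, target = courses, stringsAsFactors = FALSE)
pairs <- pairs[pairs$source != pairs$target, ]

# Iterate over the two vectors in parallel and row-bind the results
all_pairs <- map2_dfr(pairs$source, pairs$target, ~ pair_summary(data, .x, .y))
all_pairs
```

For the pair A to B this gives two students attempting B after A and two succeeding, which agrees with the attempted_target and completed_target totals in the output above.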