R: Summarizing course enrollment by relative term order (dplyr, purrr)


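I want to write code that summarizes course-taking patterns and success rates for a group of students across n courses and n terms.

Example: in the following student population, how many students took course "B" after taking course "A", and how many of those succeeded?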

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term    = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course  = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)

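Next, I tried to create a function that returns a summary from a source course to a target course. The end goal is to use map to apply this function over a list containing all unique permutations of sources and targets: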

attempt_summary <- function(df, source, target){

  temp_df <- df %>%
                filter(map_lgl(schedule, ~any(.x$course == source)))%>%
                select(student, source_term_dense = term_dense)

  df <- df %>%
        left_join(temp_df, by = "student")%>%
        filter(term_dense >= source_term_dense)

  df %>%
    group_by(term_dense) %>%
    summarise(completed_source = sum(map_int(schedule, ~any(.x$course == source & .x$success == 1))),
              attempted_target = sum(map_int(schedule, ~any(.x$course == target))),
              completed_target = sum(map_int(schedule, ~any(.x$course == target & .x$success == 1))))

}
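The function above reads two columns, term_dense and schedule, that the sample data frame does not contain, and the code that built them is not shown. Below is a minimal sketch of one way to build them. It assumes term_dense is each student's terms ranked 1, 2, 3, ... without gaps, and that schedule is a list-column holding that term's course and success rows. Both are guesses, and the name nested is made up here.

```r
library(dplyr)

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term    = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course  = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)

# Assumption: term_dense is the rank of the term within each student,
# so a student enrolled in terms 1, 2 and 4 gets term_dense 1, 2 and 3.
nested <- data %>%
  group_by(student) %>%
  mutate(term_dense = dense_rank(term)) %>%
  group_by(student, term_dense) %>%
  # Assumption: schedule is one small data frame per student and term.
  summarise(schedule = list(data.frame(course = course,
                                       success = success,
                                       stringsAsFactors = FALSE))) %>%
  ungroup()

nested
```

Tracing attempt_summary(nested, "A", "B") by hand on this layout appears to give the counts reported in the question, which suggests the guess is close to the original preparation step.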

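Stack Overflow posts: besides plenty of other material about purrr, I consulted several Stack Overflow posts while looking for a solution, but none of them were quite what I was after.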

Session info: below is the output of my sessionInfo() call:

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] purrr_0.3.2   tidyr_0.8.3   dplyr_0.8.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       fansi_0.4.0      utf8_1.1.4       crayon_1.3.4     assertthat_0.2.1 R6_2.4.0        
 [7] magrittr_1.5     pillar_1.3.1     cli_1.1.0        rlang_0.3.4      rstudioapi_0.10  tools_3.5.3     
[13] glue_1.3.1       compiler_3.5.3   pkgconfig_2.0.2  tidyselect_0.2.5 tibble_2.1.1

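This gets you about halfway to a statement like "X% of the students who succeeded in course A went on to take course B, with a success rate of Y%." It finds the Y% part: the success rate in each later course, split by whether the earlier course was passed or failed.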

library(tidyverse)
data2 <- data %>%
  left_join(data, by = c("student")) %>%   # add future course results to each result that has any
  filter(term.y > term.x) %>%  # includes all future courses; could limit to just next one?
  count(course.x, success.x, course.y, success.y) %>%
  spread(success.y, n, fill = 0) %>%
  mutate(success_rate = `1`/ (`0` + `1`)) %>%
  select(course.x:course.y, success_rate) %>%
  spread(course.y, success_rate)


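Result: each "event 1" (an earlier course and its outcome) is a row, and each column gives the success rate in a later course. This shows that everyone who took A passed all of their subsequent courses, regardless of how they did in A. Those who took B had a 50-50 pass rate in C.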

> data2
# A tibble: 3 x 5
  course.x success.x     A     B     C
  <chr>        <dbl> <dbl> <dbl> <dbl>
1 A                0     1    NA   1  
2 A                1    NA     1   1  
3 B                1    NA    NA   0.5
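The comment in the pipeline above asks whether the join could be limited to just the next course. One possible way, sketched here under the assumption that "next" means the student's next enrolled term (not the next calendar term), is to rank each student's terms and keep only pairs one rank apart. The names ranked and next_term are illustrative.

```r
library(dplyr)

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term    = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course  = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)

# Rank each student's terms without gaps (assumption: enrolled terms only).
ranked <- data %>%
  group_by(student) %>%
  mutate(term_dense = dense_rank(term)) %>%
  ungroup()

# Self-join, then keep only courses taken in the very next enrolled term.
# (dplyr >= 1.1 may warn about a many-to-many join; that is expected for a self-join.)
next_term <- ranked %>%
  inner_join(ranked, by = "student") %>%
  filter(term_dense.y == term_dense.x + 1) %>%
  group_by(course.x, success.x, course.y) %>%
  summarise(enrollments = n(),
            success_rate = mean(success.y)) %>%
  ungroup()

next_term
```

On the sample data this leaves four combinations. For example, B followed by C in the next enrolled term has two enrollments and a success rate of 0.5.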


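For reference, the function call from the question and the output it produced:

attempt_summary(data, "A", "B")

# A tibble: 3 x 4
  term_dense completed_source attempted_target completed_target
       <int>            <int>            <int>            <int>
1          1                2                0                0
2          2                2                2                2
3          3                0                0                0

And the map call from the question that did not work:

# DO NOT RUN - DOESN'T WORK
# map(data, attempt_summary, source = src_list, target = trgt_list)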
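One likely reason the commented-out call fails is that map(data, ...) iterates over the columns of the data frame, and source = src_list hands the whole list to every call instead of one course at a time. A minimal sketch of iterating over source/target pairs with map2_dfr follows. It uses a simplified, hypothetical helper pair_summary on the flat data, not the attempt_summary function, and it assumes "after" means a strictly later term than the student's first attempt at the source course.

```r
library(dplyr)
library(purrr)

data <- data.frame(student = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5),
                   term    = c(2, 3, 3, 1, 2, 3, 2, 1, 3, 1, 2, 4),
                   course  = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'C'),
                   success = c(1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1),
                   stringsAsFactors = FALSE)

# Hypothetical helper (not from the question): counts for one source -> target pair.
pair_summary <- function(df, source, target) {
  # First term in which each student attempted the source course.
  src <- df %>%
    filter(course == source) %>%
    group_by(student) %>%
    summarise(src_term = min(term))

  # Target attempts made in a strictly later term than that first source attempt.
  later <- df %>%
    filter(course == target) %>%
    inner_join(src, by = "student") %>%
    filter(term > src_term)

  data.frame(source = source,
             target = target,
             took_source = nrow(src),
             took_target_later = n_distinct(later$student),
             passed_target = n_distinct(later$student[later$success == 1]),
             stringsAsFactors = FALSE)
}

# All ordered pairs of distinct courses.
courses <- sort(unique(data$course))
pairs <- expand.grid(source = courses, target = courses, stringsAsFactors = FALSE)
pairs <- pairs[pairs$source != pairs$target, ]

# One call per pair; the one-row results are stacked into a single data frame.
result <- map2_dfr(pairs$source, pairs$target, ~ pair_summary(data, .x, .y))
result
```

For the A to B pair on the sample data, four students attempted A, two of them took B in a later term, and both passed.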
