R: cohorts based on the first day of use (r, dplyr, transform)


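I want to build cohorts based on the month in which a user was first observed in my app dataset. Say January 2018 is the first month of my observation period.

I tried something along these lines (… it did not work):

Sample data: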

da_app <- data.frame(userid = c(1,1,2,2), day = c("2019-02-20","2019-02-21","2018-03-11","2018-03-12"))

This is what I want:

da_app2
  userid        day cohort
1      1 2019-02-20     14
2      1 2019-02-21     14
3      2 2018-03-11      3
4      2 2018-03-12      3

Using dplyr and lubridate, you can do the following:

library(dplyr)
library(lubridate)
da_app %>%
 mutate(cohort = interval(ymd("2018-01-01"), ymd(day)) %/% months(1) + 1)

  userid        day cohort
1      1 2019-02-20     14
2      1 2019-02-21     14
3      2 2018-03-11      3
4      2 2018-03-12      3
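The snippet above numbers each row by its own date, which matches the desired output here only because every user's rows fall within a single month. Since the question asks for the month in which a user was first observed, a minimal sketch of that variant might look like the following. It assumes the observation period starts on 2018-01-01, and the names period_start, first_day and da_app2 are introduced here purely for illustration. The cohort number is the count of calendar months from the start month to the first-seen month, plus one, so February 2019 becomes 12 + 1 + 1 = 14.

```r
library(dplyr)
library(lubridate)

da_app <- data.frame(
  userid = c(1, 1, 2, 2),
  day = c("2019-02-20", "2019-02-21", "2018-03-11", "2018-03-12")
)

# Assumed first month of the observation period (taken from the question)
period_start <- ymd("2018-01-01")

da_app2 <- da_app %>%
  # as.character() guards against `day` being a factor
  mutate(day = ymd(as.character(day))) %>%
  group_by(userid) %>%
  # earliest date on which each user appears
  mutate(first_day = min(day)) %>%
  ungroup() %>%
  # calendar months between the start month and the first-seen month, 1-based
  mutate(cohort = (year(first_day) - year(period_start)) * 12 +
           (month(first_day) - month(period_start)) + 1) %>%
  select(-first_day)
```

With this approach, a user first seen in February 2019 stays in cohort 14 even if later rows fall in March 2019 or beyond.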

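Without any extra libraries, you can simply convert the substr result to a factor with the cohort labels. You will probably want the cohort as a factor anyway. Note that the labels c(14, 3) are hard-coded for the two months that occur in the sample data.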

da_app$cohort <- factor(substr(da_app$day, 6, 7), labels=c(14, 3))
da_app
#   userid        day cohort
# 1      1 2019-02-20     14
# 2      1 2019-02-21     14
# 3      2 2018-03-11      3
# 4      2 2018-03-12      3
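Because the month substring ignores the year and the labels have to be typed in by hand, a more general base R sketch may be useful. It assumes, as in the question, that cohort 1 is January 2018, and it takes each user's earliest date as the reference. The helper variables d, first_day, yr and mo are illustrative names only.

```r
da_app <- data.frame(
  userid = c(1, 1, 2, 2),
  day = c("2019-02-20", "2019-02-21", "2018-03-11", "2018-03-12")
)

# Parse the dates; as.character() guards against `day` being a factor
d <- as.Date(as.character(da_app$day))

# Earliest date per user, repeated for every row of that user
first_day <- as.Date(ave(as.numeric(d), da_app$userid, FUN = min),
                     origin = "1970-01-01")

# Calendar year and month of the first observation
yr <- as.integer(format(first_day, "%Y"))
mo <- as.integer(format(first_day, "%m"))

# Months since January 2018, 1-based (assumed start of the observation period)
da_app$cohort <- factor((yr - 2018L) * 12L + mo)
```

The result is still a factor, as in the answer above, but the labels are derived from the dates rather than supplied manually.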

Comment: what is the logic behind the cohort names?

Sample data (dput output):
da_app <- structure(list(userid = c(1, 1, 2, 2), day = structure(c(3L, 
4L, 1L, 2L), .Label = c("2018-03-11", "2018-03-12", "2019-02-20", 
"2019-02-21"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))