R: Add empty rows for the gaps between subscriptions

r date

I've been struggling with this for a while, but I couldn't find a similar question anywhere, so here is my first question.

I'm fairly new to R, so please forgive any obvious mistakes.

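I have a dataset with one row for every subscription a user has or has had. Some users have multiple rows, while others have only one. Only active or previously active subscriptions are present.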

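I have two variables, Begindate and Enddate, that indicate when a subscription started and when it ended. I've created a relationlength variable that holds the number of days between these two dates for each subscription type. This means relationlength only counts the days during which a subscription was active.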
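A minimal sketch of how per-type relation-length columns like these can be derived in base R, assuming the dates are stored as "YYYY-MM-DD" strings and the type sits in a Subscrtype column, as in the simplified example further down; the data frame name subs is made up for illustration. Subtracting two Dates counts the days between them, so the end day itself is not counted (2018-08-24 to 2019-08-23 gives 364).

```r
# Toy data shaped like the simplified example (three rows from the question)
subs <- data.frame(
  ID         = c(1, 1, 2),
  Begindate  = c("2019-08-26", "2018-08-24", "2019-08-26"),
  Enddate    = c("2022-04-20", "2019-08-23", "2019-09-12"),
  Subscrtype = c("fixed", "fixed", "fixed")
)

# Days between begin and end date: subtracting two Dates gives a difftime in days
days <- as.numeric(as.Date(subs$Enddate) - as.Date(subs$Begindate))

# One relation-length column per type: the day count on matching rows, 0 elsewhere
for (type in c("fixed", "promo")) {
  subs[[paste0("rl_", type)]] <- ifelse(subs$Subscrtype == type, days, 0)
}

subs
```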

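What I want to do is insert empty rows between the subscription rows for the periods in which no subscription was active, starting from the earliest known begin date for a given user and ending at the date on which all subscriptions end (20-04-2022).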
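One way to find the inactive periods of a single user, sketched in base R: sort the subscriptions by begin date, keep track of the last day covered so far (which also copes with overlapping subscriptions), record a gap whenever the next subscription starts more than one day later, and add a final gap up to the cutoff date. The function name inactive_periods and its arguments are hypothetical; the cutoff defaults to the 2022-04-20 date mentioned above.

```r
# Hypothetical helper: the inactive periods of ONE user, from the first begin
# date up to a cutoff date. Unsorted or overlapping subscriptions are allowed.
inactive_periods <- function(begin, end, cutoff = as.Date("2022-04-20")) {
  o <- order(begin)
  begin <- begin[o]
  end <- end[o]
  gap_begin <- as.Date(character(0))
  gap_end <- as.Date(character(0))
  covered_until <- end[1]  # last day covered by any subscription so far
  for (i in seq_along(begin)[-1]) {
    if (begin[i] > covered_until + 1) {
      # at least one uncovered day before this subscription starts
      gap_begin <- c(gap_begin, covered_until + 1)
      gap_end <- c(gap_end, begin[i] - 1)
    }
    covered_until <- max(covered_until, end[i])
  }
  if (covered_until < cutoff) {
    # trailing inactive period up to the cutoff date
    gap_begin <- c(gap_begin, covered_until + 1)
    gap_end <- c(gap_end, cutoff)
  }
  data.frame(Begindate = gap_begin, Enddate = gap_end)
}

# User 2 from the question: one gap between the subscriptions, one up to the cutoff
inactive_periods(begin = as.Date(c("2019-08-26", "2018-08-24")),
                 end   = as.Date(c("2019-09-12", "2019-08-23")))
```

For user 2 this returns 2019-08-24 to 2019-08-25 and 2019-09-13 to 2022-04-20, the two "none" periods of that user in the desired output. The gap rows still have to be combined with the subscription rows, for example with rbind().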

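I tried taking the date difference between a user's first known begin date and the final date, and subtracting the known relation lengths of the other subscription types. However, I couldn't get this to work.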
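The subtraction approach can work, but only if every period is counted the same way and the subscriptions do not overlap. A sketch for one user that counts both the begin and the end day (hence the + 1), using user 2's dates from the desired output; the variable names are illustrative.

```r
# Total inactive days for ONE user, assuming subscriptions never overlap.
# Both the begin and the end day are counted as days, hence the "+ 1".
begin  <- as.Date(c("2019-08-26", "2018-08-24"))   # user 2 from the example
end    <- as.Date(c("2019-09-12", "2019-08-23"))
cutoff <- as.Date("2022-04-20")

span_days     <- as.numeric(cutoff - min(begin)) + 1   # first begin date .. cutoff
active_days   <- sum(as.numeric(end - begin) + 1)      # days covered by subscriptions
inactive_days <- span_days - active_days
inactive_days
```

The 953 inactive days are the 2 days from 2019-08-24 to 2019-08-25 plus the 951 days from 2019-09-13 to 2022-04-20. This only yields a total per user, not the individual gap rows.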

Example of what my df currently looks like:

(rl stands for relation length)

What I would like it to look like:

ID  Begindate   Enddate     Subscrtype  active  rl_fixed  rl_promo  rl_none  Productgroup
1   2019-08-26  2022-04-20  fixed       1       968       0         0        1
1   2019-08-24  2019-08-25  none        0       0         0         2        NA
1   2018-08-24  2019-08-23  fixed       0       364       0         0        1
1   2016-08-24  2018-08-23  none        0       0         0         729      NA
1   2015-08-24  2016-08-23  promo       0       0         364       0        2
2   2019-09-13  2022-04-20  none        0       0         0         950      NA
2   2019-08-26  2019-09-12  fixed       0       17        0         0        1
2   2019-08-24  2019-08-25  none        0       0         0         2        NA
2   2018-08-24  2019-08-23  fixed       0       364       0         0        1

The end goal is to aggregate the data and get a clear overview of the relation length per type of relationship a user may have had.
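Once the "none" rows are in place, the aggregation is essentially a cross-tabulation. A sketch in base R, assuming one row per period with a days column (filled here with the rl values from the desired output above): xtabs() sums the days per user and type, table() counts the periods, and aggregate() gives the same sums in long format.

```r
# One row per period, including the inserted "none" rows
# (rl values copied from the desired output in the question)
periods <- data.frame(
  ID         = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  Subscrtype = c("fixed", "none", "fixed", "none", "promo",
                 "none", "fixed", "none", "fixed"),
  days       = c(968, 2, 364, 729, 364, 950, 17, 2, 364)
)

# Total relation length per user (rows) and type (columns)
rl_wide <- xtabs(days ~ ID + Subscrtype, data = periods)

# Number of periods of each type per user, i.e. the summed dummies
n_wide <- table(periods$ID, periods$Subscrtype)

# The same sums in long format, one row per user/type combination that occurs
rl_long <- aggregate(days ~ ID + Subscrtype, data = periods, FUN = sum)

rl_wide
n_wide
```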

Thanks in advance!

dput of one specific user in the actual df:

structure(list(ï..CRM.relatienummer = structure(c(1L, 1L, 1L, 
1L, 1L, 1L), .Label = "1", class = "factor"), Begindatum = c("2019-08-26", 
"2018-08-24", "2017-08-24", "2016-08-24", "2015-08-20", "2016-06-01"
), Einddatum = c("2022-04-20", "2019-08-23", "2018-08-23", "2017-08-23", 
"2016-05-31", "2016-08-19"), Type.abonnement = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = "Actie", class = "factor"), Status_dummy = c(1, 
0, 0, 0, 0, 0), relationlength_fixed = c(0, 0, 0, 0, 0, 0), relationlength_promo = c(968, 
364, 364, 364, 285, 79), relationlength_trial = c(0, 0, 0, 0, 
0, 0), fixed_dummy = c(0, 0, 0, 0, 0, 0), trial_dummy = c(0, 
0, 0, 0, 0, 0), promotional_dummy = c(1, 1, 1, 1, 1, 1)), row.names = c("1:20610", 
"2:38646", "2:39231", "2:39232", "2:39248", "2:39837"), class = "data.frame")

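Note that the rows in this dput are not strictly sorted newest first: the last row (2016-06-01) begins after the row before it (2015-08-20). Approaches that pair each row with the next one assume that order and would produce a negative gap here. A sketch that sorts first and then computes the number of days without a subscription before each next, older row of the same user; the column name id is an assumption, standing in for ï..CRM.relatienummer.

```r
# Begin and end dates from the dput above; "id" stands in for ï..CRM.relatienummer
real <- data.frame(
  id         = factor(rep("1", 6)),
  Begindatum = c("2019-08-26", "2018-08-24", "2017-08-24",
                 "2016-08-24", "2015-08-20", "2016-06-01"),
  Einddatum  = c("2022-04-20", "2019-08-23", "2018-08-23",
                 "2017-08-23", "2016-05-31", "2016-08-19")
)

# Sort by user, then newest subscription first
real <- real[order(real$id, -as.numeric(as.Date(real$Begindatum))), ]

# Days without a subscription between each row and the next (older) row of the
# same user; 0 means the older one ends the day before the newer one begins
b <- as.Date(real$Begindatum)
e <- as.Date(real$Einddatum)
gap_days <- as.numeric(b[-nrow(real)] - e[-1]) - 1
gap_days[real$id[-1] != real$id[-nrow(real)]] <- NA   # no gap across users
gap_days
```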
Edit:

I have tried running the following code:

dfs <- split(testdata,testdata$ï..CRM.relatienummer)

r <- lapply(seq(length(dfs)), function(k){
  v <- dfs[[k]]
  vt <- data.frame(unique(v$ï..CRM.relatienummer), 
                   as.character((as.Date(v$Einddatum)+1)[-1]), 
                   as.character((as.Date(v$Begindatum)-1)[-nrow(v)]), 
                   0,
                   0,
                   0,
                   0,
                   (as.Date(v$Begindatum)-1)[-nrow(v)] - (as.Date(v$Einddatum)+1)[-1],
                   NA,
                   0,
                   0,
                   0,
                   0,
                   0)
  colnames(vt) <- c(colnames(v)[-ncol(v)],"rl_none",colnames(v)[ncol(v)])
  (testdata <- rbind(data.frame(v[-ncol(v)],rl_none = 0,v[ncol(v)]),vt))[order(as.Date(testdata$Begindatum),decreasing = T),]
})

res <- data.frame(Reduce(rbind,r),row.names = NULL)
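The column mismatch can be avoided by building the gap rows by column name instead of by position. A sketch under stated assumptions: the hypothetical helper add_gap_rows() takes one user's rows, sorted newest first, with the column names from the dput above, except that the first column is renamed to id (the ï.. prefix is what R makes of a UTF-8 byte-order mark; reading the CSV with fileEncoding = "UTF-8-BOM" avoids it). Indexing a data frame with NA row numbers gives all-NA rows that keep the original column types, which are then filled in. The new relationlength_none column uses the same end-minus-begin day count as the existing relationlength columns.

```r
# Hypothetical helper: insert "none" rows into ONE user's rows. Assumes the rows are
# sorted newest first and the first column of the dput has been renamed to "id".
add_gap_rows <- function(v) {
  v$relationlength_none <- 0            # existing subscriptions have no inactive days
  if (nrow(v) < 2) return(v)            # a single subscription has no gaps between rows
  b <- as.Date(v$Begindatum)
  e <- as.Date(v$Einddatum)
  gap_begin <- e[-1] + 1                # day after the older subscription ends
  gap_end   <- b[-nrow(v)] - 1          # day before the newer subscription begins
  keep <- gap_end >= gap_begin          # only pairs with at least one day in between
  if (!any(keep)) return(v)

  # NA row indices give all-NA rows that keep every column and its type
  gaps <- v[rep(NA_integer_, sum(keep)), ]
  gaps$id <- v$id[1]
  gaps$Begindatum <- as.character(gap_begin[keep])
  gaps$Einddatum  <- as.character(gap_end[keep])
  gaps$Type.abonnement <- "none"
  gaps$relationlength_none <- as.numeric(gap_end[keep] - gap_begin[keep])

  out <- rbind(v, gaps)
  out[order(as.Date(out$Begindatum), decreasing = TRUE), ]
}

# Some of the columns of the dput above, sorted newest first
v <- data.frame(
  id                   = factor(rep("1", 6)),
  Begindatum           = c("2019-08-26", "2018-08-24", "2017-08-24",
                           "2016-08-24", "2016-06-01", "2015-08-20"),
  Einddatum            = c("2022-04-20", "2019-08-23", "2018-08-23",
                           "2017-08-23", "2016-08-19", "2016-05-31"),
  Type.abonnement      = factor(rep("Actie", 6)),
  Status_dummy         = c(1, 0, 0, 0, 0, 0),
  relationlength_promo = c(968, 364, 364, 364, 79, 285)
)

add_gap_rows(v)
```

For all users, something like do.call(rbind, lapply(split(testdata, testdata$id), add_gap_rows)) should work once each user's rows are sorted newest first; columns not shown in the example are carried along the same way and are NA in the "none" rows.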

Hope this is what you're looking for:

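dfs <- split(df, df$ID)

r <- lapply(seq_along(dfs), function(k) {
  v <- dfs[[k]]  # rows of one user, assumed sorted by Begindate, newest first
  # Existing rows get rl_none = 0, placed just before the last column (Productgroup)
  out <- data.frame(v[-ncol(v)], rl_none = 0, v[ncol(v)])
  # A user with a single subscription has no gaps between rows; without this check
  # the data.frame() call below fails because the shifted date vectors are empty
  if (nrow(v) < 2) return(out)
  # One "none" row per pair of consecutive subscriptions: it starts the day after
  # the older one ends and ends the day before the newer one begins
  vt <- data.frame(unique(v$ID),
                   as.character((as.Date(v$Enddate) + 1)[-1]),
                   as.character((as.Date(v$Begindate) - 1)[-nrow(v)]),
                   "none",  # Subscrtype
                   0,       # active
                   0,       # rl_fixed
                   0,       # rl_promo
                   (as.Date(v$Begindate) - 1)[-nrow(v)] - (as.Date(v$Enddate) + 1)[-1],  # rl_none
                   NA)      # Productgroup
  colnames(vt) <- colnames(out)
  out <- rbind(out, vt)
  out[order(as.Date(out$Begindate), decreasing = TRUE), ]
})

res <- data.frame(Reduce(rbind, r), row.names = NULL)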
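How the offsets in the code above line up, sketched for user 1 of the example data: dropping the first end date and the last begin date pairs each older subscription's end with the next newer subscription's begin. The variable names are illustrative.

```r
# User 1 from the example data, sorted newest first as the answer's code assumes
Begindate <- as.Date(c("2019-08-26", "2018-08-24", "2015-08-24"))
Enddate   <- as.Date(c("2022-04-20", "2019-08-23", "2016-08-23"))
n <- length(Begindate)

# Position i pairs subscription i + 1 (older) with subscription i (newer)
gap_begin <- (Enddate + 1)[-1]     # day after each older subscription ends
gap_end   <- (Begindate - 1)[-n]   # day before the next newer subscription begins
rl_none   <- as.numeric(gap_end - gap_begin)

data.frame(Begindate = gap_begin, Enddate = gap_end, rl_none)
```

This reproduces the two "none" rows of user 1 in the output (1 and 729 days). Two things follow from the construction: a user with a single subscription yields empty vectors, so there is nothing to insert, and only gaps between subscriptions are filled, so a trailing inactive period up to 2022-04-20 (such as 2019-09-13 to 2022-04-20 for user 2 in the desired output) is not created. The day count is end minus begin, so a two-day gap gets rl_none = 1, whereas the desired output lists 2.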

Data

structure(list(ID = c(1L, 1L, 1L, 2L, 2L), Begindate = structure(c(3L, 
2L, 1L, 3L, 2L), .Label = c("2015-08-24", "2018-08-24", "2019-08-26"
), class = "factor"), Enddate = structure(c(4L, 2L, 1L, 3L, 2L
), .Label = c("2016-08-23", "2019-08-23", "2019-09-12", "2022-04-20"
), class = "factor"), Subscrtype = structure(c(1L, 1L, 2L, 1L, 
1L), .Label = c("fixed", "promo"), class = "factor"), active = c(1L, 
0L, 0L, 0L, 0L), rl_fixed = c(968L, 364L, 0L, 17L, 364L), rl_promo = c(0L, 
0L, 364L, 0L, 0L), Productgroup = c(1L, 1L, 2L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-5L))

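Hi Joep, welcome to SO. Could you post the output of dput (e.g. dput(df)) for your example, so that it is easier to load into R?

Hi! Thanks for the quick reply. I have a dput of one specific case. My df is slightly different from the one in the example, but still very close. I've added the dput to the OP. Thanks for your help!

I can't seem to get it to work on the real dataset, even with only 10 observations; it just keeps loading. I've included a dput of my real dataset in the OP. Edit: I see where it goes wrong; I'll try to make my real dataset match the simplified example.

@Joep So, for the updated dput data, what is your desired output? I only used the data from your code example, since the fields in my code differ from those in your real dataset and are therefore not directly applicable...

I understand! I hoped my R knowledge would at least be enough to adapt the suggestion to my real dataset, but unfortunately that is not the case. For my full dataset, the goal is to add rows between the current rows that represent the periods without an active subscription. The end goal is to aggregate the data frame so that it has relation-length variables and summed dummy variables indicating, for each user in the dataset, how many subscriptions of each type they had and the corresponding relation lengths. All other columns of the "none" rows can be NA. I want to run a multinomial logistic regression on the different subscription states: fixed, promo (, trial) and inactive.

@Joep I see you tested my code; you forgot to adjust the number of columns in my code to your data frame. My case has 8 columns, while yours has 13.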

> res
  ID  Begindate    Enddate Subscrtype active rl_fixed rl_promo rl_none Productgroup
1  1 2019-08-26 2022-04-20      fixed      1      968        0       0            1
2  1 2019-08-24 2019-08-25       none      0        0        0       1           NA
3  1 2018-08-24 2019-08-23      fixed      0      364        0       0            1
4  1 2016-08-24 2018-08-23       none      0        0        0     729           NA
5  1 2015-08-24 2016-08-23      promo      0        0      364       0            2
6  2 2019-08-26 2019-09-12      fixed      0       17        0       0            1
7  2 2019-08-24 2019-08-25       none      0        0        0       1           NA
8  2 2018-08-24 2019-08-23      fixed      0      364        0       0            1
