R - Replicate rows based on a sequence of start and end dates
I have a data frame "DF" like this:
Flight.Start Flight.End Device Partner Creative Days.in.Flight
2015-08-31 2015-08-31 Standard MSN Video 35
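What I need to do is "explode" it like this:
and so on... until the Date variable reaches 2015-10-04, and then move on to the next row to replicate.
Basically, each row gets replicated by its number of days in flight minus 1 (since the existing row already accounts for one day of the interval), and a new column, "Date", is filled in with the relevant dates within that flight. So if a row has start and end dates of 9/1 and 9/5, four duplicate rows are appended to the existing row, a new column (Date) is created, and the sequence of dates from the original row's flight start to flight end populates that column.
All date values are of class Date, Days.in.Flight is num, and the rest are factors.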
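Edit
Regarding the duplicate-question flag:
To clarify, this is different from the question it was flagged as a duplicate of, because my question is not really about how to replicate rows by the number of days in flight (I already know how to do that!), but about how to add a column to the output data frame and fill it with the dates, in order, within the corresponding flight period. Thanks for the heads-up...
Here is an approach using splitstackshape and dplyr. With expandRows() from the splitstackshape package you can expand the data frame as described. Then you use mutate() to add the date sequence. What I did was group the data by the combination of Flight.Start and Flight.End, and then use seq() to create a date sequence for each group. first() takes the first element of Flight.Start and of Flight.End, which gives seq() a single start and a single end date per group. That way you get the sequence you want. I hope this helps.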
Data and code
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
Device = "Standard",
Creative = "Video",
Days.in.Flight = c(3, 6),
stringsAsFactors = FALSE)
# Flight.Start Flight.End Device Creative Days.in.Flight
#1 2015-09-01 2015-09-03 Standard Video 3
#2 2015-09-10 2015-09-15 Standard Video 6
library(splitstackshape)
library(dplyr)
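# Repeat each row Days.in.Flight times, keeping the count column
expandRows(mydf, "Days.in.Flight", drop = FALSE) %>%
  # One group per flight, identified by its start/end pair
  group_by(Flight.Start, Flight.End) %>%
  # Fill Date with the daily sequence from start to end within each group
  mutate(Date = seq(first(Flight.Start),
                    first(Flight.End),
                    by = 1))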
# Flight.Start Flight.End Device Creative Days.in.Flight Date
# (date) (date) (chr) (chr) (dbl) (date)
#1 2015-09-01 2015-09-03 Standard Video 3 2015-09-01
#2 2015-09-01 2015-09-03 Standard Video 3 2015-09-02
#3 2015-09-01 2015-09-03 Standard Video 3 2015-09-03
#4 2015-09-10 2015-09-15 Standard Video 6 2015-09-10
#5 2015-09-10 2015-09-15 Standard Video 6 2015-09-11
#6 2015-09-10 2015-09-15 Standard Video 6 2015-09-12
#7 2015-09-10 2015-09-15 Standard Video 6 2015-09-13
#8 2015-09-10 2015-09-15 Standard Video 6 2015-09-14
#9 2015-09-10 2015-09-15 Standard Video 6 2015-09-15
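One caveat worth checking before relying on the grouped seq() call: it only lines up when each group's row count equals the length of its date sequence, that is, when Days.in.Flight equals Flight.End - Flight.Start + 1 for every row and no two rows share the same start/end pair. Otherwise mutate() would likely fail or misalign. A minimal base R sketch of such a check on the same mydf (the names expected_days and ok are mine, for illustration only):

```r
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)

# Inclusive number of days between start and end (hypothetical helper name)
expected_days <- as.integer(mydf$Flight.End - mydf$Flight.Start) + 1L

# TRUE where the stored day count matches the date range
ok <- mydf$Days.in.Flight == expected_days
ok
# [1] TRUE TRUE

# TRUE if any start/end pair occurs more than once
any(duplicated(mydf[c("Flight.Start", "Flight.End")]))
# [1] FALSE
```

If either check fails on real data, the base R approach that offsets from Flight.Start may be the safer choice, since it does not depend on Flight.End.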
Here is an approach using base R:
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
Device = "Standard",
Creative = "Video",
Days.in.Flight = c(3, 6),
stringsAsFactors = FALSE)
# Repeat each row Days.in.Flight times
expanded <- mydf[rep(row.names(mydf), mydf$Days.in.Flight), ]
# Add a per-row day offset (0, 1, 2, ...) to the flight start date
data.frame(expanded, Date = expanded$Flight.Start + (sequence(mydf$Days.in.Flight) - 1))
Flight.Start Flight.End Device Creative Days.in.Flight Date
1 2015-09-01 2015-09-03 Standard Video 3 2015-09-01
1.1 2015-09-01 2015-09-03 Standard Video 3 2015-09-02
1.2 2015-09-01 2015-09-03 Standard Video 3 2015-09-03
2 2015-09-10 2015-09-15 Standard Video 6 2015-09-10
2.1 2015-09-10 2015-09-15 Standard Video 6 2015-09-11
2.2 2015-09-10 2015-09-15 Standard Video 6 2015-09-12
2.3 2015-09-10 2015-09-15 Standard Video 6 2015-09-13
2.4 2015-09-10 2015-09-15 Standard Video 6 2015-09-14
2.5 2015-09-10 2015-09-15 Standard Video 6 2015-09-15
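For readers unfamiliar with sequence(): it builds the run 1..n for each n it is given and concatenates the runs, so subtracting 1 yields a day offset that restarts at 0 for each flight. A small sketch using the same day counts as above (offsets and starts are names I chose for illustration):

```r
# sequence() builds 1..n for each element and concatenates the results
sequence(c(3, 6))
# [1] 1 2 3 1 2 3 4 5 6

# Subtracting 1 gives the day offset within each flight
offsets <- sequence(c(3, 6)) - 1
offsets
# [1] 0 1 2 0 1 2 3 4 5

# Adding the offsets to the repeated start dates gives the Date column
starts <- rep(as.Date(c("2015-09-01", "2015-09-10")), c(3, 6))
starts + offsets
```

Note that this approach uses only Flight.Start and Days.in.Flight; Flight.End is never consulted.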
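Or, using data.table: we convert the 'data.frame' to a 'data.table' (setDT(mydf)), replicate the sequence of rows by 'Days.in.Flight', subset the dataset based on that index (.SD[rep(…), and then, grouped by 'Flight.Start' and 'Flight.End', create the 'Date' column.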
library(data.table)
setDT(mydf)[, .SD[rep(1:.N, Days.in.Flight)]][,
Date:= seq(Flight.Start , Flight.End, by = '1 day'),
by = .(Flight.Start, Flight.End)][]
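Hey @Jay, absolutely not, thanks. I probably shouldn't have included all the material about replicating rows, since I know how to do that with expandRows(), but this question is more about how to fill in a sequential date column for the expansion.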