How to form multiple sub data frames in R based on unique date intervals
Here is my transaction data:
data
id from to date amount
<int> <fctr> <fctr> <date> <dbl>
19521 6644 6934 2005-01-01 700.0
19524 6753 8456 2005-01-01 600.0
19523 9242 9333 2005-01-01 1000.0
… … … … …
1056317 7819 7454 2010-12-31 60.2
1056318 6164 7497 2010-12-31 107.5
1056319 7533 7492 2010-12-31 164.1
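What I want to do now is obtain sub data frames by splitting the data on each unique date interval. That is, for the first date interval, "2004-07-05" to "2005-01-01", we get a sub data frame whose dates in the date column fall within that interval. Since the dates in my data are in ascending order, the first date is "2005-01-01". The first sub data frame will therefore consist of the first 4 observations, because their date, "2005-01-01", lies within the interval "2004-07-05" to "2005-01-01". Similarly, for the second date interval, "2004-07-06" to "2005-01-02", we get a sub data frame containing the observations whose dates fall within that interval. The second sub data frame will therefore consist of the first 6 observations, because the dates "2005-01-01" and "2005-01-02" lie within the interval "2004-07-06" to "2005-01-02". Continuing in this way, how can I form multiple sub data frames based on these specified date intervals?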
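Let's consider the interval "2004-07-05" to "2005-01-01" again. For this particular interval, we can subset the data with the expression shown after the listing below, which prints the data together with its date_minus_180 column: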
id from to date date_minus_180 amount
<int> <fctr> <fctr> <date> <date> <dbl>
19521 6644 6934 2005-01-01 2004-07-05 700.0
19522 9843 9115 2005-01-01 2004-07-05 900.0
19523 9242 9333 2005-01-01 2004-07-05 1000.0
19524 6753 8456 2005-01-01 2004-07-05 600.0
19525 7075 6510 2005-01-02 2004-07-06 400.0
19526 8685 7207 2005-01-02 2004-07-06 1100.0
19527 5513 6046 2005-01-03 2004-07-07 600.0
19528 6340 7047 2005-01-03 2004-07-07 1100.0
19529 6042 6213 2005-01-03 2004-07-07 200.0
19530 5587 9493 2005-01-03 2004-07-07 800.0
...
data[data$date >= "2004-07-05" & data$date <= "2005-01-01",]
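The single-interval subset above can be wrapped in a small function so the same rule applies to any end date. This is only a minimal sketch: the `toy` data frame and the helper name `subset_window` are made up for illustration, and the 180-day width is taken from the date_minus_180 column in the question.

```r
# Toy stand-in for the transaction data (hypothetical values, same idea as the question).
toy <- data.frame(
  id     = 1:6,
  date   = as.Date(c("2005-01-01", "2005-01-01", "2005-01-02",
                     "2005-03-01", "2005-07-15", "2005-08-01")),
  amount = c(700, 600, 400, 1100, 200, 800)
)

# Hypothetical helper: rows whose date lies in [end_date - days, end_date].
subset_window <- function(df, end_date, days = 180) {
  end_date <- as.Date(end_date)  # accept a character string or a Date
  df[df$date >= end_date - days & df$date <= end_date, , drop = FALSE]
}

first_window  <- subset_window(toy, "2005-01-01")  # interval 2004-07-05 to 2005-01-01
second_window <- subset_window(toy, "2005-01-02")  # interval 2004-07-06 to 2005-01-02
```

Converting the end date with as.Date() up front keeps the comparison explicit instead of relying on R coercing the character string during the comparison.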
We can use Map:
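# Lower bound of each row's 180-day window
data$date_minus_180 <- data$date - 180
# One sub data frame per row: all rows dated within [date - 180, date]
result <- Map(function(x, y) data[data$date >= y & data$date <= x, ],
              data$date, data$date_minus_180)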
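Mapping over every element of data$date returns one sub data frame per row, so rows that share a date produce identical copies. Since the question asks for unique date intervals, one option is to map over the unique dates instead. A minimal sketch under that assumption, using a made-up `toy` data frame:

```r
# Toy stand-in for the transaction data (hypothetical values).
toy <- data.frame(
  id     = 1:6,
  date   = as.Date(c("2005-01-01", "2005-01-01", "2005-01-02",
                     "2005-03-01", "2005-07-15", "2005-08-01")),
  amount = c(700, 600, 400, 1100, 200, 800)
)

ends   <- unique(toy$date)  # one window per distinct date
starts <- ends - 180        # lower bound of each window

result <- Map(function(end, start) toy[toy$date >= start & toy$date <= end, ],
              ends, starts)

# Label each sub data frame by the end date of its window.
names(result) <- format(ends)

# Number of rows in each window.
window_sizes <- vapply(result, nrow, integer(1))
```

Naming the list elements by end date makes it possible to look up a window directly, for example result[["2005-01-02"]].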
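These functions do not perform well on large data sets. Is there a way to solve this with data.table?
If you want to split the data into multiple data frames, I'm not sure data.table will help :/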
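One possible data.table route is a non-equi join against a table of windows, followed by split(). The thread does not establish whether this is faster on the real data, so treat it as a sketch to benchmark. It assumes the data.table package is installed; the toy table and the names `windows`, `win_start` and `win_end` are made up for illustration.

```r
library(data.table)

# Toy stand-in for the transaction data (hypothetical values).
dt <- data.table(
  id     = 1:6,
  date   = as.Date(c("2005-01-01", "2005-01-01", "2005-01-02",
                     "2005-03-01", "2005-07-15", "2005-08-01")),
  amount = c(700, 600, 400, 1100, 200, 800)
)

# One window per distinct date: [win_end - 180, win_end].
windows <- data.table(win_end = unique(dt$date))
windows[, win_start := win_end - 180]

# Non-equi join: for each window, pull every row whose date falls inside it.
# x.date is the transaction's own date; i.win_end identifies the window.
joined <- dt[windows,
             on = .(date >= win_start, date <= win_end),
             .(win_end = i.win_end, id, date = x.date, amount),
             nomatch = NULL]

# Split the long result into one data.table per window.
result_dt <- split(joined, by = "win_end", keep.by = FALSE)
```

If the downstream computation is an aggregate per window, it may be cheaper to compute it on `joined` with `by = win_end` and skip the split entirely.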
data <- structure(list(id = c(18529L, 13742L, 9913L, 956L, 2557L, 1602L,
18669L, 35900L, 48667L, 51341L, 53713L, 60126L, 60545L, 65113L,
66783L, 83324L, 87614L, 88898L, 89874L, 94765L, 100277L, 101587L,
103444L, 108414L, 113319L, 121516L, 126607L, 130170L, 131771L,
135002L, 149431L, 157403L, 157645L, 158831L, 162597L, 162680L,
163901L, 165044L, 167082L, 168562L, 168940L, 172578L, 173031L,
173267L, 177507L, 179167L, 182612L, 183499L, 188171L, 189625L,
193940L, 198764L, 199342L, 200134L, 203328L, 203763L, 204733L,
205651L, 209672L, 210242L, 210979L, 214532L, 214741L, 215738L,
216709L, 220828L, 222140L, 222905L, 226133L, 226527L, 227160L,
228193L, 231782L, 232454L, 233774L, 237836L, 237837L, 238860L,
240223L, 245032L, 246673L, 247561L, 251611L, 251696L, 252663L,
254410L, 255126L, 255230L, 258484L, 258485L, 259309L, 259910L,
260542L, 262091L, 264462L, 264887L, 264888L, 266125L, 268574L,
272959L), from = c("5370", "5370", "5370", "8605", "5370", "6390",
"5370", "5370", "8934", "5370", "5635", "6046", "5680", "8026",
"9037", "5370", "7816", "8046", "5492", "8756", "5370", "9254",
"5370", "5370", "7078", "6615", "5370", "9817", "8228", "8822",
"5735", "7058", "5370", "8667", "9315", "6053", "7990", "8247",
"8165", "5656", "9261", "5929", "8251", "5370", "6725", "5370",
"6004", "7022", "7442", "5370", "8679", "6491", "7078", "5370",
"5370", "5370", "5658", "5370", "9296", "8386", "5370", "5370",
"5370", "9535", "5370", "7541", "5370", "9621", "5370", "7158",
"8240", "5370", "5370", "8025", "5370", "5370", "5370", "6989",
"5370", "7059", "5370", "5370", "5370", "9121", "5608", "5370",
"5370", "7551", "5370", "5370", "5370", "5370", "9163", "9362",
"6072", "5370", "5370", "5370", "5370", "5370"), to = c("9356",
"5605", "8567", "5370", "5636", "5370", "8933", "8483", "5370",
"7626", "5370", "5370", "5370", "5370", "5370", "9676", "5370",
"5370", "5370", "5370", "9105", "5370", "9772", "6979", "5370",
"5370", "7564", "5370", "5370", "5370", "5370", "5370", "8744",
"5370", "5370", "5370", "5370", "5370", "5370", "5370", "5370",
"5370", "5370", "7318", "5370", "8433", "5370", "5370", "5370",
"7122", "5370", "5370", "5370", "8566", "6728", "9689", "5370",
"8342", "5370", "5370", "5614", "5596", "5953", "5370", "7336",
"5370", "7247", "5370", "7291", "5370", "5370", "6282", "7236",
"5370", "8866", "8613", "9247", "5370", "6767", "5370", "9273",
"7320", "9533", "5370", "5370", "8930", "9343", "5370", "9499",
"7693", "7830", "5392", "5370", "5370", "5370", "7497", "8516",
"9023", "7310", "8939"), date = structure(c(12934, 13000, 13038,
13061, 13099, 13113, 13117, 13179, 13238, 13249, 13268, 13296,
13299, 13309, 13314, 13391, 13400, 13404, 13409, 13428, 13452,
13452, 13460, 13482, 13493, 13518, 13526, 13537, 13542, 13544,
13596, 13616, 13617, 13626, 13633, 13633, 13639, 13642, 13646,
13656, 13660, 13664, 13667, 13669, 13677, 13686, 13694, 13694,
13707, 13716, 13725, 13738, 13739, 13746, 13756, 13756, 13756,
13761, 13769, 13770, 13776, 13786, 13786, 13786, 13791, 13799,
13806, 13813, 13817, 13817, 13817, 13822, 13829, 13830, 13836,
13847, 13847, 13847, 13852, 13860, 13866, 13871, 13878, 13878,
13878, 13882, 13883, 13883, 13887, 13887, 13888, 13889, 13890,
13891, 13895, 13896, 13896, 13899, 13905, 13909), class = "Date"),
amount = c(24.4, 7618, 21971, 5245, 2921, 8000, 169.2, 71.5,
14.6, 4214, 14.6, 13920, 14.6, 24640, 1600, 261.1, 16400,
3500, 2700, 19882, 182, 14.6, 16927, 25653, 3059, 2880, 9658,
4500, 12480, 14.6, 1000, 3679, 34430, 12600, 14.6, 19.2,
4900, 826, 3679, 2100, 38000, 79, 11400, 21495, 3679, 200,
14.6, 100.6, 3679, 5300, 108.9, 3679, 2696, 7500, 171.6,
14.6, 99.2, 2452, 3679, 3218, 700, 69.7, 14.6, 91.5, 2452,
3679, 2900, 17572, 14.6, 14.6, 90.5, 2452, 49752, 3679, 1900,
14.6, 870, 85.2, 2452, 3679, 1600, 540, 14.6, 14.6, 79, 210,
2452, 28400, 720, 180, 420, 44289, 489, 3679, 840, 2900,
150, 870, 420, 14.6)), row.names = c(NA, -100L), class = "data.frame")
# Same result without the helper column: compute the lower bound inline
result <- lapply(data$date, function(x)
  data[data$date >= (x - 180) & data$date <= x, ])
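Because the question states that the dates are in ascending order, each window is a contiguous block of rows, so its first and last row index can be located with findInterval() instead of scanning the whole date column for every window. The following base R sketch relies on that sorted-data assumption and uses a made-up `toy` data frame; it will give wrong results on unsorted dates.

```r
# Toy stand-in for the transaction data (hypothetical values, sorted by date).
toy <- data.frame(
  id     = 1:6,
  date   = as.Date(c("2005-01-01", "2005-01-01", "2005-01-02",
                     "2005-03-01", "2005-07-15", "2005-08-01")),
  amount = c(700, 600, 400, 1100, 200, 800)
)

d    <- as.numeric(toy$date)   # days since epoch, non-decreasing
ends <- unique(toy$date)       # one window per distinct date

# Last row of each window: number of rows with date <= end.
hi <- findInterval(as.numeric(ends), d)

# First row of each window: number of rows with date < start, plus one.
lo <- findInterval(as.numeric(ends - 180), d, left.open = TRUE) + 1L

# Each window is the contiguous row range lo:hi.
result <- Map(function(a, b) toy[a:b, ], lo, hi)
names(result) <- format(ends)
```

Since every end date occurs in the data, lo never exceeds hi, so each range a:b is non-empty and runs forward.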