R: Merging daily and periodic data into one data frame


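I am trying to build a panel data frame from periodic data and "continuous" daily data. The two should be matched so that each row of the new data frame holds the period, the value of the periodic data, and the value and date of one day within that period. The data look similar to this: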

> dailycds
         Date CDS
1  30-06-2015 194
2  01-07-2015 195
3  02-07-2015 198
4  03-07-2015 198
5  04-07-2015 199
6  30-06-2016 165
7  01-07-2016 172
8  02-07-2016 213
9  03-07-2016 123
10 04-07-2016 321


> periodicassets
  Period Assets
1 201506   1314
2 201606   2134
In the end, I want it to look like this:

> df
  Period       Date Assets CDS
1 201506 30-06-2015   1314 194
2 201506 01-07-2015   1314 195
3 201506 02-07-2015   1314 198
4 201506 03-07-2015   1314 198
5 201606 30-06-2016   2134 165
6 201606 01-07-2016   2134 172
7 201606 02-07-2016   2134 213
8 201606 03-07-2016   2134 123
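So basically, the idea is to take a certain range of rows from the daily data and assign (and merge) them to the periodic data. Unfortunately, I cannot simply do this by extracting the mm-yyyy part of the date: period 201506 also contains the July data up to the 3rd, while the 4th of July does not belong to any period and should be dropped, because each period should contain only a specific number of days (4 in this example).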

Here is the code to reproduce the sample data above:

dailycds = data.frame(Date = c("30-06-2015", "01-07-2015", "02-07-2015","03-07-2015","04-07-2015","30-06-2016", "01-07-2016", "02-07-2016","03-07-2016","04-07-2016"),
                      CDS = c(194, 195, 198,198,199,165,172,213,123,321))
dailycds

periodicassets = data.frame(Period = c("201506", "201606"),
                            Assets = c("1314","2134"))
periodicassets

df = data.frame(Period = c("201506", "201506", "201506", "201506", "201606", "201606", "201606", "201606"),
                Date = c("30-06-2015", "01-07-2015", "02-07-2015","03-07-2015", "30-06-2016", "01-07-2016", "02-07-2016", "03-07-2016"),
                Assets = c("1314", "1314", "1314", "1314", "2134", "2134", "2134", "2134"),
                CDS = c(194, 195, 198, 198, 165, 172, 213, 123))
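Background and further complications

As the given solutions suggest, my earlier example was very specific and probably oversimplified. To get closer to my actual problem, here is some additional background. The periodic data are banks' asset holdings at month end, and I want to assign to them the daily CDS data for the 3 days before and the 6 days after month end. The panel of course contains several banks, and the (same) CDS data must be assigned to the holdings of each bank. (For example, with 2 banks and 3 days before plus 6 days after month end, I have (3+1+6)*2 days.) As pointed out in the comments, days in my question always mean business days/working days, because my time series does not contain any holidays etc.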

To illustrate this, here is an excerpt of the original data with only one period:

> periodicassets
            BankName Period     value 
  2             BPCE 201412 112189.50
  4  Credit Agricole 201412  81618.76

    Date                CDS
   <dttm>              <chr>
  1 2015-01-12             46.869
  2 2015-01-09 48.121000000000002
  3 2015-01-08 48.625999999999998
  4 2015-01-07 48.801000000000002
  5 2015-01-06 48.633000000000003
  6 2015-01-05 46.670999999999999
  7 2015-01-02 45.158000000000001
  8 2015-01-01              47.32
  9 2014-12-31 47.658000000000001
 10 2014-12-30 45.843000000000004
 11 2014-12-29 47.588999999999999
 12 2014-12-26 47.625999999999998
 13 2014-12-25 47.697000000000003
 14 2014-12-24 47.414999999999999
 15 2014-12-23 48.075000000000003
 16 2014-12-22 48.085999999999999
 17 2014-12-19 47.496000000000002
 18 2014-12-18 46.534999999999997
 19 2014-12-17 48.149000000000001
The data can be accessed via Dropbox.

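While browsing the forum, I found similar questions. However, the first one tries to aggregate the data, while the second one already has the data in the format I want (in the object xtime).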

See if this works for you:

library(lubridate)
library(dplyr)
library(tidyr)

periodicassets <- periodicassets %>%
        mutate(Date = ymd(paste(Period, "01", sep = ""))) %>%
        select(-Period)


dailycds$Date <- dmy(dailycds$Date)

full_join(dailycds, periodicassets) %>% 
        arrange(Date) %>% fill(Assets, .direction = "down") %>%
        na.omit

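The key issue in this question is how to map Period to Date. From the OP's description, I understand that each period comprises the last day of the actual month plus the first three days of the following month, 4 days in total.

This can be solved with some date arithmetic and a right join: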

library(data.table)
result <- 
  # coerce to data.table
  setDT(dailycds)[
    # compute period by subtracting 3 days of date
    , Period := format(as.IDate(Date, "%d-%m-%Y") - 3L, "%Y%m")][
      # right join, dropping all rows from dailycds without matching period
      periodicassets, on = "Period"][
        # change column order to be in line with expected result df
      , setcolorder(.SD, names(df))]
result
As requested, there are only 4 rows per period, and the result is in line with the expected result df.

all.equal(df, as.data.frame(result[, lapply(.SD, forcats::fct_drop)]))
Unused factor levels have to be dropped in order to pass the strict check of all.equal().
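
To illustrate why the unused levels matter, here is a minimal base R sketch. It uses droplevels() as a stand-in for forcats::fct_drop(), and the factor f is a made-up example, not part of the OP's data:

```r
# Hypothetical factor; subsetting keeps the now unused level "c"
f <- factor(c("a", "b", "c"))
g <- f[f != "c"]
levels(g)  # still lists "c" although no element uses it

target <- factor(c("a", "b"))

# all.equal() also compares the level sets, so this is not TRUE
isTRUE(all.equal(g, target))

# after dropping the unused level the comparison succeeds
isTRUE(all.equal(droplevels(g), target))
```

Note that all.equal() returns a description of the differences rather than FALSE when the objects differ, which is why the result is wrapped in isTRUE().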

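Caveat

The code has been tested to work with the sample data provided. For continuous daily data and periodic data, additional code may be needed to remove days which do not belong to a 4-day period.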
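
The period mapping and the caveat can be checked on a handful of dates with base R alone. This is only a small sketch: four of the dates are sample dates from the question, while 29-06-2015 is a made-up date that does not occur in the OP's data.

```r
# Sample dates from the question plus one made-up date (29-06-2015)
dates <- as.Date(c("29-06-2015", "30-06-2015", "01-07-2015",
                   "03-07-2015", "04-07-2015"), format = "%d-%m-%Y")

# shift each date back by 3 days, then keep only year and month
period <- format(dates - 3L, "%Y%m")

data.frame(dates, period)
```

04-07-2015 is mapped to 201507, which has no match in periodicassets and is therefore dropped by the right join. The made-up 29-06-2015 is mapped to 201506 and would be kept, which is why continuous daily data need an additional filter.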


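Edit: More realistic sample data

The OP has updated his question and provided more realistic sample data via dropbox. Now, dailycds contains daily data (except weekends). As mentioned above, this requires filtering dailycds for the relevant dates.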

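The OP has not made clear how the days before and after the turn of the month are defined. Here, we assume that 3 days before and 6 days after month end refer to calendar days, not business days.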
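
As a rough illustration of this calendar-day assumption, the selection can also be written in base R without lubridate. This is only a sketch: the date range is made up, and the month-end test (adding days_before days moves the date into the next month) is my own shortcut for comparing the day of the month against the number of days in the month.

```r
days_before <- 3L
days_after  <- 6L

# Hypothetical stretch of calendar days around the turn of the year
dates <- seq(as.Date("2014-12-25"), as.Date("2015-01-10"), by = "day")

# TRUE for the first days_after days of a month
in_head <- as.integer(format(dates, "%d")) <= days_after
# TRUE for the last days_before days of a month:
# adding days_before days crosses the month end
in_tail <- format(dates + days_before, "%Y%m") != format(dates, "%Y%m")

selected <- dates[in_head | in_tail]

# shift the days of the following month back into the actual month
period <- format(selected - days_after, "%Y%m")

data.frame(selected, period)
```

With these settings the 29th to 31st of December and the 1st to 6th of January are selected, and all of them are assigned to period 201412.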

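Edit 2: Using business days instead of calendar days

The OP has clarified that he is using business days rather than calendar days. This seemingly minor change of the specification has a severe impact on how the dates have to be selected.

Now, the first 6 entries of each month as well as the last 3 entries before the last trading day of the month (ultimo) and the ultimo itself are picked, which results in 3 + 1 + 6 = 10 business days to pick.

Note that the result dataset contains (3 + 1 + 6) days * 2 months * 2 banks = 40 rows.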
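
Because head() and tail() pick rows by position, such a selection depends on how the daily data are sorted, and the sample data are stored with the newest date first. The following base R sketch therefore builds the windows from dates in ascending order and takes the period from the month of the ultimo instead of subtracting calendar days. The weekday-only calendar is an assumption of mine (no holiday handling), and select_window() is a hypothetical helper, not a function from any package:

```r
days_before <- 3L
days_after  <- 6L

# Assumed business-day calendar: Monday to Friday, in ascending order
all_days <- seq(as.Date("2014-12-01"), as.Date("2015-02-27"), by = "day")
biz_days <- all_days[!as.POSIXlt(all_days)$wday %in% c(0L, 6L)]

month_id <- format(biz_days, "%Y%m")   # calendar month of each business day
months   <- sort(unique(month_id))

# Hypothetical helper: window around the ultimo of the k-th month
select_window <- function(k) {
  # last days_before business days of month k plus the ultimo itself
  before <- tail(biz_days[month_id == months[k]], days_before + 1L)
  # first days_after business days of the following month
  after  <- head(biz_days[month_id == months[k + 1L]], days_after)
  data.frame(Period = months[k], Date = c(before, after),
             stringsAsFactors = FALSE)
}

# the last month has no following month, so it gets no window
windows <- do.call(rbind, lapply(seq_len(length(months) - 1L), select_window))

# asset values taken from the OP's sample data
periodicassets <- data.frame(
  BankName = c("BPCE", "BPCE", "Credit Agricole", "Credit Agricole"),
  Period   = c("201412", "201501", "201412", "201501"),
  value    = c(112189.50293406, 103142.064337463, 81618.762099507, 73987.36251389),
  stringsAsFactors = FALSE
)

# every bank receives the same 10 business days per period
panel <- merge(periodicassets, windows, by = "Period")
nrow(panel)
```

In this sketch each period gets 3 + 1 + 6 = 10 business days, so 2 periods and 2 banks give 40 rows. The daily CDS values could then be attached with a further merge on Date.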

Data from dropbox

In case the dropbox links break:

dailycds <- 
structure(list(Date = structure(c(1424649600, 1424390400, 1424304000, 
1424217600, 1424131200, 1424044800, 1423785600, 1423699200, 1423612800, 
1423526400, 1423440000, 1423180800, 1423094400, 1423008000, 1422921600, 
1422835200, 1422576000, 1422489600, 1422403200, 1422316800, 1422230400, 
1421971200, 1421884800, 1421798400, 1421712000, 1421625600, 1421366400, 
1421280000, 1421193600, 1421107200, 1421020800, 1420761600, 1420675200, 
1420588800, 1420502400, 1420416000, 1420156800, 1420070400, 1419984000, 
1419897600, 1419811200, 1419552000, 1419465600, 1419379200, 1419292800, 
1419206400, 1418947200, 1418860800, 1418774400, 1418688000, 1418601600, 
1418342400, 1418256000, 1418169600, 1418083200, 1417996800, 1417737600, 
1417651200, 1417564800, 1417478400, 1417392000, 1417132800, 1417046400, 
1416960000, 1416873600, 1416787200, 1416528000, 1416441600, 1416355200, 
1416268800, 1416182400, 1415923200, 1415836800, 1415750400, 1415664000, 
1415577600, 1415318400, 1415232000, 1415145600, 1415059200, 1414972800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), CDS = c("44.259", 
"44.555999999999997", "45.076999999999998", "44.951000000000001", 
"45.762", "45.573", "45.634999999999998", "45.956000000000003", 
"47.064", "47.51", "48.576999999999998", "47.265000000000001", 
"47.073999999999998", "46.634999999999998", "46.405000000000001", 
"47.567", "47.396000000000001", "48.448999999999998", "49.442", 
"49.502000000000002", "49.73", "50.917000000000002", "51.37", 
"52.536999999999999", "49.188000000000002", "47.893999999999998", 
"46.728000000000002", "46.634999999999998", "46.366999999999997", 
"47.012999999999998", "46.869", "48.121000000000002", "48.625999999999998", 
"48.801000000000002", "48.633000000000003", "46.670999999999999", 
"45.158000000000001", "47.32", "47.658000000000001", "45.843000000000004", 
"47.588999999999999", "47.625999999999998", "47.697000000000003", 
"47.414999999999999", "48.075000000000003", "48.085999999999999", 
"47.496000000000002", "46.534999999999997", "48.149000000000001", 
"49.421999999999997", "48.223999999999997", "47.100999999999999", 
"47.484999999999999", "47.491999999999997", "47.052", "46.697000000000003", 
"44.670999999999999", "47.706000000000003", "46.835000000000001", 
"48.66", "46.841999999999999", "48.069000000000003", "49.49", 
"50.155000000000001", "50.155000000000001", "50.49", "52.024000000000001", 
"50.33", "50", "50.67", "53.15", "52.994999999999997", "55.31", 
"50.82", "50.49", "50.832999999999998", "52.241", "51.97", "52.8", 
"50.667000000000002", "51.134999999999998")), .Names = c("Date", 
"CDS"), row.names = c(NA, -81L), class = c("tbl_df", "tbl", "data.frame"))

periodicassets <- 
structure(list(BankName = c(" BPCE", " BPCE", " Credit Agricole", 
" Credit Agricole"), Period = c("201412", "201501", "201412", 
"201501"), value = c(112189.50293406, 103142.064337463, 81618.762099507, 
73987.36251389)), .Names = c("BankName", "Period", "value"), row.names = c(10L, 
11L, 18L, 19L), class = "data.frame")

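Have you looked at lubridate?

No, thanks, I haven't. Do you have any idea regarding my question?

I am not clear about your desired result. For the same period, e.g. 201506, why can Assets be 1314 or 2134?

Hey ycw, you're right, I made a mistake when creating the desired output by hand. It should be clear now: the Assets value for 201506 is 1314 and for 201606 it is 2134. Thanks for pointing it out.

In the update, you wrote that you want to assign the daily CDS data for 3 days before and 6 days after month end. Do you mean 3 and 6 calendar days, or business days/working days?

To the person who downvoted: please tell me what is wrong with my answer... that would help.

Please do not use the JavaScript/HTML/CSS code snippet button (or Ctrl-M) for R code. You can format R code by marking the code and pressing Ctrl-K. Thanks.

@ I cannot use Ctrl+K when I type or copy from RStudio. As for the downvote, I think it may be because (1) your answer is incorrect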
all.equal(df, as.data.frame(result[, lapply(.SD, forcats::fct_drop)]))
[1] TRUE
# define day range of interest relative to turn of the month
days_before <- 3L
days_after  <- 6L
stopifnot(days_before + days_after < 28)

# read data from dropbox links, note ?dl=1 
dailycds <- readRDS(url("https://www.dropbox.com/s/r7v5dq6la0mnn71/dailycds.RDS?dl=1"))
periodicassets <-
  readRDS(url("https://www.dropbox.com/s/gdflcngwp8nm552/periodicassets.RDS?dl=1"))

library(data.table)
# coerce to data.table
setDT(dailycds)[
  # filter calendar dates
  mday(Date) <= days_after | mday(Date) > lubridate::days_in_month(Date) - days_before][
    # compute period by shifting dates from next month into actual month
    # coercion to IDate is required because Date is of class POSIXct 
    , Period := format(as.IDate(Date) - days_after, "%Y%m")][
      # right join, dropping all rows from dailycds without matching period
      setDT(periodicassets), on = "Period"][]
          Date                CDS Period         BankName     value
 1: 2015-01-06 48.633000000000003 201412             BPCE 112189.50
 2: 2015-01-05 46.670999999999999 201412             BPCE 112189.50
 3: 2015-01-02 45.158000000000001 201412             BPCE 112189.50
 4: 2015-01-01              47.32 201412             BPCE 112189.50
 5: 2014-12-31 47.658000000000001 201412             BPCE 112189.50
 6: 2014-12-30 45.843000000000004 201412             BPCE 112189.50
 7: 2014-12-29 47.588999999999999 201412             BPCE 112189.50
 8: 2015-02-06 47.265000000000001 201501             BPCE 103142.06
 9: 2015-02-05 47.073999999999998 201501             BPCE 103142.06
10: 2015-02-04 46.634999999999998 201501             BPCE 103142.06
11: 2015-02-03 46.405000000000001 201501             BPCE 103142.06
12: 2015-02-02             47.567 201501             BPCE 103142.06
13: 2015-01-30 47.396000000000001 201501             BPCE 103142.06
14: 2015-01-29 48.448999999999998 201501             BPCE 103142.06
15: 2015-01-06 48.633000000000003 201412  Credit Agricole  81618.76
16: 2015-01-05 46.670999999999999 201412  Credit Agricole  81618.76
...
26: 2015-02-02             47.567 201501  Credit Agricole  73987.36
27: 2015-01-30 47.396000000000001 201501  Credit Agricole  73987.36
28: 2015-01-29 48.448999999999998 201501  Credit Agricole  73987.36
          Date                CDS Period         BankName     value
# define range of business days relative to the last trading day (ultimo)
days_before <- 3L
days_after  <- 6L
stopifnot(days_before + days_after < 28)

library(data.table)
# read data from dropbox links, note ?dl=1 
dailycds <- readRDS(url("https://www.dropbox.com/s/r7v5dq6la0mnn71/dailycds.RDS?dl=1"))
periodicassets <- readRDS(url("https://www.dropbox.com/s/gdflcngwp8nm552/periodicassets.RDS?dl=1"))
# coerce to data.table
setDT(dailycds)[
  # filter business dates: 
  # for each month pick the first days_after business days into the month 
  # and the last days_before biz days before and including ultimo
  dailycds[, c(head(.I, days_after), tail(.I, days_before + 1L)), 
           by = .(year(Date), month(Date))]$V1][
    # compute period by shifting dates from next month into actual month
    # coercion to IDate is required because Date is of class POSIXct 
    , Period := format(as.IDate(Date) - days_after, "%Y%m")][
      # right join, dropping all rows from dailycds without matching period
      setDT(periodicassets), on = "Period"][]
          Date                CDS Period         BankName     value
 1: 2015-01-06 48.633000000000003 201412             BPCE 112189.50
 2: 2015-01-05 46.670999999999999 201412             BPCE 112189.50
 3: 2015-01-02 45.158000000000001 201412             BPCE 112189.50
 4: 2015-01-01              47.32 201412             BPCE 112189.50
 5: 2014-12-31 47.658000000000001 201412             BPCE 112189.50
 6: 2014-12-30 45.843000000000004 201412             BPCE 112189.50
 7: 2014-12-29 47.588999999999999 201412             BPCE 112189.50
 8: 2014-12-26 47.625999999999998 201412             BPCE 112189.50
 9: 2014-12-25 47.697000000000003 201412             BPCE 112189.50
10: 2014-12-24 47.414999999999999 201412             BPCE 112189.50
11: 2015-02-05 47.073999999999998 201501             BPCE 103142.06
12: 2015-02-04 46.634999999999998 201501             BPCE 103142.06
13: 2015-02-03 46.405000000000001 201501             BPCE 103142.06
14: 2015-02-02             47.567 201501             BPCE 103142.06
15: 2015-01-30 47.396000000000001 201501             BPCE 103142.06
16: 2015-01-29 48.448999999999998 201501             BPCE 103142.06
17: 2015-01-28             49.442 201501             BPCE 103142.06
18: 2015-01-27 49.502000000000002 201501             BPCE 103142.06
19: 2015-01-26              49.73 201501             BPCE 103142.06
20: 2015-01-23 50.917000000000002 201501             BPCE 103142.06
21: 2015-01-06 48.633000000000003 201412  Credit Agricole  81618.76
22: 2015-01-05 46.670999999999999 201412  Credit Agricole  81618.76
...
39: 2015-01-26              49.73 201501  Credit Agricole  73987.36
40: 2015-01-23 50.917000000000002 201501  Credit Agricole  73987.36
          Date                CDS Period         BankName     value