R data frame: apply an expression across a time series and output the results in a new data frame


I am learning R and have run into a problem that I cannot solve or find an answer to.

I have a data frame:

df <- data.frame(
  ID=c("a1","a1","a1","a1", 
       "a2","a2","a2","a2",
       "a3","a3","a3","a3",
       "b1","b1","b1","b1", 
       "b2","b2","b2","b2",
       "b3","b3","b3","b3"), 
  Date=c("January-19", "February-19", "March-19", "April-19", 
         "January-19", "February-19", "March-19", "April-19",
         "January-19", "February-19", "March-19", "April-19", 
         "January-19", "February-19", "March-19", "April-19", 
         "January-19", "February-19", "March-19", "April-19", 
         "January-19", "February-19", "March-19", "April-19", 
         "May-19", "June-19", "July-19", "August-19", 
         "May-19", "June-19", "July-19", "August-19",
         "May-19", "June-19", "July-19", "August-19", 
         "May-19", "June-19", "July-19", "August-19",
         "May-19", "June-19", "July-19", "August-19", 
         "May-19", "June-19", "July-19", "August-19"), 
  Value=c(1,2,5,4,7,3,9,8,9,10,44,3,15,16,17,2, 3, 22, 12, 3, 4, 44, 24, 5))
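The "ID" column is "character", the "Date" column is "date", and the "Value" column is "numeric".

Based on this data frame (df), I am trying to create a new data frame that shows the result of an expression in one column and the date it refers to in another column.

For example, for a given date in "df", I want to compute a given expression such as (a1+b1)/b1 from the "Value" entries and put the result in a new data frame, showing a single value for the date period it refers to, applied across the whole "Date" time series.

Using the "df" values and the example expression, the new data frame would look like this: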

January-19  | 1.06
February-19 | 1.13
March-19    | 1.29 
April-19    | 3
May-19      | 1.06
June-19     | 1.13
July-19     | 1.29
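The real expressions are much more complex than the example given, but I am not sure that matters, because what I am trying to figure out is how to apply any calculation and output it against a series of dates in a new data frame, regardless of complexity.


Apologies if this is a simple question, and thank you in advance.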

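There are several ways to do this.

Your sample data has more dates than IDs/values, so I reworked it a bit.

Both approaches below assume there is only one a1/b1 per date.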
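As a minimal sketch of what two such approaches could look like, assuming the dplyr and tidyr packages are installed (the reduced sample `df_small` and the result names `res_grouped` and `res_wide` are illustrative and not taken from the original answer):

```r
library(dplyr)
library(tidyr)

# Hypothetical reduced sample: exactly one a1 row and one b1 row per date
df_small <- tibble(
  ID    = rep(c("a1", "b1"), each = 4),
  Date  = rep(c("January-19", "February-19", "March-19", "April-19"), times = 2),
  Value = c(1, 2, 5, 4, 15, 16, 17, 2)
)

# Approach 1: group by date and pick the two values by ID inside summarise()
res_grouped <- df_small %>%
  group_by(Date) %>%
  summarise(Value_cal = (Value[ID == "a1"] + Value[ID == "b1"]) / Value[ID == "b1"])

# Approach 2: reshape to one column per ID, then compute the expression row-wise
res_wide <- df_small %>%
  pivot_wider(names_from = ID, values_from = Value) %>%
  mutate(Value_cal = (a1 + b1) / b1)

print(res_grouped)
print(res_wide)
```

The grouped version returns the dates sorted alphabetically, while the wide version keeps them in order of first appearance and leaves the a1 and b1 columns visible next to the result.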

Output:

  Date        Value_cal
* <chr>           <dbl>
1 April-19         3   
2 February-19      1.12
3 January-19       1.07
4 March-19         1.29
# A tibble: 4 x 4
  Date           a1    b1 Value_cal
  <chr>       <dbl> <dbl>     <dbl>
1 January-19      1    15      1.07
2 February-19     2    16      1.12
3 March-19        5    17      1.29
4 April-19        4     2      3  
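
Thank you very much, this is exactly what I was hoping for. I have accepted this answer. Out of curiosity, is there a reason the demo data is stored as a tibble rather than a data frame?

It is just my personal preference. A tibble prints more nicely in the console than the default data.frame. Beyond that, as far as this example is concerned, there is no technical reason to do so.

Here is a base R solution that works for all ID sets. It also assumes the entries are symmetric, that is, each "a" row has a matching "b" row for the same ID set and date.

The important step is getting the data into the right order. The subsequent steps only process the entries.

The benefits of this approach are scalable execution time, maximum control over the data, and package independence (this is a personal preference).

Data:
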
df <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L), class = "factor", .Label = c("a1",
"a2", "a3", "b1", "b2", "b3")), Date = structure(c(4L, 3L, 7L,
1L, 4L, 3L, 7L, 1L, 4L, 3L, 7L, 1L, 4L, 3L, 7L, 1L, 4L, 3L, 7L,
1L, 4L, 3L, 7L, 1L, 8L, 6L, 5L, 2L, 8L, 6L, 5L, 2L, 8L, 6L, 5L,
2L, 8L, 6L, 5L, 2L, 8L, 6L, 5L, 2L, 8L, 6L, 5L, 2L), .Label = c("April-19",
"August-19", "February-19", "January-19", "July-19", "June-19",
"March-19", "May-19"), class = "factor"), Value = c(1, 2, 5,
4, 7, 3, 9, 8, 9, 10, 44, 3, 15, 16, 17, 2, 3, 22, 12, 3, 4,
44, 24, 5, 1, 2, 5, 4, 7, 3, 9, 8, 9, 10, 44, 3, 15, 16, 17,
2, 3, 22, 12, 3, 4, 44, 24, 5)), class = "data.frame", row.names = c(NA,
-48L))
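# Reorder the rows: first by the numeric suffix of the ID (the ID set),
# then chronologically, so that each a/b pair ends up on adjacent rows.
# "%b-%d" reads "January-19" as 19 January of the current year, which is
# enough for ordering by month (month names are locale-dependent).
df_reo <- df[ order( matrix( unlist( strsplit( as.character(df$ID), "" ) ),
                             ncol=2, byrow=TRUE )[,2],
                     as.Date(df$Date, "%b-%d") ), ]
li <- matrix( 1:nrow(df_reo), ncol=2, byrow=TRUE ) # helper ids for the rows
colnames(li) <- c("a","b")

ds <- as.numeric( unlist(strsplit(sort(as.character( df$ID )), "" )[nrow(df)])[2] ) # ID-sets, only for nicer formatting
# Apply (a + b) / b to every pair; the result gets one column per ID set
df_fin <- matrix( vapply( 1:nrow(li), function(x){
                        ( df_reo$Value[li[x,"a"]] + df_reo$Value[li[x,"b"]] ) / 
                          df_reo$Value[li[x,"b"]] }, 1.0 ), ncol=ds ) 

rownames(df_fin) <- unique(df_reo$Date)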
> data.frame( df_fin )
                  X1       X2       X3
January-19  1.066667 3.333333 3.250000
February-19 1.125000 1.136364 1.227273
March-19    1.294118 1.750000 2.833333
April-19    3.000000 3.666667 1.600000
May-19      1.066667 3.333333 3.250000
June-19     1.125000 1.136364 1.227273
July-19     1.294118 1.750000 2.833333
August-19   3.000000 3.666667 1.600000
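
The matrix above hard-codes (a + b) / b for each ID set. Since the question asks about applying arbitrary expressions, here is one possible generalisation in base R, offered as a sketch rather than as part of the original answer: reshape the values into one column per ID with `tapply()`, then evaluate any quoted expression against those columns. The helper name `apply_expr` and the reduced sample `df_small` are hypothetical.

```r
# Hypothetical reduced sample: two IDs, four dates, long format
df_small <- data.frame(
  ID    = rep(c("a1", "b1"), each = 4),
  Date  = rep(c("January-19", "February-19", "March-19", "April-19"), times = 2),
  Value = c(1, 2, 5, 4, 15, 16, 17, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: evaluate `expr` (a quoted expression that refers to
# ID names) once per date and return a two-column data frame
apply_expr <- function(data, expr) {
  # Keep the dates in order of first appearance instead of alphabetical order
  dates <- factor(data$Date, levels = unique(data$Date))
  # One row per date, one column per ID (assumes one value per Date/ID pair)
  wide <- as.data.frame(tapply(data$Value, list(dates, data$ID), sum))
  data.frame(Date = rownames(wide), Result = eval(expr, wide),
             row.names = NULL, stringsAsFactors = FALSE)
}

res <- apply_expr(df_small, quote((a1 + b1) / b1))
print(res)
```

Any other expression over the ID names, for example `quote(a1 * 2 - b1)`, can be passed the same way, as long as every ID it mentions has a value for each date.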