基于R中两种不同评估方法的汇总数据

基于R中两种不同评估方法的汇总数据,r,datetime,aggregate,tidyverse,data-cleaning,R,Datetime,Aggregate,Tidyverse,Data Cleaning,我想收集一些计步器数据,以每分钟步数为单位,这样我就可以得到一个总的步数,直到EMA评估。EMA评估每天进行四次。这两个数据集的示例如下: 计步器数据 ID Steps Time 1 15 2/4/2020 8:32 1 23 2/4/2020 8:33 1 76 2/4/2020 8:34 1 32 2/4/2020 8:35 1 45 2/4/2020 8:36 ... 2 16 2/4/2020 8:32 2 17 2/4/

我想收集一些计步器数据,以每分钟步数为单位,这样我就可以得到一个总的步数,直到EMA评估。EMA评估每天进行四次。这两个数据集的示例如下:

计步器数据

ID Steps      Time
1   15   2/4/2020 8:32
1   23   2/4/2020 8:33
1   76   2/4/2020 8:34
1   32   2/4/2020 8:35
1   45   2/4/2020 8:36
...
2   16   2/4/2020 8:32
2   17   2/4/2020 8:33
2   0    2/4/2020 8:34
2   5    2/4/2020 8:35
2   8    2/4/2020 8:36
ID      Time      X Y
1  2/4/2020 8:36  3 4
1  2/4/2020 12:01 3 5
1  2/4/2020 3:30  4 5
1  2/4/2020 6:45  7 8
...
2  2/4/2020 8:35  4 6
2  2/4/2020 12:05 5 7
2  2/4/2020 3:39  1 3
2  2/4/2020 6:55  8 3
ID      Time      X Y Steps
1  2/4/2020 8:36  3 4 191
1  2/4/2020 12:01 3 5 [Sum of steps taken from 8:37 until 12:01 on 2/4/2020]
1  2/4/2020 3:30  4 5 [Sum of steps taken from 12:02 until 3:30 on 2/4/2020]
1  2/4/2020 6:45  7 8 [Sum of steps taken from 3:31 until 6:45 on 2/4/2020]
...
2  2/4/2020 8:35  4 6 38
2  2/4/2020 12:05 5 7 [Sum of steps taken from 8:36 until 12:05 on 2/4/2020]
2  2/4/2020 3:39  1 3 [Sum of steps taken from 12:06 until 3:39 on 2/4/2020]
2  2/4/2020 6:55  8 3 [Sum of steps taken from 3:40 until 6:55 on 2/4/2020]
均线数据

ID Steps      Time
1   15   2/4/2020 8:32
1   23   2/4/2020 8:33
1   76   2/4/2020 8:34
1   32   2/4/2020 8:35
1   45   2/4/2020 8:36
...
2   16   2/4/2020 8:32
2   17   2/4/2020 8:33
2   0    2/4/2020 8:34
2   5    2/4/2020 8:35
2   8    2/4/2020 8:36
ID      Time      X Y
1  2/4/2020 8:36  3 4
1  2/4/2020 12:01 3 5
1  2/4/2020 3:30  4 5
1  2/4/2020 6:45  7 8
...
2  2/4/2020 8:35  4 6
2  2/4/2020 12:05 5 7
2  2/4/2020 3:39  1 3
2  2/4/2020 6:55  8 3
ID      Time      X Y Steps
1  2/4/2020 8:36  3 4 191
1  2/4/2020 12:01 3 5 [Sum of steps taken from 8:37 until 12:01 on 2/4/2020]
1  2/4/2020 3:30  4 5 [Sum of steps taken from 12:02 until 3:30 on 2/4/2020]
1  2/4/2020 6:45  7 8 [Sum of steps taken from 3:31 until 6:45 on 2/4/2020]
...
2  2/4/2020 8:35  4 6 38
2  2/4/2020 12:05 5 7 [Sum of steps taken from 8:36 until 12:05 on 2/4/2020]
2  2/4/2020 3:39  1 3 [Sum of steps taken from 12:06 until 3:39 on 2/4/2020]
2  2/4/2020 6:55  8 3 [Sum of steps taken from 3:40 until 6:55 on 2/4/2020]
我希望将计步器数据作为一个新变量添加到EMA数据中,在下一次EMA评估之前,将所采取的步骤数相加。理想情况下,它会像这样:

组合数据

ID Steps      Time
1   15   2/4/2020 8:32
1   23   2/4/2020 8:33
1   76   2/4/2020 8:34
1   32   2/4/2020 8:35
1   45   2/4/2020 8:36
...
2   16   2/4/2020 8:32
2   17   2/4/2020 8:33
2   0    2/4/2020 8:34
2   5    2/4/2020 8:35
2   8    2/4/2020 8:36
ID      Time      X Y
1  2/4/2020 8:36  3 4
1  2/4/2020 12:01 3 5
1  2/4/2020 3:30  4 5
1  2/4/2020 6:45  7 8
...
2  2/4/2020 8:35  4 6
2  2/4/2020 12:05 5 7
2  2/4/2020 3:39  1 3
2  2/4/2020 6:55  8 3
ID      Time      X Y Steps
1  2/4/2020 8:36  3 4 191
1  2/4/2020 12:01 3 5 [Sum of steps taken from 8:37 until 12:01 on 2/4/2020]
1  2/4/2020 3:30  4 5 [Sum of steps taken from 12:02 until 3:30 on 2/4/2020]
1  2/4/2020 6:45  7 8 [Sum of steps taken from 3:31 until 6:45 on 2/4/2020]
...
2  2/4/2020 8:35  4 6 38
2  2/4/2020 12:05 5 7 [Sum of steps taken from 8:36 until 12:05 on 2/4/2020]
2  2/4/2020 3:39  1 3 [Sum of steps taken from 12:06 until 3:39 on 2/4/2020]
2  2/4/2020 6:55  8 3 [Sum of steps taken from 3:40 until 6:55 on 2/4/2020]
然后,我需要在整个21天的EMA期间继续该过程,因此,对于2020年2月5日、2020年2月6日等4个EMA评估时间点,也需要同样的过程


这已经把我的R技能推到了极限,所以任何指针都会非常有用!我对tidyverse最为熟悉,但使用base R也很舒服。提前感谢您的建议。

我将通过
ID
Time
计步器上加入
ema\u df
。这样你就可以 当不是EMA评估时间时,
pedometer_df
的所有行都缺少
x
y
(我假设是标识符)

我使用下一个可用值填充值(因此下一个ema评估
x
y
) 最后,
根据
ID
x
y
总结
对U进行分组,以保留评估日期时间(最大)和步骤总和

library(dplyr)
library(tidyr)

pedometer_df %>%
  left_join(ema_df, by = c("ID", "Time")) %>%
  fill(x, y, .direction = "up") %>%
  group_by(ID, x, y) %>%
  summarise(
    Time = max(Time),
    Steps = sum(Steps)
  )

下面是一个使用
data.table
中的滚动联接的解决方案。这里的基本思想是每次从
计步器
数据滚动到
EMA
数据中的下一次(同时仍然匹配ID)。一旦找到下一个EMA时间,剩下的就是分离
X
Y
值并汇总
步骤

数据创建和准备:

库(data.table)
计步器3:1202-02-0417:00:0036740
#>  4:  2 2020-02-04 17:00:00 4 6   747
#>  5:  1 2020-02-04 23:00:00 2 2   679
#>  6:  2 2020-02-04 23:00:00 3 2   732
#>  7:  1 2020-02-05 05:00:00 7 5   720
#>  8:  2 2020-02-05 05:00:00 6 8   692
#>  9:  1 2020-02-05 11:00:00 2 4   731
#> 10:  2 2020-02-05 11:00:00 4 5   773
#> 11:  1 2020-02-05 17:00:00 1 5   757
#> 12:  2 2020-02-05 17:00:00 3 5   743
#> 13:  1 2020-02-05 23:00:00 3 8   693
#> 14:  2 2020-02-05 23:00:00 1 8   740
#> 15:  1 2020-02-06 05:00:00 8 8   710
#> 16:  2 2020-02-06 05:00:00 3 2   760
#> 17:  1 2020-02-06 11:00:00 8 4   716
#> 18:  2 2020-02-06 11:00:00 1 2   688
#> 19:  1 2020-02-06 17:00:00 5 2   738
#> 20:  2 2020-02-06 17:00:00 4 6   724
#> 21:  1 2020-02-06 23:00:00 7 8   737
#> 22:  2 2020-02-06 23:00:00 6 3   672
#> 23:  1 2020-02-07 05:00:00 2 6   726
#> 24:  2 2020-02-07 05:00:00 7 7   759
#> 25:  1 2020-02-07 11:00:00 1 4   737
#> 26:  2 2020-02-07 11:00:00 5 2   737
#> 27:  1 2020-02-07 17:00:00 3 5   766
#> 28:  2 2020-02-07 17:00:00 4 4   745
#> 29:  1 2020-02-07 23:00:00 3 3   714
#> 30:  2 2020-02-07 23:00:00 2 1   741
#> 31:  1 2020-02-08 05:00:00 4 6   751
#> 32:  2 2020-02-08 05:00:00 8 2   723
#> 33:  1 2020-02-08 11:00:00 3 3   716
#> 34:  2 2020-02-08 11:00:00 3 6   735
#> 35:  1 2020-02-08 17:00:00 1 5   696
#> 36:  2 2020-02-08 17:00:00 7 7   741
#>ID下一步\u ema\u时间X Y步骤
由(v0.3.0)于2020年2月4日创建