R: Compare dates within groups using the same reference
I have a data table for different patients ("Spell"), with several temperature ("Temp") measurements for each patient ("Episode"). I also have the date and time of each temperature measurement.
Spell Episode Date Temp
1 3 2-1-17 21:00 40
1 2 2-1-17 20:00 36
1 1 1-1-17 10:00 37
2 3 2-1-17 15:00 36
2 2 2-1-17 10:00 37
2 1 1-1-17 8:00 36
3 1 3-1-17 10:00 40
4 3 4-1-17 15:00 36
4 2 3-1-17 12:00 40
4 1 3-1-17 10:00 39
5 7 3-1-17 17:30 36
5 6 2-1-17 17:00 36
5 5 2-1-17 16:00 37
5 1 1-1-17 9:00 36
5 4 1-1-17 14:00 39
5 3 1-1-17 13:00 40
5 2 1-1-17 11:00 39
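I am interested in all measurements taken within the 24 hours before the last measurement of each Spell. I have already grouped the observations by Spell and sorted them by date in descending order, but I am not sure how to compare every row within a group against the same reference (in this case, the first row of each group). The result should be: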
Spell Episode Date Temp
1 3 2-1-17 21:00 40
1 2 2-1-17 20:00 36
2 3 2-1-17 15:00 36
2 2 2-1-17 10:00 37
3 1 3-1-17 10:00 40
4 3 4-1-17 15:00 36
5 7 3-1-17 17:30 36
I would appreciate being pointed in the right direction.
Edit: the dates are in d-m-yy H:M format. Here is the dput of the data:
structure(list(Spell = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), Episode = c(3L, 2L, 1L, 3L,
2L, 1L, 1L, 3L, 2L, 1L, 7L, 6L, 5L, 1L, 4L, 3L, 2L), Date = c("2-1-17 21:00",
"2-1-17 20:00", "1-1-17 10:00", "2-1-17 15:00", "2-1-17 10:00",
"1-1-17 8:00", "3-1-17 10:00", "4-1-17 15:00", "3-1-17 12:00",
"3-1-17 10:00", "3-1-17 17:30", "2-1-17 17:00", "2-1-17 16:00",
"1-1-17 9:00", "1-1-17 14:00", "1-1-17 13:00", "1-1-17 11:00"
), Temp = c(40L, 36L, 37L, 36L, 37L, 36L, 40L, 36L, 40L, 39L,
36L, 36L, 37L, 36L, 39L, 40L, 39L)), .Names = c("Spell", "Episode",
"Date", "Temp"), class = c("data.table", "data.frame"), row.names = c(NA,
-17L), .internal.selfref = <pointer: 0x00000000001f0788>)
A solution using only data.table:
# convert Date column to POSIXct
DT[,Date:=as.POSIXct(Date,format='%d-%m-%y %H:%M',tz='GMT')]
# filter the data.table
filteredDT <- DT[, .SD[as.numeric(difftime(max(Date),Date,units='hours')) <= 24], by = Spell]
> filteredDT
Spell Episode Date Temp
1: 1 3 2017-01-02 21:00:00 40
2: 1 2 2017-01-02 20:00:00 36
3: 2 3 2017-01-02 15:00:00 36
4: 2 2 2017-01-02 10:00:00 37
5: 3 1 2017-01-03 10:00:00 40
6: 4 3 2017-01-04 15:00:00 36
7: 5 7 2017-01-03 17:30:00 36
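For readers not using data.table, the same idea (compare every row against one per-group reference, here the latest timestamp of the Spell) can be sketched in base R. This is only an illustrative sketch, not part of the answer above: the names `df`, `ref`, `hours_before_last` and `filtered` are made up for the example, and only the first two Spells of the OP's data are rebuilt to keep it short.

```r
# Subset of the OP's data (Spells 1 and 2) as a plain data.frame
df <- data.frame(
  Spell   = c(1L, 1L, 1L, 2L, 2L, 2L),
  Episode = c(3L, 2L, 1L, 3L, 2L, 1L),
  Date    = c("2-1-17 21:00", "2-1-17 20:00", "1-1-17 10:00",
              "2-1-17 15:00", "2-1-17 10:00", "1-1-17 8:00"),
  Temp    = c(40L, 36L, 37L, 36L, 37L, 36L),
  stringsAsFactors = FALSE
)

# Parse the d-m-yy H:M strings into POSIXct
df$Date <- as.POSIXct(df$Date, format = "%d-%m-%y %H:%M", tz = "GMT")

# Per-group reference: the latest timestamp (in seconds) of each Spell,
# repeated for every row of that Spell
ref <- ave(as.numeric(df$Date), df$Spell, FUN = max)

# Hours between each measurement and the last measurement of its Spell
hours_before_last <- (ref - as.numeric(df$Date)) / 3600

# Keep only measurements within 24 hours of the group's last measurement
filtered <- df[hours_before_last <= 24, ]
filtered
```

`ave()` returns a vector as long as its input, so the group maximum lines up row by row with `df$Date`, which is what makes the within-group comparison against a single reference possible without an explicit loop.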
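The solution below uses two functions from Hadley Wickham's lubridate package. This package is very handy for working with dates and times, so I wonder why it has not been used in any of the other answers.
In addition, data.table is used because the OP provided sample data of class data.table.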
library(data.table) # if not already loaded
# coerce Date to POSIXct
DT[, Date := lubridate::dmy_hm(Date)][
# for each, pick measurements within last 24 hours
, .SD[Date > max(Date) - lubridate::dhours(24L)], by = Spell][
# order, just for convenience
order(Spell, -Date)]
Note that the expected result given by the OP shows an additional row outside the 24-hour window (Spell 5, Episode 6).
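The arithmetic behind the Spell 5, Episode 6 case can be checked directly. The snippet below is a small sketch using the two timestamps from the OP's data; the variable names `last_measurement`, `episode6` and `gap` are chosen here for illustration only.

```r
# Last measurement of Spell 5 (Episode 7) and the Episode 6 measurement
last_measurement <- as.POSIXct("3-1-17 17:30", format = "%d-%m-%y %H:%M", tz = "GMT")
episode6         <- as.POSIXct("2-1-17 17:00", format = "%d-%m-%y %H:%M", tz = "GMT")

# Gap between the two measurements, in hours
gap <- as.numeric(difftime(last_measurement, episode6, units = "hours"))
gap        # 24.5
gap <= 24  # FALSE, so Episode 6 falls outside the 24-hour window
```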
Data
As provided by the OP:
DT <- structure(list(Spell = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), Episode = c(3L, 2L, 1L, 3L,
2L, 1L, 1L, 3L, 2L, 1L, 7L, 6L, 5L, 1L, 4L, 3L, 2L), Date = c("2-1-17 21:00",
"2-1-17 20:00", "1-1-17 10:00", "2-1-17 15:00", "2-1-17 10:00",
"1-1-17 8:00", "3-1-17 10:00", "4-1-17 15:00", "3-1-17 12:00",
"3-1-17 10:00", "3-1-17 17:30", "2-1-17 17:00", "2-1-17 16:00",
"1-1-17 9:00", "1-1-17 14:00", "1-1-17 13:00", "1-1-17 11:00"
), Temp = c(40L, 36L, 37L, 36L, 37L, 36L, 40L, 36L, 40L, 39L,
36L, 36L, 37L, 36L, 39L, 40L, 39L)), .Names = c("Spell", "Episode",
"Date", "Temp"), class = c("data.table", "data.frame"), row.names = c(NA, -17L))
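Comments:
A reproducible example would be very useful for this one. What is the format of the dates?
Thanks, the date format is d-m-yy. I have edited the question to add the dput result.
Your expected result shows an additional row outside the 24-hour window (Spell 5, Episode 6). Is that intentional?
@UweBlock, not at all. That was a mistake and I am editing it now. Thanks for pointing it out.
Why don't you use the data provided by the OP? Instead, you are supplying your own data, with the Date column already converted to class POSIXct.
The OP added the data sample in an edit after I had answered. I will edit my answer accordingly.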