R: Split a data frame in two using a condition
I have a data frame:
mydata = data.table(MyTimes = as.POSIXct(c("2015-01-01 00:00:03","2015-01-01 00:00:04","2015-01-01 00:00:18","2015-01-01 00:00:48","2015-01-01 00:00:48","2015-01-01 00:00:54","2015-01-01 00:01:12","2015-01-01 00:01:45"),tz = "GMT"),othercol= c(1,2,3,4,5,6,7))
mydata
MyTimes othercol
1: 2015-01-01 00:00:03 1
2: 2015-01-01 00:00:04 2
3: 2015-01-01 00:00:18 3
4: 2015-01-01 00:00:48 4
5: 2015-01-01 00:00:48 5
6: 2015-01-01 00:00:54 6
7: 2015-01-01 00:01:12 7
8: 2015-01-01 00:01:45 1
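The data is sorted by time, and I want to split this data frame into two data frames, subject to two conditions:
The break should occur in the middle if possible,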
data frame 1:
MyTimes othercol
2015-01-01 00:00:03 1
2015-01-01 00:00:04 2
2015-01-01 00:00:18 3
2015-01-01 00:00:48 4
2015-01-01 00:00:48 5
data frame 2:
2015-01-01 00:00:54 6
2015-01-01 00:01:12 7
2015-01-01 00:01:45 1
It could also look like this:
data frame1:
2015-01-01 00:00:03 1
2015-01-01 00:00:04 2
2015-01-01 00:00:18 3
data frame2:
2015-01-01 00:00:48 4
2015-01-01 00:00:48 5
2015-01-01 00:00:54 6
2015-01-01 00:01:12 7
2015-01-01 00:01:45 1
Either way, the 00:00:48 rows are in the same data frame.
How about this:
library(data.table)
mydata = data.table(MyTimes = as.POSIXct(c("2015-01-01 00:00:03","2015-01-01 00:00:04","2015-01-01 00:00:18","2015-01-01 00:00:48","2015-01-01 00:00:48","2015-01-01 00:00:54","2015-01-01 00:01:12","2015-01-01 00:01:45"),tz = "GMT"),othercol= c(1,2,3,4,5,6,7,8))
mydata
MyTimes othercol
1: 2015-01-01 00:00:03 1
2: 2015-01-01 00:00:04 2
3: 2015-01-01 00:00:18 3
4: 2015-01-01 00:00:48 4
5: 2015-01-01 00:00:48 5
6: 2015-01-01 00:00:54 6
7: 2015-01-01 00:01:12 7
8: 2015-01-01 00:01:45 8
split(mydata, as.numeric(mydata$MyTimes) < median(as.numeric(mydata$MyTimes)))
$`FALSE`
MyTimes othercol
1: 2015-01-01 00:00:48 4
2: 2015-01-01 00:00:48 5
3: 2015-01-01 00:00:54 6
4: 2015-01-01 00:01:12 7
5: 2015-01-01 00:01:45 8
$`TRUE`
MyTimes othercol
1: 2015-01-01 00:00:03 1
2: 2015-01-01 00:00:04 2
3: 2015-01-01 00:00:18 3
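The median split above keeps tied timestamps together because equal times always produce the same TRUE/FALSE value in the comparison. Here is a minimal base-R sketch of that idea, assuming a plain data.frame instead of a data.table; the names `df`, `secs`, `before_median` and `parts` are illustrative and not part of the original answer.

```r
# Sample data rebuilt as a base data.frame (illustrative, eight rows).
times <- as.POSIXct(c("2015-01-01 00:00:03", "2015-01-01 00:00:04",
                      "2015-01-01 00:00:18", "2015-01-01 00:00:48",
                      "2015-01-01 00:00:48", "2015-01-01 00:00:54",
                      "2015-01-01 00:01:12", "2015-01-01 00:01:45"),
                    tz = "GMT")
df <- data.frame(MyTimes = times, othercol = 1:8)

# Compare each time (in seconds since the epoch) with the median time.
secs <- as.numeric(df$MyTimes)
before_median <- secs < median(secs)

# split() groups rows by the logical vector; rows with equal times
# share one value, so they always land in the same part.
parts <- split(df, before_median)

# Check that no timestamp appears in both parts.
stopifnot(!any(as.numeric(parts$`TRUE`$MyTimes) %in%
               as.numeric(parts$`FALSE`$MyTimes)))
```

One caveat with this approach: the parts can be unequal in size when many rows share the median time, since all of those rows go to the FALSE part.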
Not as elegant as @DatamineR's solution, but an alternative using run-length encoding is
library(data.table)
mydata[, grp := rleid(MyTimes)] ## put times into groups
split(mydata, mydata$grp >= ceiling(max(mydata$grp)/2))
$`FALSE`
MyTimes othercol grp
1: 2015-01-01 00:00:03 1 1
2: 2015-01-01 00:00:04 2 2
3: 2015-01-01 00:00:18 3 3
$`TRUE`
MyTimes othercol grp
1: 2015-01-01 00:00:48 4 4
2: 2015-01-01 00:00:48 5 4
3: 2015-01-01 00:00:54 6 5
4: 2015-01-01 00:01:12 7 6
5: 2015-01-01 00:01:45 8 7
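If the goal is a break as close to the middle row as possible, one possible variation on the run-length idea is to cut only at the end of a run and pick the run end nearest to the midpoint. This is a sketch under assumptions, not the answerer's method: it uses base R's rle() rather than rleid(), the names `run_ends`, `candidates` and `cut_row` are illustrative, and it assumes the data has at least two distinct times.

```r
# Sample data rebuilt as a base data.frame (illustrative, eight rows).
times <- as.POSIXct(c("2015-01-01 00:00:03", "2015-01-01 00:00:04",
                      "2015-01-01 00:00:18", "2015-01-01 00:00:48",
                      "2015-01-01 00:00:48", "2015-01-01 00:00:54",
                      "2015-01-01 00:01:12", "2015-01-01 00:01:45"),
                    tz = "GMT")
df <- data.frame(MyTimes = times, othercol = 1:8)

# Lengths of consecutive runs of equal times.
runs <- rle(as.numeric(df$MyTimes))

# Row index at which each run ends; cutting right after a run end
# never separates rows that share a timestamp.
run_ends <- cumsum(runs$lengths)

# Cutting after the final row would leave the second part empty, so drop it.
candidates <- run_ends[-length(run_ends)]

# Choose the candidate closest to the middle of the table.
# On a tie, which.min() keeps the earlier candidate.
cut_row <- candidates[which.min(abs(candidates - nrow(df) / 2))]

# Rows up to cut_row go to FALSE, the rest to TRUE.
parts <- split(df, seq_len(nrow(df)) > cut_row)
```

For the sample data the candidate cuts after rows 3 and 5 are equally close to the middle, which matches the two acceptable outcomes described in the question.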
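Your example data generates a warning. I was working on a data.table solution (library(data.table); setDT(mydata)[, grp := rleid(MyTimes)]; split(mydata, mydata$grp == ceiling(max(mydata$grp)/2))), which is logically similar to yours (but yours is simpler and more elegant :)
@DatamineR I get an error in .Call("Crbindlist", l, use.names, fill): "Crbindlist" not resolved from current namespace (data.table)
@user3022875 I'll take a look tomorrow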