R: Split a data frame in two using a condition


I have a data frame:

mydata = data.table(MyTimes = as.POSIXct(c("2015-01-01 00:00:03","2015-01-01 00:00:04","2015-01-01 00:00:18","2015-01-01 00:00:48","2015-01-01 00:00:48","2015-01-01 00:00:54","2015-01-01 00:01:12","2015-01-01 00:01:45"),tz = "GMT"),othercol= c(1,2,3,4,5,6,7))


 mydata
               MyTimes othercol
1: 2015-01-01 00:00:03        1
2: 2015-01-01 00:00:04        2
3: 2015-01-01 00:00:18        3
4: 2015-01-01 00:00:48        4
5: 2015-01-01 00:00:48        5
6: 2015-01-01 00:00:54        6
7: 2015-01-01 00:01:12        7
8: 2015-01-01 00:01:45        1
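The data is sorted by time, and I want to split this data frame into two data frames, subject to two conditions:

  • If possible, the break should occur in the middle.
  • However, times near the break that fall in the same second must stay in the same data frame.

This example has 8 rows, and I want to break it in the middle, 4 rows each. But note that 00:00:48 would then appear in both data frames, which the second condition above does not allow. In other words, the break must never fall within a single second.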

    So here the data frames could be:

    data frame 1:
                       MyTimes othercol
         2015-01-01 00:00:03        1
         2015-01-01 00:00:04        2
         2015-01-01 00:00:18        3
         2015-01-01 00:00:48        4
         2015-01-01 00:00:48        5
    
    data frame 2:
         2015-01-01 00:00:54        6
         2015-01-01 00:01:12        7
         2015-01-01 00:01:45        1
    
    Or they could be like this:

    data frame1:
       2015-01-01 00:00:03        1
       2015-01-01 00:00:04        2
       2015-01-01 00:00:18        3
    
    data frame2:
        2015-01-01 00:00:48        4
        2015-01-01 00:00:48        5
        2015-01-01 00:00:54        6
        2015-01-01 00:01:12        7
        2015-01-01 00:01:45        1
    
    Either way, both 00:00:48 rows are in the same data frame.

    How about this:

    mydata = data.table(MyTimes = as.POSIXct(c("2015-01-01 00:00:03","2015-01-01 00:00:04","2015-01-01 00:00:18","2015-01-01 00:00:48","2015-01-01 00:00:48","2015-01-01 00:00:54","2015-01-01 00:01:12","2015-01-01 00:01:45"),tz = "GMT"),othercol= c(1,2,3,4,5,6,7))
    
    
     mydata
                   MyTimes othercol
    1: 2015-01-01 00:00:03        1
    2: 2015-01-01 00:00:04        2
    3: 2015-01-01 00:00:18        3
    4: 2015-01-01 00:00:48        4
    5: 2015-01-01 00:00:48        5
    6: 2015-01-01 00:00:54        6
    7: 2015-01-01 00:01:12        7
    8: 2015-01-01 00:01:45        1
    
    split(mydata, as.numeric(mydata$MyTimes) < median(as.numeric(mydata$MyTimes)))
    $`FALSE`
                   MyTimes secondcol
    1: 2015-01-01 00:00:48         4
    2: 2015-01-01 00:00:48         5
    3: 2015-01-01 00:00:54         6
    4: 2015-01-01 00:01:12         7
    5: 2015-01-01 00:01:45         8
    
    $`TRUE`
                   MyTimes secondcol
    1: 2015-01-01 00:00:03         1
    2: 2015-01-01 00:00:04         2
    3: 2015-01-01 00:00:18         3
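
The median-based split above keeps equal timestamps together because the grouping label is computed from the timestamp value itself, so two rows with the same time can never receive different labels. The following is a minimal sketch of that check in base R; it rebuilds only the time column from the question. Note that this approach does not guarantee balanced halves: every row whose time is at or above the median lands in the `FALSE` group.

```r
# Timestamps copied from the question, converted to seconds since the epoch.
times <- as.numeric(as.POSIXct(c(
  "2015-01-01 00:00:03", "2015-01-01 00:00:04", "2015-01-01 00:00:18",
  "2015-01-01 00:00:48", "2015-01-01 00:00:48", "2015-01-01 00:00:54",
  "2015-01-01 00:01:12", "2015-01-01 00:01:45"
), tz = "GMT"))

# TRUE for rows strictly before the median time, FALSE otherwise.
below <- times < median(times)

# For each distinct timestamp, count how many different labels it received.
# A count of 1 everywhere means no timestamp is split across the two groups.
labels_per_time <- tapply(below, times, function(x) length(unique(x)))

table(below)
all(labels_per_time == 1)
```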
    
    Not as elegant as @DatamineR's solution, but an alternative using run-length encoding is:

    library(data.table)
    
    mydata[, grp := rleid(MyTimes)]  ## put times into groups
    split(mydata, mydata$grp >= ceiling(max(mydata$grp)/2))
    
    $`FALSE`
                   MyTimes othercol grp
    1: 2015-01-01 00:00:03        1   1
    2: 2015-01-01 00:00:04        2   2
    3: 2015-01-01 00:00:18        3   3
    
    $`TRUE`
                   MyTimes othercol grp
    1: 2015-01-01 00:00:48        4   4
    2: 2015-01-01 00:00:48        5   4
    3: 2015-01-01 00:00:54        6   5
    4: 2015-01-01 00:01:12        7   6
    5: 2015-01-01 00:01:45        8   7
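
Both answers above split by time value or by group id, which need not land near the middle row. As a further sketch (not taken from the thread), the break can be chosen directly as the row boundary closest to the middle at which the timestamp changes. The helper name `split_near_middle` and its `time_col` argument are hypothetical, and the example assumes `othercol` holds the eight values 1 to 8, since the original call supplies only seven.

```r
# Hypothetical helper: split a time-sorted table into two parts at the
# boundary closest to the middle, never separating rows with equal times.
split_near_middle <- function(df, time_col = "MyTimes") {
  t <- as.numeric(df[[time_col]])
  n <- length(t)
  # Valid cut points: positions i where row i and row i + 1 differ in time.
  cuts <- which(diff(t) != 0)
  # No valid cut: everything stays in the first part.
  if (length(cuts) == 0L) return(list(first = df, second = df[0, ]))
  # Pick the cut whose first part is closest to n / 2 rows
  # (which.min returns the earliest cut when two are equally close).
  best <- cuts[which.min(abs(cuts - n / 2))]
  list(first = df[seq_len(best), ], second = df[(best + 1):n, ])
}

# Data from the question; othercol = 1:8 is an assumption.
mydata <- data.frame(
  MyTimes = as.POSIXct(c(
    "2015-01-01 00:00:03", "2015-01-01 00:00:04", "2015-01-01 00:00:18",
    "2015-01-01 00:00:48", "2015-01-01 00:00:48", "2015-01-01 00:00:54",
    "2015-01-01 00:01:12", "2015-01-01 00:01:45"
  ), tz = "GMT"),
  othercol = 1:8
)

parts <- split_near_middle(mydata)
nrow(parts$first)
nrow(parts$second)
```

For this data the cuts after row 3 and after row 5 are equally close to the middle, and the sketch takes the earlier one, so both 00:00:48 rows go to the second part. Either result is acceptable under the conditions stated in the question.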
    

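    Your example data generates a warning. I was working on a data.table solution (library(data.table); setDT(mydata)[, grp := rleid(MyTimes)]; split(mydata, mydata$grp == ceiling(max(mydata$grp)/2))), which is logically similar to yours (but yours is simpler and more elegant :)

    @DatamineR I get an error in .Call("Crbindlist", l, use.names, fill): "Crbindlist" not resolved from current namespace (data.table)

    @user3022875 I'll take a look tomorrow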