Complex grouping in an R data.table

Sorry for the unclear title, but I don't think my requirement can be explained in one line.

I have a data.table dt1 that looks like this:

      id  pg   pd                  dt capp vt
 1: 1111  hm <NA> 20-10-2020 21:07:54   NA  5
 2: 1111 abc  abc 20-10-2020 21:07:53 1234  5
 3: 1111  hm <NA> 20-10-2020 16:07:56   NA  4
 4: 1111 cde <NA> 20-10-2020 16:06:57   NA  4
 5: 1111 cde <NA> 20-10-2020 16:05:58   NA  4
 6: 1111 def  def 20-07-2020 12:07:59  345  3
 7: 1111 abc <NA> 20-06-2020 22:07:59   NA  2
 8: 1111 def <NA> 20-06-2020 22:07:58   NA  2
 9: 1111 abc <NA> 20-05-2020 21:07:59   NA  1
10: 1112  hm <NA> 20-10-2020 21:07:52   NA  4
11: 1112 cde  cde 20-10-2020 21:07:39  456  4
12: 1112  hm <NA> 20-10-2020 16:07:56   NA  3
13: 1112 abc <NA> 20-10-2020 16:06:57   NA  3
14: 1112 abc <NA> 20-07-2020 16:05:58   NA  2
15: 1112 def  abc 20-07-2020 16:04:59  234  2
16: 1112 cde <NA> 20-06-2020 22:07:59   NA  1
17: 1112 def <NA> 20-06-2020 21:07:59   NA  1
18: 1112 cde <NA> 20-05-2020 21:07:59   NA  0
dt1 is defined as follows:

structure(list(id = c(1111L, 1111L, 1111L, 1111L, 1111L, 1111L, 
1111L, 1111L, 1111L, 1112L, 1112L, 1112L, 1112L, 1112L, 1112L, 
1112L, 1112L, 1112L), pg = c("hm", "abc", "hm", "cde", "cde", 
"def", "abc", "def", "abc", "hm", "cde", "hm", "abc", "abc", 
"def", "cde", "def", "cde"), pd = c(NA, "abc", NA, NA, NA, "def", 
NA, NA, NA, NA, "cde", NA, NA, NA, "abc", NA, NA, NA), dt = c("20-10-2020 21:07:54", 
"20-10-2020 21:07:53", "20-10-2020 16:07:56", "20-10-2020 16:06:57", 
"20-10-2020 16:05:58", "20-07-2020 12:07:59", "20-06-2020 22:07:59", 
"20-06-2020 22:07:58", "20-05-2020 21:07:59", "20-10-2020 21:07:52", 
"20-10-2020 21:07:39", "20-10-2020 16:07:56", "20-10-2020 16:06:57", 
"20-07-2020 16:05:58", "20-07-2020 16:04:59", "20-06-2020 22:07:59", 
"20-06-2020 21:07:59", "20-05-2020 21:07:59"), capp = c(NA, 1234L, 
NA, NA, NA, 345L, NA, NA, NA, NA, 456L, NA, NA, NA, 234L, NA, 
NA, NA), vt = c(5L, 5L, 4L, 4L, 4L, 3L, 2L, 2L, 1L, 4L, 4L, 3L, 
3L, 2L, 2L, 1L, 1L, 0L)), .Names = c("id", "pg", "pd", "dt", 
"capp", "vt"), row.names = c(NA, -18L), class = c("data.table", 
"data.frame"))
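A data.table rebuilt from pasted dput() output has no valid internal self-reference, because the .internal.selfref pointer is session-specific and has to be dropped before the text will parse. A minimal sketch of restoring it with setDT() follows; dt_small is a hypothetical two-row stand-in for the full table, used only to keep the example short.

```r
library(data.table)

# Hypothetical two-row stand-in for the dput() output,
# written without the .internal.selfref entry so that it parses.
dt_small <- structure(
  list(id = c(1111L, 1112L), vt = c(5L, 4L)),
  row.names = c(NA, -2L),
  class = c("data.table", "data.frame")
)

# setDT() re-establishes the internal self-reference in place,
# so later := updates by reference run without a warning.
setDT(dt_small)
dt_small[, vt2 := vt * 2L]
```

The same call should work on the full dt1 once it has been created from the structure() expression.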
Is this what you need?

dt1[, 
  prev := with(.SD, vapply(
    seq_along(vt), 
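    function(i) {
      # vt values below this row's vt, on rows whose pg equals this row's pd;
      # only rows with a non-null capp can produce a result
      tmp <- vt[vt < vt[[i]] & pg == pd[[i]] & !is.na(capp[[i]])]
      if (length(tmp) < 1L) NA_real_ else max(tmp)
    },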
    numeric(1L)
  )), 
  by = id
]
Here is another option: use a non-equi join for each row with a non-null capp, then update by reference:

dt1[!is.na(capp), prev := 
    dt1[.SD, on=.(id, pg=pd, vt<vt), max(x.vt), by=.EACHI]$V1
]
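To make the join step easier to follow, here is a minimal sketch that runs the inner join on its own, on a made-up four-row table. The names toy, probe, and res are illustrative and do not appear in the question. In a non-equi join, x.vt refers to the vt column of the table being searched, and by = .EACHI evaluates max(x.vt) once per row of probe; a probe row with no match yields NA.

```r
library(data.table)

# Made-up data: one id, two rows with a non-null capp
toy <- data.table(
  id   = c(1L, 1L, 1L, 1L),
  pg   = c("abc", "abc", "abc", "def"),
  pd   = c("abc", NA, NA, "def"),
  capp = c(10L, NA, NA, 20L),
  vt   = c(5L, 2L, 1L, 3L)
)

# Rows to look up: those with a non-null capp
probe <- toy[!is.na(capp)]

# For each probe row, find rows of toy with the same id, pg equal to the
# probe's pd, and a strictly smaller vt, then take the largest such vt.
res <- toy[probe, on = .(id, pg = pd, vt < vt),
           .(prev = max(x.vt)), by = .EACHI]

res$prev
```

The first probe row (pd "abc", vt 5) matches the "abc" rows with vt 2 and 1, so it gets 2. The second (pd "def", vt 3) has no "def" row with a smaller vt, so it gets NA.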
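As far as I can tell, you never take capp into account; all of the values come from the vt column. How does the second row for id=1111 end up with 2?

@ekoam: I see where you're coming from. The vt has to be less than the vt value that corresponds to a non-null capp. In effect, I am looking for the values of vt (where capp may be null) that are less than the value of vt in the row where capp is non-null (ergo, corresponding to a non-null capp).

This works, but it is very slow. Thanks for your help; you've given me some food for thought.

Thanks for your help. I wonder: if I need to add another condition, such as pg != "cc", would it go into on()?

dt1[!is.na(capp), prev := dt1[.SD, on=.(id, pg=pd, vt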
Result, with the new prev column:

      id  pg   pd                  dt capp vt prev
 1: 1111  hm <NA> 20-10-2020 21:07:54   NA  5   NA
 2: 1111 abc  abc 20-10-2020 21:07:53 1234  5    2
 3: 1111  hm <NA> 20-10-2020 16:07:56   NA  4   NA
 4: 1111 cde <NA> 20-10-2020 16:06:57   NA  4   NA
 5: 1111 cde <NA> 20-10-2020 16:05:58   NA  4   NA
 6: 1111 def  def 20-07-2020 12:07:59  345  3    2
 7: 1111 abc <NA> 20-06-2020 22:07:59   NA  2   NA
 8: 1111 def <NA> 20-06-2020 22:07:58   NA  2   NA
 9: 1111 abc <NA> 20-05-2020 21:07:59   NA  1   NA
10: 1112  hm <NA> 20-10-2020 21:07:52   NA  4   NA
11: 1112 cde  cde 20-10-2020 21:07:39  456  4    1
12: 1112  hm <NA> 20-10-2020 16:07:56   NA  3   NA
13: 1112 abc <NA> 20-10-2020 16:06:57   NA  3   NA
14: 1112 abc <NA> 20-07-2020 16:05:58   NA  2   NA
15: 1112 def  abc 20-07-2020 16:04:59  234  2   NA
16: 1112 cde <NA> 20-06-2020 22:07:59   NA  1   NA
17: 1112 def <NA> 20-06-2020 21:07:59   NA  1   NA
18: 1112 cde <NA> 20-05-2020 21:07:59   NA  0   NA
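On the follow-up about an extra condition such as pg != "cc": the reply in the thread is cut off, so what follows is only one possible reading, not the answerer's confirmed solution. The on= argument of data.table accepts ==, <, <=, > and >=, but not !=, so an inequality of that kind cannot go into on(). One option is to filter the table being searched before the join. The table toy2 below is made up for illustration, and the sketch assumes the condition is meant to exclude candidate rows rather than rows being updated.

```r
library(data.table)

# Made-up data: the "cc" rows would match each other without the filter
toy2 <- data.table(
  id   = c(1L, 1L, 1L, 1L, 1L),
  pg   = c("abc", "abc", "abc", "cc", "cc"),
  pd   = c("abc", NA, NA, "cc", NA),
  capp = c(10L, NA, NA, 20L, NA),
  vt   = c(5L, 2L, 1L, 4L, 3L)
)

# Drop the pg == "cc" rows from the searched table, then join as before
toy2[!is.na(capp), prev :=
    toy2[pg != "cc"][.SD, on = .(id, pg = pd, vt < vt),
                     max(x.vt), by = .EACHI]$V1
]

toy2$prev
```

If the condition should instead restrict which rows get updated, it could be added to the outer filter, for example !is.na(capp) & pg != "cc".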