R: How do I update specific rows with a constant value, by group?


Say we have the following:

library(data.table); library(zoo)
dt <- data.table(grp = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3), period = seq.Date(from = as.Date('2014-01-01'), to = as.Date('2014-05-01'), by = 'month'), x=c(1:15), y=c(11:25))
dt[, period:=as.yearmon(period, '%Y-%m-%d')]
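I want to update columns x and y, within each grp, using the values from March 2014, so that every row from March 2014 onward carries the March values. My expected output is: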

    grp   period  x  y
 1:   1 Jan 2014  1 11
 2:   1 Feb 2014  2 12
 3:   1 Mar 2014  3 13
 4:   1 Apr 2014  3 13
 5:   1 May 2014  3 13
 6:   2 Jan 2014  6 16
 7:   2 Feb 2014  7 17
 8:   2 Mar 2014  8 18
 9:   2 Apr 2014  8 18
10:   2 May 2014  8 18
11:   3 Jan 2014 11 21
12:   3 Feb 2014 12 22
13:   3 Mar 2014 13 23
14:   3 Apr 2014 13 23
15:   3 May 2014 13 23
I tried the following code, but it only uses the values from row 3:

Could you please advise?

You can replace all x and y values after March 2014 with NA, and then use na.locf (from zoo) to carry the last non-missing value forward:
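A minimal sketch of that idea, assuming the same data as in the question. The name cutoff is introduced here for illustration, and the data is rebuilt with explicit rep() calls so the block runs on its own. Writing the cutoff as as.yearmon("2014-03") avoids depending on the locale's month names.

```r
library(data.table)
library(zoo)

# Rebuild the example data (same values as in the question)
month_starts <- seq(as.Date("2014-01-01"), as.Date("2014-05-01"), by = "month")
dt <- data.table(
  grp    = rep(1:3, each = 5),
  period = as.yearmon(rep(month_starts, 3)),
  x      = 1:15,
  y      = 11:25
)

cols   <- c("x", "y")
cutoff <- as.yearmon("2014-03")  # hypothetical helper name for "Mar 2014"

# Step 1: blank out everything strictly after the cutoff month
dt[period > cutoff, (cols) := NA_integer_]

# Step 2: within each group, carry the last non-NA value (March) forward
dt[, (cols) := lapply(.SD, na.locf, na.rm = FALSE), by = grp, .SDcols = cols]

print(dt)
```

This relies on the rows being ordered by period within each grp, since na.locf fills in row order.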

An option with dplyr: for the rows whose period is on or after March 2014, assign the March 2014 values of x and y to all of those rows, grouped by grp.
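One way this could look, as a sketch rather than the answerer's exact code. It uses ifelse() inside mutate(across(...)) instead of a separate filter step, and the names df, cutoff and res are assumptions made for the example.

```r
library(dplyr)
library(zoo)

# Same values as in the question, as a plain data.frame
month_starts <- seq(as.Date("2014-01-01"), as.Date("2014-05-01"), by = "month")
df <- data.frame(
  grp  = rep(1:3, each = 5),
  date = rep(month_starts, 3),
  x    = 1:15,
  y    = 11:25
)
df$period <- as.yearmon(df$date)

cutoff <- as.yearmon("2014-03")  # hypothetical helper name for "Mar 2014"

res <- df %>%
  group_by(grp) %>%
  # From the cutoff onward, use the group's March value; otherwise keep the value
  mutate(across(c(x, y), ~ ifelse(period >= cutoff, .x[period == cutoff], .x))) %>%
  ungroup()

print(res)
```

This assumes each grp has exactly one March 2014 row; with none or several, .x[period == cutoff] would not have length one and the result would need checking.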


On a second look, I think this is a fairly clean way to do it:

cols = c("x", "y")
dt[period >= "Mar 2014", (cols) := .SD[1L], by=grp, .SDcols = cols]
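Note that .SD[1L] takes the first row of each group's subset, so it returns the March row only when the rows are sorted by period within grp. A hedged variant that selects the March row explicitly, and so should not depend on row order, might look like this (cutoff is a name introduced here, and the data is shuffled on purpose to exercise that case):

```r
library(data.table)
library(zoo)

month_starts <- seq(as.Date("2014-01-01"), as.Date("2014-05-01"), by = "month")
dt <- data.table(
  grp    = rep(1:3, each = 5),
  period = as.yearmon(rep(month_starts, 3)),
  x      = 1:15,
  y      = 11:25
)

# Shuffle the rows so that March is no longer guaranteed to come first
set.seed(1)
dt <- dt[sample(.N)]

cols   <- c("x", "y")
cutoff <- as.yearmon("2014-03")  # hypothetical helper name for "Mar 2014"

# For each group, pick the value on the cutoff row and recycle it
# over every row of that group that is on or after the cutoff
dt[period >= cutoff,
   (cols) := lapply(.SD, function(v) v[period == cutoff]),
   by = grp, .SDcols = cols]

setorder(dt, grp, period)
print(dt)
```

As with the original, this assumes exactly one March 2014 row per grp.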
Another way is to use a rolling join:

dt[period >= "Mar 2014", c("x", "y") := 
  .SD[period == "Mar 2014"][.SD, on=.(grp, period), roll=TRUE, .(x.x, x.y)]
]
How the second option works

Everything below is covered in the main documentation, which you can open by typing ?data.table.

DT[i, cols := e] overwrites cols with e, but only in the rows selected by i.
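A tiny illustration of that form, on a made-up table (the names toy, id and v are invented for this example):

```r
library(data.table)

toy <- data.table(id = 1:4, v = c(10, 20, 30, 40))

# i selects rows 3 and 4; only those rows of v are overwritten.
# The update happens by reference, so no reassignment is needed.
toy[id >= 3, v := 0]

print(toy$v)
```

Rows that i does not select keep their previous values.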

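Looking more closely at e, we see .SD, which only exists inside DT[i, ...]. To inspect it outside of that call, we can use DT[i] in place of .SD. From there, we can build e up one step at a time to see how it works: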

> mySD = dt[period >= "Mar 2014"]
> mySD
   grp   period  x  y
1:   1 Mar 2014  3 13
2:   1 Apr 2014  4 14
3:   1 May 2014  5 15
4:   2 Mar 2014  8 18
5:   2 Apr 2014  9 19
6:   2 May 2014 10 20
7:   3 Mar 2014 13 23
8:   3 Apr 2014 14 24
9:   3 May 2014 15 25
> mySD[period == "Mar 2014"]
   grp   period  x  y
1:   1 Mar 2014  3 13
2:   2 Mar 2014  8 18
3:   3 Mar 2014 13 23
> mySD[period == "Mar 2014"][mySD, on=.(grp, period)]
   grp   period  x  y i.x i.y
1:   1 Mar 2014  3 13   3  13
2:   1 Apr 2014 NA NA   4  14
3:   1 May 2014 NA NA   5  15
4:   2 Mar 2014  8 18   8  18
5:   2 Apr 2014 NA NA   9  19
6:   2 May 2014 NA NA  10  20
7:   3 Mar 2014 13 23  13  23
8:   3 Apr 2014 NA NA  14  24
9:   3 May 2014 NA NA  15  25
> mySD[period == "Mar 2014"][mySD, on=.(grp, period), roll=TRUE]
   grp   period  x  y i.x i.y
1:   1 Mar 2014  3 13   3  13
2:   1 Apr 2014  3 13   4  14
3:   1 May 2014  3 13   5  15
4:   2 Mar 2014  8 18   8  18
5:   2 Apr 2014  8 18   9  19
6:   2 May 2014  8 18  10  20
7:   3 Mar 2014 13 23  13  23
8:   3 Apr 2014 13 23  14  24
9:   3 May 2014 13 23  15  25
> mySD[period == "Mar 2014"][mySD, on=.(grp, period), roll=TRUE, .(x.x, x.y)]
   x.x x.y
1:   3  13
2:   3  13
3:   3  13
4:   8  18
5:   8  18
6:   8  18
7:  13  23
8:  13  23
9:  13  23
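To isolate what roll=TRUE does in the steps above, here is a stripped-down sketch on invented tables (ref, qry, exact and rolled are hypothetical names). In x[i, on=..., roll=TRUE], the last column in on is the rolling column: when i has no exact match there, the row of x with the nearest earlier value in the same group is used.

```r
library(data.table)

# One reference row per id, observed at t = 3
ref <- data.table(id = c("a", "b"), t = c(3, 3), val = c(100, 200))

# Query rows, some at later times that have no exact match in ref
qry <- data.table(id = c("a", "a", "b"), t = c(3, 5, 4))

# Exact join: unmatched (id, t) pairs get NA
exact <- ref[qry, on = .(id, t)]

# Rolling join: the last observation at or before t is carried forward
rolled <- ref[qry, on = .(id, t), roll = TRUE]

print(exact)
print(rolled)
```

In the answer, the March rows play the role of ref and the rows from March onward play the role of qry, which is why April and May pick up the March values.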

Possibly also dt[dt[, tail(.I, -2), by = grp]$V1, `:=`(x = x[1], y = y[1]), by = grp] if dt is sorted, or dt[period >= "Mar 2014", `:=`(x = x[1], y = y[1]), by = grp]?