R: How to update specific rows with a constant value by group?
Suppose we have the following data:
library(data.table); library(zoo)
dt <- data.table(grp = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3), period = seq.Date(from = as.Date('2014-01-01'), to = as.Date('2014-05-01'), by = 'month'), x=c(1:15), y=c(11:25))
dt[, period:=as.yearmon(period, '%Y-%m-%d')]
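I want to update columns x and y so that, within each group, every row from March 2014 onward takes the March 2014 values. My expected output is as follows: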
grp period x y
1: 1 Jan 2014 1 11
2: 1 Feb 2014 2 12
3: 1 Mar 2014 3 13
4: 1 Apr 2014 3 13
5: 1 May 2014 3 13
6: 2 Jan 2014 6 16
7: 2 Feb 2014 7 17
8: 2 Mar 2014 8 18
9: 2 Apr 2014 8 18
10: 2 May 2014 8 18
11: 3 Jan 2014 11 21
12: 3 Feb 2014 12 22
13: 3 Mar 2014 13 23
14: 3 Apr 2014 13 23
15: 3 May 2014 13 23
I tried the following code, but it only uses the values from row 3.
Could you please advise?
You can replace all x and y values after March 2014 with NA, and then use na.locf:
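The code for this answer did not survive on the page, so the following is only a minimal sketch of the NA-then-na.locf idea described above, not the original author's code. It assumes the question's data, rebuilt here with an explicit `rep()`, and introduces a hypothetical `cutoff` variable; `as.yearmon("2014-03")` is used instead of a month-name string to avoid depending on the locale.

```r
library(data.table)
library(zoo)

# Same data as in the question, rebuilt so the block is self-contained
months <- seq.Date(as.Date("2014-01-01"), as.Date("2014-05-01"), by = "month")
dt <- data.table(
  grp    = rep(1:3, each = 5),
  period = as.yearmon(rep(months, times = 3)),
  x      = 1:15,
  y      = 11:25
)

cols   <- c("x", "y")
cutoff <- as.yearmon("2014-03")  # hypothetical name, not from the original post

# Step 1: blank out every value that comes after the cutoff month
dt[period > cutoff, (cols) := NA_integer_]

# Step 2: carry the last non-missing value forward within each group
dt[, (cols) := lapply(.SD, na.locf, na.rm = FALSE), by = grp, .SDcols = cols]

print(dt)
```

Like the other answers, this relies on the rows being ordered by period within each group, since na.locf fills from the nearest preceding row.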
A dplyr option: filter the rows where period is on or after March 2014, and assign the March 2014 values of x and y to all of those rows, grouped by grp.
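The dplyr code itself is missing from the page. One way this could look is sketched below; it is an assumption about the approach, not the original answer. Instead of filtering and re-binding, it uses a conditional `mutate()` per group, and the names `df`, `cutoff`, and `result` are hypothetical.

```r
library(dplyr)
library(zoo)

# Same data as in the question, as a plain data frame
months <- seq.Date(as.Date("2014-01-01"), as.Date("2014-05-01"), by = "month")
df <- data.frame(
  grp    = rep(1:3, each = 5),
  period = as.yearmon(rep(months, times = 3)),
  x      = 1:15,
  y      = 11:25
)

cutoff <- as.yearmon("2014-03")  # hypothetical name

result <- df %>%
  group_by(grp) %>%
  mutate(
    # From the cutoff month onward, use the group's cutoff-month value;
    # earlier rows keep their own value
    x = ifelse(period >= cutoff, x[period == cutoff], x),
    y = ifelse(period >= cutoff, y[period == cutoff], y)
  ) %>%
  ungroup()

print(result)
```

This variant does not depend on row order, but it does assume each group has exactly one March 2014 row.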
On second look, I think this is a very clean way to do it:
cols = c("x", "y")
dt[period >= "Mar 2014", (cols) := .SD[1L], by=grp, .SDcols = cols]
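Note that `.SD[1L]` picks the first row of each group's filtered subset, so this only yields the March 2014 values when the rows are ordered by period within each group. A minimal sketch of guarding against unsorted input follows; the shuffling step and the `cutoff` variable are illustrative assumptions, not part of the original answer.

```r
library(data.table)
library(zoo)

months <- seq.Date(as.Date("2014-01-01"), as.Date("2014-05-01"), by = "month")
dt <- data.table(
  grp    = rep(1:3, each = 5),
  period = as.yearmon(rep(months, times = 3)),
  x      = 1:15,
  y      = 11:25
)

# Shuffle the rows to mimic input that does not arrive sorted (illustration only)
set.seed(1)
dt <- dt[sample(.N)]

# Sort by group and period so that March is the first row of each filtered group
setorder(dt, grp, period)

cols   <- c("x", "y")
cutoff <- as.yearmon("2014-03")  # hypothetical name

# Within each group, overwrite x and y with the first (March) row's values
dt[period >= cutoff, (cols) := .SD[1L], by = grp, .SDcols = cols]

print(dt)
```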
Another way is to use a rolling join:
dt[period >= "Mar 2014", c("x", "y") :=
.SD[period == "Mar 2014"][.SD, on=.(grp, period), roll=TRUE, .(x.x, x.y)]
]
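How the second option works
Everything below is covered in the main documentation, which you can open by typing ?data.table
DT[i, cols := e] overwrites cols in the rows selected by i
Looking more closely at e, we see .SD, which only works inside DT[i, ...]. We can take it out of DT[i, ...] by using DT[i] in place of .SD. With that substitution, we can build up e step by step to see how it works: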
> mySD = dt[period >= "Mar 2014"]
> mySD
grp period x y
1: 1 Mar 2014 3 13
2: 1 Apr 2014 4 14
3: 1 May 2014 5 15
4: 2 Mar 2014 8 18
5: 2 Apr 2014 9 19
6: 2 May 2014 10 20
7: 3 Mar 2014 13 23
8: 3 Apr 2014 14 24
9: 3 May 2014 15 25
> mySD[period == "Mar 2014"]
grp period x y
1: 1 Mar 2014 3 13
2: 2 Mar 2014 8 18
3: 3 Mar 2014 13 23
> mySD[period == "Mar 2014"][mySD, on=.(grp, period)]
grp period x y i.x i.y
1: 1 Mar 2014 3 13 3 13
2: 1 Apr 2014 NA NA 4 14
3: 1 May 2014 NA NA 5 15
4: 2 Mar 2014 8 18 8 18
5: 2 Apr 2014 NA NA 9 19
6: 2 May 2014 NA NA 10 20
7: 3 Mar 2014 13 23 13 23
8: 3 Apr 2014 NA NA 14 24
9: 3 May 2014 NA NA 15 25
> mySD[period == "Mar 2014"][mySD, on=.(grp, period), roll=TRUE]
grp period x y i.x i.y
1: 1 Mar 2014 3 13 3 13
2: 1 Apr 2014 3 13 4 14
3: 1 May 2014 3 13 5 15
4: 2 Mar 2014 8 18 8 18
5: 2 Apr 2014 8 18 9 19
6: 2 May 2014 8 18 10 20
7: 3 Mar 2014 13 23 13 23
8: 3 Apr 2014 13 23 14 24
9: 3 May 2014 13 23 15 25
> mySD[period == "Mar 2014"][mySD, on=.(grp, period), roll=TRUE, .(x.x, x.y)]
x.x x.y
1: 3 13
2: 3 13
3: 3 13
4: 8 18
5: 8 18
6: 8 18
7: 13 23
8: 13 23
9: 13 23
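The effect of roll=TRUE can also be seen in isolation. The sketch below uses two small hypothetical tables (`ref` and `qry`, with an integer `month` column instead of yearmon) purely for illustration: where a query row has no exact match, the most recent earlier row in the same group is carried forward.

```r
library(data.table)

# Reference table: one known value per group, at month 3 (hypothetical data)
ref <- data.table(grp = c(1L, 2L), month = c(3L, 3L), x = c(3L, 8L))

# Query table: months 3 to 5 for each group
qry <- data.table(grp = rep(1:2, each = 3), month = rep(3:5, times = 2))

# Exact join: months 4 and 5 have no match in ref, so x is NA there
exact <- ref[qry, on = .(grp, month)]

# Rolling join: the month-3 value rolls forward to months 4 and 5
rolled <- ref[qry, on = .(grp, month), roll = TRUE]

print(exact)
print(rolled)
```

The last column in `on` (here `month`) is the one that rolls; the columns before it must match exactly.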
If dt is sorted, then maybe also dt[dt[, tail(.I, -2), by=grp]$V1, `:=`(x = x[1], y = y[1]), by=grp], or dt[period >= "Mar 2014", `:=`(x = x[1], y = y[1]), by=grp]?