R: percentage change between the nearest years (tags: r, data.table, data-manipulation)


I am having trouble creating a new variable, Growth, defined as the percentage change in Population between the nearest years ending in "2" and "7".

# dt
ID       Population      year
1                50      1995
1                60      1996
1                70      1997
1                80      1998
1                90      1999
1               100      2000
1               105      2001
1               110      2002
1               120      2003
1               130      2004
1               140      2005
1               150      2006
1               200      2007
1               300      2008

dt <- data.table::fread("ID       Population      year
1                50      1995
  1                60      1996
  1                70      1997
  1                80      1998
  1                90      1999
  1               100      2000
  1               105      2001
  1               110      2002
  1               120      2003
  1               130      2004
  1               140      2005
  1               150      2006
  1               200      2007
  1               300      2008", header = TRUE)

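Not a data.table solution, but here is how you could do it in the tidyverse, which might give you some ideas. Basically, use integer division to calculate the comparison year for each row, then join the table to itself so that each row carries its comparison value. After that, calculating growth with your formula is straightforward.

library(tidyverse)
dt %>%
  mutate(compare_year = 5 * year %/% 5 - 3) %>%
  left_join(dt, by = c("ID", "compare_year" = "year")) %>%
  mutate(growth = (Population.x - Population.y) / Population.y)
#> # A tibble: 14 x 6
#>       ID Population.x  year compare_year Population.y growth
#>  1     1           50  1995         1992           NA NA
#>  2     1           60  1996         1992           NA NA
#>  3     1           70  1997         1992           NA NA
#>  4     1           80  1998         1992           NA NA
#>  5     1           90  1999         1992           NA NA
#>  6     1          100  2000         1997           70  0.429
#>  7     1          105  2001         1997           70  0.5  
#>  8     1          110  2002         1997           70  0.571
#>  9     1          120  2003         1997           70  0.714
#> 10     1          130  2004         1997           70  0.857
#> 11     1          140  2005         2002          110  0.273
#> 12     1          150  2006         2002          110  0.364
#> 13     1          200  2007         2002          110  0.818
#> 14     1          300  2008         2002          110  1.73

Created on 2018-09-19 by the reprex package (v0.2.0).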
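
To make the integer-division step above concrete, here is a small base R sketch of my own (not part of the answer). It relies on the fact that %/% binds more tightly than * and -, so the expression is evaluated as 5 * (year %/% 5) - 3. The variable names are only for illustration.

```r
# Years covered by the sample data.
year <- 1995:2008

# %/% has higher precedence than * and -, so this is 5 * (year %/% 5) - 3:
# round the year down to a multiple of 5, then step back 3 years.
compare_year <- 5 * year %/% 5 - 3

# Side-by-side view of each year and the year it is compared against.
data.frame(year, compare_year)
```

Each year is therefore matched to a year ending in 2 or 7 that lies three to seven years earlier, which is why the first five rows have no comparison value in this data.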
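Similar to @calum_you's answer, but using the nearest 5-year growth rate, as requested in the OP.

Sample data

library(dplyr)

dt <- data.table::fread("ID       Population      year
1                50      1995
  1                60      1996
  1                70      1997
  1                80      1998
  1                90      1999
  1               100      2000
  1               105      2001
  1               110      2002
  1               120      2003
  1               130      2004
  1               140      2005
  1               150      2006
  1               200      2007
  1               300      2008", header = TRUE) %>%
  as_tibble()  # as_data_frame() is deprecated in favour of as_tibble()

Output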
   ID Population year join_yr growth_5yr
1   1         50 1995    1997         NA
2   1         60 1996    1997         NA
3   1         70 1997    2002  0.5714286
4   1         80 1998    2002  0.5714286
5   1         90 1999    2002  0.5714286
6   1        100 2000    2002  0.5714286
7   1        105 2001    2002  0.5714286
8   1        110 2002    2007  0.8181818
9   1        120 2003    2007  0.8181818
10  1        130 2004    2007  0.8181818
11  1        140 2005    2007  0.8181818
12  1        150 2006    2007  0.8181818
13  1        200 2007    2012         NA
14  1        300 2008    2012         NA
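
The code that produced this output is not shown here. What follows is only a sketch of one way to get the same join_yr and growth_5yr columns with dplyr, not the answerer's own code. The helper next_anchor and the intermediate columns start_yr, pop_start and pop_end are names I made up for illustration.

```r
library(dplyr)

# Same values as the sample data, built directly as a tibble.
dt <- tibble(
  ID = 1L,
  Population = c(50, 60, 70, 80, 90, 100, 105, 110, 120, 130, 140, 150, 200, 300),
  year = 1995:2008
)

# Hypothetical helper: next year ending in 2 or 7 strictly after `year`
# (1995 -> 1997, 1997 -> 2002, 2007 -> 2012).
next_anchor <- function(year) 5L * ((year + 3L) %/% 5L) + 2L

res <- dt %>%
  mutate(join_yr = next_anchor(year), start_yr = join_yr - 5L) %>%
  # Population at the start of the 5-year window (NA if that year is absent).
  left_join(select(dt, ID, start_yr = year, pop_start = Population),
            by = c("ID", "start_yr")) %>%
  # Population at the end of the 5-year window (NA if that year is absent).
  left_join(select(dt, ID, join_yr = year, pop_end = Population),
            by = c("ID", "join_yr")) %>%
  mutate(growth_5yr = (pop_end - pop_start) / pop_start) %>%
  select(ID, Population, year, join_yr, growth_5yr)

res
```

Rows whose window starts in 1992 or ends in 2012 come out as NA because those years are not in the data.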

Here is a possible data.table approach:
#calculate the 5-yearly percentage changes first by
#1) creating all combinations of ID and 5-yearly years
#2) joining them with the original dataset
#3) leading the Population column and calculating Growth
pctChange <- dt[CJ(ID=ID, year=seq(1967, 2022, 5), unique=TRUE), 
    .(ID, year, Growth=(shift(Population, type="lead") - Population) / Population), 
    on=.(ID, year)]    

#then perform a rolling join (`roll=TRUE`; see ?data.table) and 
#then update the original dt with Growth by reference (i.e. `:=`)
dt[, Growth := pctChange[dt, Growth, on=.(ID, year), roll=TRUE]]
dt

Please note: a rolling join does not seem to work in an update join:
dt[pctChange, Growth := Growth, on=.(ID, year), roll=TRUE]
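
For readers new to rolling joins, here is a minimal sketch on toy data of what roll=TRUE does in the lookup above. The tables anchors and obs and their values are invented for illustration and are not taken from the question.

```r
library(data.table)

# Hypothetical lookup table: one Growth value per anchor year.
anchors <- data.table(year = c(1997L, 2002L), Growth = c(0.57, 0.82))

# Years to look up; 1996 precedes the first anchor year.
obs <- data.table(year = c(1996L, 1999L, 2002L, 2004L))

# For each row of obs, take Growth from the anchor row with the
# latest year that is <= the obs year (last observation carried forward).
rolled <- anchors[obs, Growth, on = .(year), roll = TRUE]
rolled
```

Here 1999 and 2004 pick up the values for 1997 and 2002 respectively, while 1996 has no earlier anchor year and gets NA.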

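In your example, Pop for 1992 does not exist, so how would you calculate growth for 1996? Pop for 2012 does not exist either.
What if I want to do this by ID?
Add ID to on=.(ID, year)?
Not necessarily, thanks. What is the function CJ above?
It is the cross join function from data.table. You might want to check out ?data.table::CJ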
Great.
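
As a quick illustration of the CJ call discussed in these comments, here is a small sketch with made-up IDs. It shows why unique=TRUE matters when the ID column contains repeats, as it does in dt.

```r
library(data.table)

# Hypothetical ID column with a repeated value, as in a long-format table.
ids <- c(1L, 1L, 2L)

# Cross join: every combination of distinct ID and anchor year.
# unique = TRUE drops the repeated ID before combining.
grid <- CJ(ID = ids, year = seq(1997L, 2007L, by = 5L), unique = TRUE)
grid
```

The result has one row per ID and anchor year pair, sorted by ID and then year, which is the grid the answer joins back to dt.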