In R: How can I check whether I have consecutive years of data (so that I can calculate growth later)?

r for-loop if-statement

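I have the following data frame (sample; the table with the desired sequence column is reproduced further down):

I used a for loop to try to create a sequence column that starts a new number for each new sequence of numbers. I'm a beginner, so my definitions may be a bit off. My for loop looks like this: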

size1 <- c(1:3)
s <- 0
for (val1 in size) {
  m <- max(sample[sample$companyID == val1, 4])
  size2 <- c(1:m)
  for (val2 in size2){ 
    row <- sample[which(sample$companyID == val1 & sample$yearID == val2)]
    m1 <- sample[sample$companyID == val1 & sample$yearID == val2, 2]
    m2 <- sample[sample$CompanyID == val1 & sample$yearID == (val2-1), 2]
    if(val2>1 && m1-m2 > 1) {
                  sample$sequence[row] s = s+1}
    else {s = s}
  }
  }

I would be very grateful if anyone could help.
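For reference, the loop above can be made to run with a few small changes. The following is only a sketch, not the asker's final code: it assumes exactly one row per companyID/yearID pair and that yearID counts 1, 2, 3, ... without gaps inside each company. The starting value of the counter and the `if`/`else` layout are choices made for this illustration.

```r
# Sample data from the thread
sample <- read.table(header = TRUE, text = "
companyID year yearID
1 2010 1
1 2011 2
1 2012 3
1 2013 4
2 2010 1
2 2011 2
2 2016 3
2 2017 4
2 2018 5
3 2010 1
3 2011 2
3 2014 3
3 2017 4
3 2018 5
")

s <- 0L                          # running sequence number
sample$sequence <- NA_integer_   # column to fill in

for (val1 in unique(sample$companyID)) {
  m <- max(sample$yearID[sample$companyID == val1])
  for (val2 in seq_len(m)) {
    # index of the row for this company and yearID
    row <- which(sample$companyID == val1 & sample$yearID == val2)
    m1 <- sample$year[row]
    if (val2 == 1) {
      # the first year of a company always starts a new sequence
      s <- s + 1L
    } else {
      m2 <- sample$year[sample$companyID == val1 & sample$yearID == (val2 - 1)]
      # a gap of more than one year starts a new sequence
      if (m1 - m2 > 1) s <- s + 1L
    }
    sample$sequence[row] <- s
  }
}

sample
```

On the sample data this fills sequence with 1 1 1 1 2 2 3 3 3 4 4 5 6 6. The main changes are iterating over the company IDs actually present, using the column name `companyID` consistently, indexing rows with `which()`, and assigning `s` to the row on every iteration.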

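It is not clear whether you want the exact expected result, or a check by companyID of whether you have consecutive years.

Going by the title of your question: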

sample <- read.table(header = TRUE, text = "
companyID   year   yearID
    1       2010     1
    1       2011     2
    1       2012     3
    1       2013     4
    2       2010     1
    2       2011     2
    2       2016     3
    2       2017     4
    2       2018     5
    3       2010     1
    3       2011     2
    3       2014     3
    3       2017     4
    3       2018     5
")

library(data.table)
sample <- setDT(sample)
sample[ , diff_year := year - shift(year), by = companyID]    
sample <- setDF(sample)
sample
#>    companyID year yearID diff_year
#> 1          1 2010      1        NA
#> 2          1 2011      2         1
#> 3          1 2012      3         1
#> 4          1 2013      4         1
#> 5          2 2010      1        NA
#> 6          2 2011      2         1
#> 7          2 2016      3         5
#> 8          2 2017      4         1
#> 9          2 2018      5         1
#> 10         3 2010      1        NA
#> 11         3 2011      2         1
#> 12         3 2014      3         3
#> 13         3 2017      4         3
#> 14         3 2018      5         1

# Created on 2021-03-13 by the reprex package (v1.0.0.9002)
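If only a yes/no answer per company is needed, the year-to-year differences can be collapsed into one flag per companyID. This is a minimal base R sketch rather than part of the answer above; the name `all_consecutive` is made up for illustration, and the code assumes the rows are already sorted by year within each company.

```r
# Sample data from the thread
sample <- read.table(header = TRUE, text = "
companyID year yearID
1 2010 1
1 2011 2
1 2012 3
1 2013 4
2 2010 1
2 2011 2
2 2016 3
2 2017 4
2 2018 5
3 2010 1
3 2011 2
3 2014 3
3 2017 4
3 2018 5
")

# TRUE for a company only if every year-to-year difference equals 1
all_consecutive <- tapply(sample$year, sample$companyID,
                          function(y) all(diff(y) == 1))
all_consecutive
```

For the sample data the flag is TRUE for company 1 and FALSE for companies 2 and 3, which both have gaps.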

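This is a good question!

1. First, group_by companyID.
2. Use lag to calculate the difference between each consecutive row of the year column, to identify whether the years are consecutive.
3. group_by companyID and yearID.
4. mutate a helper column sequence1 that puts a 1 on each row that starts a run of consecutive years within the group.
5. ungroup, and assign a new sequence number each time a 1 occurs in sequence1.
6. Remove the helper columns sequence1 and deltaLag1.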
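The same idea can also be sketched in base R, for readers who do not use the tidyverse. This is an illustration under stated assumptions, not the answer's own code: the helper names `gap` and `new_run` are made up here, and the rows are assumed to be sorted by companyID and then by year.

```r
# Sample data from the thread
df <- data.frame(
  companyID = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  year      = c(2010, 2011, 2012, 2013,
                2010, 2011, 2016, 2017, 2018,
                2010, 2011, 2014, 2017, 2018)
)

# Difference to the previous year within each company (NA on the first row)
gap <- ave(df$year, df$companyID, FUN = function(y) c(NA, diff(y)))

# A row starts a new run if it is the first of its company or follows a gap
new_run <- is.na(gap) | gap > 1

# Counting the run starts gives the sequence number
df$sequence <- cumsum(new_run)
df
```

On the sample data this gives sequence values 1 1 1 1 2 2 3 3 3 4 4 5 6 6. Because `cumsum()` runs over the whole data frame, the numbering continues across companies instead of restarting at 1 for each company.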

Thank you so much!! This works!

This also works!! Thanks a lot.

Sample data with the desired sequence column:

companyID   year   yearID   sequence
    1       2010     1          1
    1       2011     2          1
    1       2012     3          1
    1       2013     4          1
    2       2010     1          2
    2       2011     2          2
    2       2016     3          3
    2       2017     4          3
    2       2018     5          3
    3       2010     1          4
    3       2011     2          4
    3       2014     3          5
    3       2017     4          6
    3       2018     5          6

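# tidyverse code for the steps described above (group_by, lag, mutate, ungroup)
library(tidyverse)

# Data (defined first, because the pipeline below reads df)
df <- tribble(
  ~companyID, ~year, ~yearID,
  1, 2010, 1,
  1, 2011, 2,
  1, 2012, 3,
  1, 2013, 4,
  2, 2010, 1,
  2, 2011, 2,
  2, 2016, 3,
  2, 2017, 4,
  2, 2018, 5,
  3, 2010, 1,
  3, 2011, 2,
  3, 2014, 3,
  3, 2017, 4,
  3, 2018, 5
)

df1 <- df %>%
  group_by(companyID) %>%
  # gap to the previous year within the same company (NA on the first row)
  mutate(deltaLag1 = year - lag(year, 1)) %>%
  group_by(companyID, yearID) %>%
  # 1 marks the start of a new run of consecutive years, 2 marks a continuation
  mutate(sequence1 = case_when(is.na(deltaLag1) | deltaLag1 > 1 ~ 1,
                               TRUE ~ 2)) %>%
  ungroup() %>%
  # running count of run starts gives the sequence number
  mutate(sequence = cumsum(sequence1 == 1)) %>%
  # drop the helper columns
  select(-deltaLag1, -sequence1)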