R: Check whether a column 1 value appeared earlier in the dataset and whether its column 2 value differs - Fatal编程技术网


My dataset looks like this:

df <- data.frame(ID = c("m1","m2","m3","m4","m5","m6","m2","m3","m5","m6","m1","m4","m5"),
                 Year = c(1,1,1,1,1,1,2,2,2,2,3,3,3))
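Try this (the full code, which computes k, q and func, is listed further down):

library(dplyr)

You can use ave: for each ID, compute the difference (diff) between the current Year and the previous Year. Pad the result with a leading zero, then check whether it equals 1 to create a logical vector: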

df$check2 <- with(df, ave(Year, ID, FUN = function(x) c(0, diff(x))) == 1)
#    ID Year check check2
# 1  m1    1 FALSE  FALSE
# 2  m2    1 FALSE  FALSE
# 3  m3    1 FALSE  FALSE
# 4  m4    1 FALSE  FALSE
# 5  m5    1 FALSE  FALSE
# 6  m6    1 FALSE  FALSE
# 7  m2    2  TRUE   TRUE
# 8  m3    2  TRUE   TRUE
# 9  m5    2  TRUE   TRUE
# 10 m6    2  TRUE   TRUE
# 11 m1    3 FALSE  FALSE
# 12 m4    3 FALSE  FALSE
# 13 m5    3  TRUE   TRUE
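
To see why m1 is FALSE in year 3 while m5 is TRUE, it may help to run the same diff step on the years of a single ID. This is only a small sketch; the vector names (years_m5, years_m1 and so on) are made up for illustration, and the years are copied from the example data.

```r
# Years in which two IDs from the example data appear
years_m5 <- c(1, 2, 3)   # m5 appears in every year
years_m1 <- c(1, 3)      # m1 skips year 2

# Gap to the previous appearance, padded with 0 for the first appearance
gap_m5 <- c(0, diff(years_m5))
gap_m1 <- c(0, diff(years_m1))

# TRUE only where the ID was also present in the immediately preceding year
flag_m5 <- gap_m5 == 1
flag_m1 <- gap_m1 == 1

print(flag_m5)
print(flag_m1)
```

A gap of 2 (m1 going from year 1 to year 3) is not equal to 1, so that row stays FALSE.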

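Edit following a comment by the OP: if there are "multiple entries of the same ID in the same year", run the calculation on the data with duplicate rows removed (unique), then join the result back to the original data: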

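library(data.table)
setDT(df)                                              # the := syntax below needs a data.table
df2 <- unique(df)                                      # drop duplicate rows
df2[ , Check2 := c(FALSE, diff(Year) == 1), by = ID]   # flag per ID on the de-duplicated data
df[df2, on = c("ID", "Year")]                          # join the flags back to the original rows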
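
The same de-duplicate, compute, join-back idea can also be sketched in base R. This is not the answer's data.table code, just one possible equivalent: it swaps in ave and merge, and df_dup, key and res are hypothetical names with made-up rows (m2 appears twice in year 2, and the rows are deliberately unsorted).

```r
# Hypothetical data: m2 appears twice in year 2, and rows are not sorted
df_dup <- data.frame(ID   = c("m2", "m1", "m2", "m2", "m1"),
                     Year = c(2, 1, 1, 2, 3))

# 1. Drop duplicate ID/Year pairs and sort so diff() sees the years in order
key <- unique(df_dup[, c("ID", "Year")])
key <- key[order(key$ID, key$Year), ]

# 2. Flag rows whose ID also appeared in the immediately preceding year
key$check2 <- ave(key$Year, key$ID, FUN = function(x) c(0, diff(x))) == 1

# 3. Join the flags back onto the original rows (merge() may reorder rows)
res <- merge(df_dup, key, by = c("ID", "Year"))
print(res)
```

Sorting inside the sketch covers the point raised in the comments that the data must be ordered by year before diff is applied.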

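Cheers, works great! I just needed to sort the dataset by year beforehand.
@Gowzie Yes, you need df
Did this help you, @Gowzie?
Unfortunately I got the error "replacement has 20 rows, data has 13" and have not had time to look into how to fix it. But since @experiator's solution works fine, I am using his for now.
Nice alternative, but it fails when you have multiple entries of the same ID in the same year. Definitely easy to fix, and much faster than my original solution.
@Gowzie Thanks for your feedback. Please remember to always construct a sufficiently complex example in your question. We are not mind readers... ;)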
# First answer: compare each year's IDs with the previous year's IDs
# (assumes df is already sorted by Year, as noted in the comments)
k = length(unique(df$Year))        # how many years are in the data
q = unique(df$Year)                # which years are present

func <- function(x){
  kk = df$ID[df$Year == q[x]]      # IDs present in the current year
  kk %in% df$ID[df$Year == q[x-1]] # TRUE if the ID was also present in the previous year
}

x <- sum(df$Year==unique(df$Year)[1]) # number of first-year rows, which all get FALSE
df$check <- c(rep(FALSE, x),unlist(lapply(2:k, func)))
# data.table alternative: flag rows whose Year is exactly one more than the ID's previous Year
library(data.table)
setDT(df)[ , Check2 := c(FALSE, diff(Year) == 1), by = ID]