R: Subsetting with a non-existent variable does not produce an error


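While creating a lead variable, I inadvertently left out the lead variable for the grouping column. I used brackets to insert NA, and no error was reported. As a sanity check, I did the same thing with ifelse, and that produced an error message. My worry is that without careful review and some luck, I might never have found my mistake.

How do others code differently to reduce the chance of this in the future (at minimal time cost)? Also, are there other similar pitfalls I should be aware of? Thanks. Below is a reproducible example: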

dt <- data.frame(
group_name = c("D44", "D44","D44", "D45", "D45", "D47", "D47", "D47", "D47", "D48"),
order_number = sample(1:10))

dt$group_name <- as.character(dt$group_name) # so not a factor

dt <- dt[order(dt$group_name, dt$order_number),] # sort data

dt$lead1order_number <- c(dt$order_number[-1], NA)

# COMMENT OUT NEXT LINE AND RUN, no error with brackets, but one with ifelse
dt$lead1group_name <- c(dt$group_name[-1], NA) 

# done two different ways below:
# if group_name doesn't match lead1group_name, then lead1order_number is NA
dt$lead1order_number[dt$group_name != dt$lead1group_name] <- NA  

dt$lead1order_number <- ifelse(dt$group_name != dt$lead1group_name, NA, dt$lead1order_number)

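Your question goes deep. The behavior of brackets, i.e. subsetting, is one of R's key features, so it is hard to answer comprehensively. I'll just suggest what is probably the simplest solution: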

# `stringsAsFactors = FALSE` ensures that strings will not be transformed to factors
dt <- data.frame(group_name = c("D44", "D44","D44", "D45", 
    "D45", "D47", "D47", "D47", "D47", "D48"),
    order_number = sample(1:10), stringsAsFactors = FALSE)
dt <- dt[order(dt$group_name, dt$order_number),] # sort data
dt$lead1order_number <- c(dt$order_number[-1], NA)
# the example was slightly modified to demonstrate subsetting with NA
dt$lead1group_name <- c(dt$group_name[-c(1:2)], NA, "D")
Using [ for preserving subsetting (drop = FALSE) raises an error when the column does not exist:

print(dt[ ,"lead2group_name", drop = FALSE])
Error in `[.data.frame`(dt, , "lead2group_name", drop = FALSE) : undefined columns selected
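The difference comes down to which accessor is used. A minimal sketch on a small hypothetical data frame (df_demo, not from the question) comparing the ways of reading a column that does not exist: $ and [[ quietly return NULL, while [ with a column name stops.

```r
# Hypothetical data frame; "lead2group_name" is deliberately absent
df_demo <- data.frame(group_name = c("D44", "D44", "D45"),
                      stringsAsFactors = FALSE)

# `$` and `[[` silently return NULL for a column that does not exist
is.null(df_demo$lead2group_name)        # TRUE
is.null(df_demo[["lead2group_name"]])   # TRUE

# `[` with a column name signals an error instead; capture its message
res <- tryCatch(df_demo[, "lead2group_name", drop = FALSE],
                error = function(e) conditionMessage(e))
res                                      # "undefined columns selected"
```

This is why the question's dt$lead1group_name goes unnoticed: the $ lookup succeeds and hands NULL to the comparison.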

I would use this behavior to make sure the requested columns exist in the data.frame:

ind_of_non_match <- which(dt[ ,"group_name", drop = FALSE] != dt[ ,"lead1group_name", drop = FALSE])
ind_of_na <- which(is.na(dt[ , "lead1group_name", drop = FALSE]))
dt$lead1order_number[c(ind_of_non_match, ind_of_na)] <- NA
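The existence check can also be made explicit, once, before any subsetting. A minimal sketch under stated assumptions: require_cols() is a hypothetical helper (not a base R function), and dt2 is a small made-up data frame that mirrors the question's columns.

```r
# Hypothetical helper: stop early if any of the named columns is missing
require_cols <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  invisible(df)
}

# Made-up data, already sorted by group_name and order_number
dt2 <- data.frame(group_name = c("D44", "D44", "D45"),
                  order_number = c(1L, 2L, 1L),
                  stringsAsFactors = FALSE)
dt2$lead1order_number <- c(dt2$order_number[-1], NA)

# Fails loudly because lead1group_name was never created
msg <- tryCatch(require_cols(dt2, c("group_name", "lead1group_name")),
                error = function(e) conditionMessage(e))
msg  # "missing column(s): lead1group_name"

# Once the column exists, the check passes and the bracket masking
# from the question runs as intended
dt2$lead1group_name <- c(dt2$group_name[-1], NA)
require_cols(dt2, c("group_name", "lead1group_name"))
dt2$lead1order_number[dt2$group_name != dt2$lead1group_name] <- NA
dt2$lead1order_number  # 2 NA NA
```

For a one-off check, stopifnot(all(c("group_name", "lead1group_name") %in% names(dt))) does the same job with a less specific message.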

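What exactly are you asking? I'm not sure which brackets you are referring to. A lead/lag makes sense with or without a grouping variable; it all depends on what you are doing. This would be much easier with dplyr: you could use the group_by() command together with mutate() and the lead() function.

@MrFlick I'm using lags to look at the order_number of the next order within a group_name. When that next order number spills over into the following group_name, though, I need to set it to NA. The overlap is identified by group_name != lead1group_name. When I put that condition inside brackets, [dt$group_name != dt$lead1group_name], I get no error even though lead1group_name doesn't exist. I agree dplyr is an option, but the point is to show the difference in error reporting between brackets and ifelse (or dplyr).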
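The silent bracket behavior can be reproduced with plain vectors. A minimal sketch (x, g, missing_col and df_len are hypothetical names): a missing data frame column read with $ is NULL, comparing against NULL yields a zero-length logical, and subassigning with an empty index changes nothing. ifelse() propagates that zero length, and only the final column assignment notices the mismatch.

```r
x <- c(10, 20, 30)
g <- c("a", "a", "b")
missing_col <- NULL  # what dt$lead1group_name evaluates to when absent

cond <- g != missing_col
length(cond)   # 0: comparing with NULL gives logical(0)

x[cond] <- NA  # empty index selects nothing: no change, no error
x              # 10 20 30

# ifelse() on a zero-length condition returns a zero-length result...
out <- ifelse(cond, NA, x)
length(out)    # 0

# ...and assigning that to a data frame column is what raises the error
df_len <- data.frame(x = x)
err <- tryCatch({ df_len$x <- out; "no error" },
                error = function(e) conditionMessage(e))
err            # an error like "replacement has 0 rows, data has 3"
```

So the ifelse version does not check for the missing column either; it just happens to fail later, at the length check in the column assignment.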
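# One-line variant: errors if either column is missing, but rows where lead1group_name is NA are left unchanged
dt$lead1order_number[(dt[ ,"group_name", drop = FALSE] != dt[ ,"lead1group_name", drop = FALSE])] <- NA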
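The grouped approach suggested in the comments (dplyr's group_by() plus mutate() with lead()) avoids the helper lead1group_name column altogether, so there is nothing to forget. As a dependency-free sketch of the same idea, base R's ave() can compute the lead within each group. This is a swapped-in technique rather than the commenter's code, and dt3 uses fixed, already-sorted order numbers instead of sample() so the result is reproducible.

```r
dt3 <- data.frame(
  group_name = c("D44", "D44", "D44", "D45", "D45",
                 "D47", "D47", "D47", "D47", "D48"),
  order_number = c(3L, 7L, 9L, 1L, 4L, 2L, 5L, 6L, 10L, 8L),
  stringsAsFactors = FALSE
)
# Sort by group, then by order number within group
dt3 <- dt3[order(dt3$group_name, dt3$order_number), ]

# Shift order_number up by one within each group; the last row of every
# group gets NA, so no cross-group masking step is needed
dt3$lead1order_number <- ave(dt3$order_number, dt3$group_name,
                             FUN = function(v) c(v[-1], NA))
dt3$lead1order_number  # 7 9 NA 4 NA 5 6 10 NA NA
```

Note that FUN must be passed by name, because ave() collects the grouping variables through its ... argument.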