R: How to extend `==` behavior to vectors that include NAs?


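I have completely failed to find other r-help or Stack Overflow discussions of this specific problem. Sorry if it is somewhere obvious. I believe I am just looking for the easiest way to make R's `==` operator never return NAs.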

# Example #

# Say I have two vectors
a <- c( 1 , 2 , 3 )
b <- c( 1 , 2 , 4 )
# And want to test if each element in the first
# is identical to each element in the second:
a == b
# It does what I want perfectly:
# TRUE TRUE FALSE

# But if either vector contains a missing,
# the `==` operator returns an incorrect result:
a <- c( 1 , NA , 3 ) 
b <- c( 1 , NA , 4 )
# Here I'd want   TRUE TRUE FALSE
a == b
# But I get TRUE NA FALSE

a <- c( 1 , NA , 3 ) 
b <- c( 1 , 2 , 4 )
# Here I'd want   TRUE FALSE FALSE
a == b
# But I get TRUE NA FALSE again.

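`mapply` (for example `mapply("%in%", a, b)`) gets the result I want, but it seems heavy-handed to me.

Is there a more intuitive solution?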
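For reference, a minimal sketch of the behavior being asked about, using base R only: any comparison that involves `NA` (or `NaN`) yields `NA`, while `is.na()` and `identical()` always give a definite `TRUE` or `FALSE`.

```r
# Comparisons that involve a missing value are themselves missing
NA == NA            # NA
NA == 1             # NA
NaN == NaN          # NA

# is.na() and identical() always return TRUE or FALSE
is.na(NA == NA)     # TRUE
identical(NA, NA)   # TRUE
identical(NA, 1)    # FALSE
```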

Another option (whether it is better than `mapply("%in%", a, b)` is open to debate), following @AnthonyDamico's suggestion, is to create a "mutt" operator:

"%==%" <- function(a, b) (!is.na(a) & !is.na(b) & a==b) | (is.na(a) & is.na(b))

You could try:
replace(a, is.na(a), Inf)==replace(b, is.na(b), Inf)

Or a faster variation suggested by @docendo discimus:
replace(a, which(is.na(a)), Inf)==replace(b, which(is.na(b)), Inf)
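One caveat with this sentinel approach, also raised in the comments on this question: if `Inf` can occur in the data, an `NA` on one side and a genuine `Inf` on the other compare as equal. A small sketch of the pitfall follows; the vectors `x` and `y` are made up for illustration.

```r
x <- c(1, NA, Inf)
y <- c(1, Inf, Inf)

# NA is replaced by Inf, so position 2 ends up comparing Inf == Inf
sentinel <- replace(x, is.na(x), Inf) == replace(y, is.na(y), Inf)
sentinel
# [1] TRUE TRUE TRUE

# Testing missingness explicitly keeps NA and Inf apart
explicit <- (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
explicit
# [1]  TRUE FALSE  TRUE
```

If the data can never contain `Inf`, the sentinel version is fine; otherwise a sentinel that cannot occur in the data, or an explicit `is.na()` test, is the safer choice.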

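Results for the different scenarios from the question are listed with the `akrun1()` calls further down.

Another option is to use `identical()` wrapped in `mapply()`.

Wrapping `isTRUE(all.equal())` in `mapply()` gives the same results but handles rounding issues better, for example when comparing `.3/3` with `.1`.

Assuming we do not have a large proportion of NAs, the proposed vectorized solutions waste some resources comparing values that were already settled by `a==b`. We can usually assume that NAs are few, so it is worth computing `a==b` first and then dealing with the NAs separately, despite the extra steps and temporary variables: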
`%==%` <- function(a,b){
  x <- a==b
  na_x <- which(is.na(x))
  x[na_x] <- is.na(a[na_x]) & is.na(b[na_x])
  x
}
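As a quick sanity check, the sketch below compares this two-step version with the fully vectorized expression used earlier, on a small random sample. The names `eq_two_step` and `eq_vectorized` are hypothetical labels for the two formulas, not names used in the answers.

```r
# Two-step version: compare first, then patch only the NA positions
eq_two_step <- function(a, b) {
  x <- a == b
  na_x <- which(is.na(x))
  x[na_x] <- is.na(a[na_x]) & is.na(b[na_x])
  x
}

# Fully vectorized version: every element goes through all the is.na() tests
eq_vectorized <- function(a, b) {
  (!is.na(a) & !is.na(b) & a == b) | (is.na(a) & is.na(b))
}

set.seed(1)
a <- sample(c(1:5, NA), 1000, replace = TRUE)
b <- sample(c(1:5, NA), 1000, replace = TRUE)

res <- eq_two_step(a, b)
identical(res, eq_vectorized(a, b))  # TRUE: both give the same logical vector
anyNA(res)                           # FALSE: no NA is left in the result
```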

With the `na_comparable` class defined further down, this can be used conveniently in a dplyr chain:
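library(dplyr)  # provides %>% and mutate()
# Assumes the na_comparable class and its `==`/`!=` methods (defined further down) are loaded
data.frame(a=c(1,NA,3),b=c(1,NA,4)) %>%
  mutate(a = na_comparable(a),
         c = a==b,
         d= a!=b)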

#    a  b     c     d
# 1  1  1  TRUE FALSE
# 2 NA NA  TRUE FALSE
# 3  3  4 FALSE  TRUE

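With this approach, if you need to update code to account for NAs that were not there before, a single `na_comparable` call may be all you need, instead of transforming your initial data or replacing every `==` with `%==%`.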
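The help file (`?"=="`) seems fairly firm on this: "Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA." But others may give you a better answer.
@akrun: `Inf` might be a better choice.
`ifelse(is.na(a), is.na(b), a == b)`
@A.Webb You win the most intuitive award.
@A.Webb This does not guarantee that `(a %==% b) == (b %==% a)`.
We should call it the mutt operator, since it is kind of `==` and kind of `%in%`, and then we can define it so that it looks like a dog bone: `%==%`
A slightly shorter function: `f = function(a, b) (is.na(a) & is.na(b)) | (!is.na(eq…`
@Frank, that is too clever! Added the option to the solution.
I can see why you replace with `Inf` rather than 0, but couldn't a value also be `Inf`, in which case comparing `NA` with `Inf` would give `TRUE`? (OK, this will probably never happen, just saying…) Thanks for the benchmark, I like how my solution fares ;-)
@CathG It looks like I need to replace it with something that cannot occur in the dataset :-)
@Moody_Mudskipper It depends on the number of NA elements in the column/vector. Check these: `set.seed(24); v1…`
I see, thanks: the overhead of the call is offset by copying a smaller object (if the object is much smaller).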
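To illustrate the symmetry concern raised in the comments above, here is a small sketch of the `ifelse()` one-liner called with its arguments in both orders. The wrapper name `eq_ifelse` is hypothetical.

```r
eq_ifelse <- function(a, b) ifelse(is.na(a), is.na(b), a == b)

a <- c(1, NA, 3)
b <- c(1, 2, NA)

eq_ifelse(a, b)
# [1]  TRUE FALSE    NA
eq_ifelse(b, a)
# [1]  TRUE    NA FALSE
```

The one-liner only handles an `NA` in its first argument; when only the second argument is missing, the result falls through to `a == b` and is `NA` again. A symmetric variant has to test both sides.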
# akrun1() wraps the first replace() expression; defined here so the scenario checks below run
akrun1 <- function() {replace(a, is.na(a), Inf)==replace(b, is.na(b), Inf)}

a <- c( 1 , 2 , 3 )
b <- c( 1 , 2 , 4 )
akrun1()
#[1]  TRUE  TRUE FALSE

a <- c( 1 , NA , 3 )
b <- c( 1 , NA , 4 )
akrun1()
#[1]  TRUE  TRUE FALSE

a <- c( 1 , NA , 3 )
b <- c( 1 , 2 , 4 )
akrun1()
#[1]  TRUE FALSE FALSE

set.seed(24)
a <- sample(c(1:10, NA), 1e6, replace=TRUE)
b <- sample(c(1:20, NA), 1e6, replace=TRUE)
akrun1 <- function() {replace(a, is.na(a), Inf)==replace(b, is.na(b), Inf)}
cathG <- function() {(!is.na(a) & !is.na(b) & a==b) | (is.na(a) & is.na(b))}
anthony <- function() {mapply(`%in%`, a, b)}
webb <- function() {ifelse(is.na(a),is.na(b),a==b)}
docend <- function() {replace(a, which(is.na(a)), Inf)==replace(b,
       which(is.na(b)), Inf)}

library(microbenchmark)
microbenchmark(akrun1(), cathG(), anthony(), webb(),docend(),
  unit='relative', times=20L)
#Unit: relative
#      expr        min         lq       mean     median         uq        max neval cld
#  akrun1()   3.050200   3.035625   3.007196   2.963916   2.977490   3.083658    20 a
#   cathG()   4.829972   4.893266   4.843585   4.790466   4.816472   4.939316    20 a
# anthony() 190.499027 224.389971 215.792965 217.647702 215.503308 212.356051    20   c
#    webb()  14.000363  14.366572  15.412527  14.095947  14.671741  19.735746    20  b
#  docend()   1.000000   1.000000   1.000000   1.000000   1.000000   1.000000    20 a

a <- c( 1 , 2 , 3 )
b <- c( 1 , 2 , 4 )
mapply(identical,a,b)
#[1]  TRUE  TRUE FALSE

a <- c( 1 , NA , 3 ) 
b <- c( 1 , NA , 4 )
mapply(identical,a,b)
#[1]  TRUE  TRUE FALSE

a <- c( 1 , NA , 3 ) 
b <- c( 1 , 2 , 4 )
mapply(identical,a,b)
#[1]  TRUE FALSE FALSE

mapply(FUN=function(x,y){isTRUE(all.equal(x,y))}, a, b)

a<-.3/3
b<-.1
mapply(FUN=function(x,y){isTRUE(all.equal(x,y))}, a, b)
#[1] TRUE

mapply(identical,a,b)
#[1] FALSE
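Because `mapply()` calls the comparison once per element, it gets slow on long vectors (in the benchmark above, the `mapply()` variant is by far the slowest). Below is a vectorized sketch of the same idea, assuming an absolute tolerance is acceptable and the inputs are finite or `NA`. Note that `all.equal()` itself uses a relative tolerance, so this is a substitute and not an exact equivalent. The name `near_eq` and its `tol` default are hypothetical.

```r
near_eq <- function(a, b, tol = sqrt(.Machine$double.eps)) {
  x <- abs(a - b) < tol                        # NA wherever either side is NA
  na_x <- which(is.na(x))
  x[na_x] <- is.na(a[na_x]) & is.na(b[na_x])   # TRUE only if both are NA
  x
}

a <- c(.3/3, NA, 3, NA)
b <- c(.1,   NA, 4, 2)

a == b
# [1] FALSE    NA FALSE    NA
near_eq(a, b)
# [1]  TRUE  TRUE FALSE FALSE
```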

a <- c( 1 , 2 , 3 )
b <- c( 1 , 2 , 4 )
a %==% b
# [1]  TRUE  TRUE FALSE

a <- c( 1 , NA , 3 ) 
b <- c( 1 , NA , 4 )
a %==% b
# [1]  TRUE  TRUE FALSE

a <- c( 1 , NA , 3 ) 
b <- c( 1 , 2 , 4 )
a %==% b
# [1]  TRUE FALSE FALSE

set.seed(24)
a <- sample(c(1:10, NA), 1e6, replace=TRUE)
b <- sample(c(1:20, NA), 1e6, replace=TRUE)
mm <- function(){
  x <- a==b
  na_x <- which(is.na(x))
  x[na_x] <- is.na(a[na_x]) & is.na(b[na_x])
  x
}
akrun1 <- function() {replace(a, is.na(a), Inf)==replace(b, is.na(b), Inf)}
cathG <- function() {(!is.na(a) & !is.na(b) & a==b) | (is.na(a) & is.na(b))}
docend <- function() {replace(a, which(is.na(a)), Inf)==replace(b, which(is.na(b)), Inf)}

library(microbenchmark)
microbenchmark(mm(),akrun1(),cathG(),docend(),
               unit='relative', times=100L)

# Unit: relative
#     expr      min       lq     mean   median       uq       max neval
#     mm() 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000   100
# akrun1() 1.667242 1.884185 1.815392 1.642581 1.765238 0.9973017   100
#  cathG() 2.447168 2.449597 2.118306 2.201346 2.358105 1.1421577   100
# docend() 1.683817 1.950970 1.756481 1.745400 2.007889 1.2264461   100

na_comparable      <- setClass("na_comparable", contains = "numeric")
`==.na_comparable` <- function(a,b){
  x <- unclass(a) == unclass(b) # inefficient but I don't know how to force the default `==`
  na_x <- which(is.na(x))
  x[na_x] <- is.na(a[na_x]) & is.na(b[na_x])
  x
}

`!=.na_comparable` <- Negate(`==.na_comparable`)

# Assumed inputs: the small NA-containing vectors from the question
a <- na_comparable(c(1, NA, 3))
b <- c(1, NA, 4)
a == b
# [1]  TRUE  TRUE FALSE
b == a
# [1]  TRUE  TRUE FALSE
a != b
# [1] FALSE FALSE  TRUE
b != a
# [1] FALSE FALSE  TRUE
