Not-joins with data.tables in R


I have a question about the data.table idiom for "not-joins", inspired by Iterator's question. Here is an example:

library(data.table)

dt1 <- data.table(A1=letters[1:10], B1=sample(1:5,10, replace=TRUE))
dt2 <- data.table(A2=letters[c(1:5, 11:15)], B2=sample(1:5,10, replace=TRUE))

setkey(dt1, A1)
setkey(dt2, A2)

To find, for each row of dt2, the row of dt1 that has the same key (NA where there is none), set the which option to TRUE:

> dt1[dt2, which=TRUE]
[1]  1  2  3  4  5 NA NA NA NA NA
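Because the example above uses sample(), its B1 and B2 values change from run to run. The following is a minimal reproducible sketch of the same lookup, assuming fixed column values that I substituted for the random ones. It also shows how the result changes when the nomatch argument is set to 0, which is relevant to the rest of this thread.

```r
library(data.table)

# Fixed values replace sample() here (an assumption made for reproducibility)
dt1 <- data.table(A1 = letters[1:10], B1 = 1:10)
dt2 <- data.table(A2 = letters[c(1:5, 11:15)], B2 = 10:1)
setkey(dt1, A1)
setkey(dt2, A2)

# One entry per row of dt2: the matching row number in dt1, or NA if none
idx_na <- dt1[dt2, which = TRUE]

# With nomatch = 0, rows of dt2 without a match are dropped from the result
idx_0 <- dt1[dt2, which = TRUE, nomatch = 0]

print(idx_na)
print(idx_0)
```

Only the keys matter for the lookup, so the choice of B1 and B2 does not affect the indices returned.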

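Matthew suggested a "not-join" idiom in this post:

dt1[-dt1[dt2, which=TRUE]]

It subsets dt1 to the rows whose indexes do not appear in dt2. On my machine, with data.table v1.7.1, I get an error: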
Error in `[.default`(x[[s]], irows): only 0's may be mixed with negative subscripts


With the option nomatch=0 instead, that is dt1[-dt1[dt2, which=TRUE, nomatch=0]], the "not-join" works.

Is this intended behavior?

As far as I know, this comes from base R:

# This works
(1:4)[c(-2,-3)]

# But this gives you the same error you described above
(1:4)[c(-2, -3, NA)]
# Error in (1:4)[c(-2, -3, NA)] : 
#   only 0's may be mixed with negative subscripts

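The text of the error message suggests that this is intended behavior.

Here is my best guess as to why:

Judging by how they treat NA elsewhere (for example, usually defaulting to na.rm=FALSE), R's designers seem to view an NA as carrying important information, and they are reluctant to discard it without an explicit instruction to do so. (Fortunately, setting nomatch=0 gives you a clean way to pass that instruction along!)

In this context, that preference probably explains why NA is accepted in positive indexing but not in negative indexing:
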
# Positive indexing: works, because the return value retains info about NA's
(1:4)[c(2,3,NA)]

# Negative indexing: doesn't work, because it can't easily retain such info
(1:4)[c(-2,-3,NA)]
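
If the NAs in an index vector really are meant to be ignored, base R needs to be told so explicitly. A minimal sketch of one way to do that follows; the variable names x, idx, drop, and keep are only illustrative.

```r
x <- 1:4
idx <- c(2, 3, NA)

# Positive indexing: the NA survives as an NA element of the result
pos <- x[idx]

# Negative indexing: remove the NAs explicitly before negating
drop <- idx[!is.na(idx)]

# Guard the empty case: x[-integer(0)] selects nothing rather than everything
keep <- if (length(drop) > 0) x[-drop] else x

print(pos)
print(keep)
```

The guard matters because negating an empty index vector gives an empty index vector, so an unguarded x[-drop] would return no elements when there is nothing to drop.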


New in version 1.7.3 of data.table:

New option datatable.nomatch allows the default for nomatch to be changed from NA to 0.
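
New in v1.8.3:

New '!' prefix on i signals 'not-join' (a.k.a. 'not-where'), #1384.
    DT[-DT["a", which=TRUE, nomatch=0]]   # old not-join idiom, still works
    DT[!"a"]                              # same result, now preferred
    DT[!J(6), ...]                        # !J == not-join
    DT[!2:3, ...]                         # ! on all types of i
    DT[colA!=6L | colB!=23L, ...]         # multiple vector scanning approach
    DT[!J(6L,23L)]                        # same result, faster binary search
'!' has been used rather than '-':
    * to match the 'not-join' and 'not-where' nomenclature
    * with '-', DT[-0] would return DT rather than DT[0] and not be backwards
      compatible. With '!', DT[!0] returns DT both before (since !0 is TRUE in
      base R) and after this new feature.
    * to leave DT[+...] and DT[-...] available for future use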
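
Applied to the tables from the question, the '!' prefix can be compared with the older idiom as in the sketch below. It assumes data.table v1.8.3 or later and uses fixed column values that I substituted for the original sample() calls.

```r
library(data.table)

# Fixed values replace sample() here (an assumption made for reproducibility)
dt1 <- data.table(A1 = letters[1:10], B1 = 1:10)
dt2 <- data.table(A2 = letters[c(1:5, 11:15)], B2 = 10:1)
setkey(dt1, A1)
setkey(dt2, A2)

# Old idiom: look up the matching row numbers, then negate them
old_way <- dt1[-dt1[dt2, which = TRUE, nomatch = 0]]

# Not-join with the '!' prefix: rows of dt1 whose key has no match in dt2
new_way <- dt1[!dt2]

print(old_way)
print(new_way)
```

Both calls should keep the rows of dt1 with keys "f" to "j", since those are the keys that do not occur in dt2.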
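
+1 Nice answer! Yes, it comes from base. The X[-Y] syntax could be made to mean "not join"; meanwhile, with which=TRUE, nomatch=0 is needed.

That change might help, but it isn't really meant for "not join". There is still a lot to do. Good to see someone reading NEWS :)

Just added to v1.8.3 is the not-join syntax. In this case, dt1[!dt2]. Will add a detailed answer...
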
For reference, the nomatch=0 call from the question and its output:

> dt1[-dt1[dt2, which=TRUE, nomatch=0]]
     A1 B1
[1,]  f  2
[2,]  g  3
[3,]  h  3
[4,]  i  2
[5,]  j  4
