r – Non-joins with data.tables

I have a question about the data.table idiom for "non-joins", inspired by Iterator's question. Here is an example:

library(data.table)

dt1 <- data.table(A1=letters[1:10],B1=sample(1:5,10,replace=TRUE))
dt2 <- data.table(A2=letters[c(1:5,11:15)],B2=sample(1:5,10,replace=TRUE))

setkey(dt1,A1)
setkey(dt2,A2)

The data.tables look like this:

> dt1               > dt2
      A1 B1               A2 B2
 [1,]  a  1          [1,]  a  2
 [2,]  b  4          [2,]  b  5
 [3,]  c  2          [3,]  c  2
 [4,]  d  5          [4,]  d  1
 [5,]  e  1          [5,]  e  1
 [6,]  f  2          [6,]  k  5
 [7,]  g  3          [7,]  l  2
 [8,]  h  3          [8,]  m  4
 [9,]  i  2          [9,]  n  1
[10,]  j  4         [10,]  o  1

To find which rows of dt2 have a matching key in dt1, set the which option to TRUE:

> dt1[dt2,which=TRUE]
[1]  1  2  3  4  5 NA NA NA NA NA

Matthew suggested in this answer that this "non-join" idiom

dt1[-dt1[dt2,which=TRUE]]

subsets dt1 to those rows whose keys do not appear in dt2. On my machine, with data.table v1.7.1, I get an error:

Error in `[.default`(x[[s]],irows): only 0's may be mixed with negative subscripts

With the option nomatch=0, by contrast, the "non-join" works:

> dt1[-dt1[dt2,which=TRUE,nomatch=0]]
     A1 B1
[1,]  f  2
[2,]  g  3
[3,]  h  3
[4,]  i  2
[5,]  j  4

Is this the intended behavior?

Answer

As far as I know, this is part of base R:

# This works
(1:4)[c(-2,-3)]

# But this gives you the same error you described above
(1:4)[c(-2,-3,NA)]
# Error in (1:4)[c(-2, -3, NA)] : 
#   only 0's may be mixed with negative subscripts

The wording of the error message suggests that this is intended behavior.
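A minimal base R sketch of one way around the error, assuming plain vectors rather than data.tables: drop the NAs from the index before negating it. The names x, idx, matched and result are illustrative only and do not come from the question.

```r
x <- letters[1:10]

# match() returns NA for values that are not found, much like which=TRUE above
idx <- match(c("a", "b", "k"), x)   # "k" is not in x, so idx contains an NA

# x[-idx] would raise the subscript error, so remove the NAs first
matched <- idx[!is.na(idx)]

# Guard the no-match case: x[-integer(0)] selects nothing, not everything
result <- if (length(matched) > 0) x[-matched] else x
print(result)
```

The length check matters because negating an empty index does not mean "drop nothing"; without the guard, a lookup with no matches at all would return an empty vector.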

Here is my best guess as to why this is the intended behavior:

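Judging from how they handle NA elsewhere (for example, na.rm = FALSE is usually the default), R's designers seem to regard NA as carrying important information and are reluctant to discard it without an explicit instruction. (Fortunately, setting nomatch = 0 gives you a clean way to pass that instruction!)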
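As a rough sketch of that instruction in practice, the example below rebuilds two small keyed tables (with fixed rather than sampled values, so the result is reproducible) and compares the nomatch = 0 idiom with the not-join syntax dt1[!dt2]. The not-join syntax is, as far as I recall, only available in data.table releases later than the v1.7.1 mentioned in the question, so treat that part as an assumption about your installed version. The names via_index and via_notjoin are illustrative.

```r
library(data.table)

# Same key layout as in the question, but with deterministic value columns
dt1 <- data.table(A1 = letters[1:10], B1 = 1:10, key = "A1")
dt2 <- data.table(A2 = letters[c(1:5, 11:15)], B2 = 1:10, key = "A2")

# nomatch = 0 drops the unmatched rows of dt2, so the index contains no NAs
idx <- dt1[dt2, which = TRUE, nomatch = 0]
via_index <- dt1[-idx]

# Not-join: rows of dt1 whose key does not appear in dt2
# (assumes a data.table version that supports the ! prefix in i)
via_notjoin <- dt1[!dt2]

print(via_index$A1)
print(identical(via_index$A1, via_notjoin$A1))
```

One caveat with the index-based form: if no keys match at all, idx is empty and dt1[-idx] returns zero rows rather than all of them, which the not-join form avoids.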

Seen in that light, the designers' preference may explain why NA is accepted in a positive index but not in a negative one:

# Positive indexing: works,because the return value retains info about NA's
(1:4)[c(2,3,NA)]

# Negative indexing: doesn't work,because it can't easily retain such info
(1:4)[c(-2,NA)]
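To make the contrast concrete, here is a small sketch that captures both outcomes; the variable names pos and neg and the "error" placeholder string are illustrative only.

```r
# Positive indexing keeps a slot for the NA in the result
pos <- (1:4)[c(2, 3, NA)]
print(pos)

# Negative indexing has no slot in which to keep the NA, so R raises an error;
# tryCatch converts that error into a placeholder value for inspection
neg <- tryCatch((1:4)[c(-2, NA)], error = function(e) "error")
print(neg)
```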
