Puzzling "subscript out of bounds" error in dplyr::filter()

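Here are two small tbl_df objects. The first, df, has numeric customer IDs associated with some transactions. The second, cohort, is a single-column object holding the customer IDs that, at some point in my function, I need to identify and keep: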

> df
Source: local data frame [7 x 4]

  cust       date sales     cohort
1   12 2013-07-31    35 2013-07-01
2   13 2013-12-16    70 2013-12-01
3   14 2014-03-14    59 2014-03-01
4   15 2014-04-22    70 2014-04-01
5    9 2012-10-29    35 2012-10-01
6   10 2012-11-12    35 2012-11-01
7   11 2012-12-06   105 2012-12-01
> cohort
Source: local data frame [1 x 1]

  cust
1    9
> str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   7 obs. of  4 variables:
 $ cust  : num  12 13 14 15 9 10 11
 $ date  : POSIXct, format: "2013-07-31" "2013-12-16" "2014-03-14" ...
 $ sales : num  35 70 59 70 35 35 105
 $ cohort: Date, format: "2013-07-01" "2013-12-01" "2014-03-01" ...
> str(cohort)
Classes ‘tbl_df’ and 'data.frame':  1 obs. of  1 variable:
 $ cust: num 9
To do this, I thought I should use dplyr::filter(), like so:

> filter(data.frame(df), cust %in% cohort[['cust']])
Error: subscript out of bounds
This is especially odd because the fix seems easy:

> foo <- cohort
> filter(data.frame(df), cust %in% foo[['cust']])
  cust       date sales     cohort
1    9 2012-10-29    35 2012-10-01  
Does anyone have an explanation?

I am running dplyr 0.4.2 on R 3.2.1. If you want to reproduce this on your machine, see the following:

> dput(df)
structure(list(cust = c(12, 13, 14, 15, 9, 10, 11), date = structure(c(1375228800, 
1387152000, 1394755200, 1398124800, 1351468800, 1352678400, 1354752000
), tzone = "UTC", class = c("POSIXct", "POSIXt")), sales = c(35, 
70, 59, 70, 35, 35, 105), cohort = structure(c(15887, 16040, 
16130, 16161, 15614, 15645, 15675), class = "Date")), .Names = c("cust", 
"date", "sales", "cohort"), row.names = c(NA, 7L), class = c("tbl_df", 
"tbl", "data.frame"))
> dput(cohort)
structure(list(cust = 9), .Names = "cust", class = c("tbl_df", 
"data.frame"), row.names = c(NA, -1L))

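You have a variable called cohort in df. filter() looks for names first among the variables of the data frame and only then in the global environment, so cohort[['cust']] is evaluated against the cohort column of df rather than against your cohort object.

I half expected this ambiguity to bite at some point. Thank you very much. It now occurs to me that inner_join() could also do the job. Is there any reason to prefer one over the other?

The point of inner_join() is to join two data frames together, so I would prefer it here because it expresses your intent more clearly.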
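The masking described in the answer, and the join-based alternative from the comments, can be sketched roughly as follows. This is a minimal illustration rather than a definitive fix: it assumes a current dplyr installation, uses a trimmed copy of the question's data (the date column is left out), and the name cohort_ids is hypothetical, chosen only because it does not clash with a column of df. semi_join() is not mentioned in the thread and is included here as one more option.

```r
library(dplyr)

# Trimmed copy of the question's data (the POSIXct `date` column is omitted).
df <- data.frame(
  cust   = c(12, 13, 14, 15, 9, 10, 11),
  sales  = c(35, 70, 59, 70, 35, 35, 105),
  cohort = as.Date(c("2013-07-01", "2013-12-01", "2014-03-01", "2014-04-01",
                     "2012-10-01", "2012-11-01", "2012-12-01"))
)
cohort <- data.frame(cust = 9)

# Inside filter(), `cohort` resolves to the Date column df$cohort, so
# cohort[["cust"]] indexes an unnamed Date vector by name and fails.
masked <- tryCatch(
  filter(df, cust %in% cohort[["cust"]]),
  error = function(e) e
)
inherits(masked, "error")

# Workaround from the question: use a name that is not also a column of df.
cohort_ids <- cohort  # hypothetical name
kept <- filter(df, cust %in% cohort_ids[["cust"]])

# Join-based alternatives. The second argument is evaluated as an ordinary
# argument, so the `cohort` column of df does not mask the data frame.
joined <- inner_join(df, cohort, by = "cust")
semi   <- semi_join(df, cohort, by = "cust")
```

If the lookup table could ever contain repeated customer IDs, semi_join() is probably the closer match to the filter() call, since it keeps only the columns of df and never duplicates its rows, whereas inner_join() returns one row per matching pair.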