Mysterious "subscript out of bounds" error in dplyr::filter()


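Here are two small tbl_df objects: the first, df, has numeric customer IDs associated with some transactions; the second, cohort, is a single-column object holding the customer IDs that, at some point in my function, I need to identify and keep: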

> df
Source: local data frame [7 x 4]

  cust       date sales     cohort
1   12 2013-07-31    35 2013-07-01
2   13 2013-12-16    70 2013-12-01
3   14 2014-03-14    59 2014-03-01
4   15 2014-04-22    70 2014-04-01
5    9 2012-10-29    35 2012-10-01
6   10 2012-11-12    35 2012-11-01
7   11 2012-12-06   105 2012-12-01
> cohort
Source: local data frame [1 x 1]

  cust
1    9
> str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   7 obs. of  4 variables:
 $ cust  : num  12 13 14 15 9 10 11
 $ date  : POSIXct, format: "2013-07-31" "2013-12-16" "2014-03-14" ...
 $ sales : num  35 70 59 70 35 35 105
 $ cohort: Date, format: "2013-07-01" "2013-12-01" "2014-03-01" ...
> str(cohort)
Classes ‘tbl_df’ and 'data.frame':  1 obs. of  1 variable:
 $ cust: num 9

To do this, I figured I would use dplyr::filter(), as follows:

> filter(data.frame(df), cust %in% cohort[['cust']])
Error: subscript out of bounds

This is especially strange because the fix seems to be easy:

> foo <- cohort
> filter(data.frame(df), cust %in% foo[['cust']])
  cust       date sales     cohort
1    9 2012-10-29    35 2012-10-01

Does anyone have an explanation?

I am running dplyr 0.4.2 on R 3.2.1. If you want to reproduce this on your machine, see below:

> dput(df)
structure(list(cust = c(12, 13, 14, 15, 9, 10, 11), date = structure(c(1375228800, 
1387152000, 1394755200, 1398124800, 1351468800, 1352678400, 1354752000
), tzone = "UTC", class = c("POSIXct", "POSIXt")), sales = c(35, 
70, 59, 70, 35, 35, 105), cohort = structure(c(15887, 16040, 
16130, 16161, 15614, 15645, 15675), class = "Date")), .Names = c("cust", 
"date", "sales", "cohort"), row.names = c(NA, 7L), class = c("tbl_df", 
"tbl", "data.frame"))
> dput(cohort)
structure(list(cust = 9), .Names = "cust", class = c("tbl_df", 
"data.frame"), row.names = c(NA, -1L))

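df has a variable called cohort, and filter() looks for names first among the data frame's variables and only then in the global environment. Inside filter(), cohort[['cust']] therefore refers to the Date column df$cohort, not to your cohort data frame, and indexing that unnamed vector by 'cust' is out of bounds.

I half expected this ambiguity to matter at some point. Many thanks. It now occurs to me that inner_join() would do the job too. Is there any reason to prefer one over the other?

The point of inner_join() is to join two data frames together, so I would prefer it here because it expresses your intent more clearly.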
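The name collision described in the answer can be sketched in base R alone. This is an illustration, not dplyr's actual implementation: with() is used here as a stand-in because it also resolves names among the data frame's columns before falling back to the calling environment. The data frames are trimmed copies of the ones in the question, and the name cohort_ids is hypothetical.

```r
# Trimmed copies of the question's objects (the `date` column is omitted).
df <- data.frame(
  cust   = c(12, 13, 14, 15, 9, 10, 11),
  sales  = c(35, 70, 59, 70, 35, 35, 105),
  cohort = as.Date(c("2013-07-01", "2013-12-01", "2014-03-01", "2014-04-01",
                     "2012-10-01", "2012-11-01", "2012-12-01"))
)
cohort <- data.frame(cust = 9)

# Inside with(), `cohort` resolves to the column df$cohort (a Date vector),
# not to the data frame of the same name in the global environment.
masked <- with(df, cohort)

# Indexing that unnamed Date vector by name raises an error,
# which is captured here instead of stopping the script.
result <- tryCatch(with(df, cohort[["cust"]]), error = function(e) e)

# Workaround: pull the IDs out under a name that no column of df uses
# (cohort_ids is a hypothetical name chosen for this sketch).
cohort_ids <- cohort[["cust"]]
kept <- df[df$cust %in% cohort_ids, ]
```

This is also why the foo workaround in the question succeeds: df has no column named foo, so the lookup falls through to the global environment. The same vector can be passed to dplyr as filter(df, cust %in% cohort_ids).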

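For the inner_join() suggestion discussed above, a minimal sketch might look like the following. It assumes dplyr is installed and uses trimmed copies of the question's data frames (the date column is omitted). semi_join() is shown as a possible alternative that was not mentioned in the thread.

```r
library(dplyr)

# Trimmed copies of the question's objects (the `date` column is omitted).
df <- data.frame(
  cust   = c(12, 13, 14, 15, 9, 10, 11),
  sales  = c(35, 70, 59, 70, 35, 35, 105),
  cohort = as.Date(c("2013-07-01", "2013-12-01", "2014-03-01", "2014-04-01",
                     "2012-10-01", "2012-11-01", "2012-12-01"))
)
cohort <- data.frame(cust = 9)

# Joining on `cust` keeps only the rows of df whose customer ID appears
# in the cohort table; no expression is evaluated against df's columns,
# so the column named `cohort` cannot shadow anything.
joined <- inner_join(df, cohort, by = "cust")

# semi_join() also filters df by matches in `cohort`, but never adds
# columns from the second table and never duplicates rows of df.
semi <- semi_join(df, cohort, by = "cust")
```

Because both tables are passed as arguments rather than referenced by name inside an expression, neither join is affected by the column/variable name clash that broke the filter() call.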