Find the intersection of a data frame's grouped variable in R

I have data frames like these:

df <- data.frame(
  names = c(rep("cody", 10), rep("sam", 5)),
  year  = c(paste0("year",2000:2009), paste0("year",2000:2004))
)
df2 <- data.frame(
  names = c(rep("cody", 5), rep("sam", 5)), 
  year  = c(paste0("year",2000:2004), paste0("year",2000:2004))
)

Here is a base R approach that uses Reduce and intersect to keep only the years shared by every name:

# %in% tests membership; == would only work here through vector recycling
dat[dat$year %in% Reduce(intersect, split(dat$year, dat$names)), ]
which returns

  names     year
1   cody year2000
2   cody year2001
3   cody year2002
4   cody year2003
5   cody year2004
11   sam year2000
12   sam year2001
13   sam year2002
14   sam year2003
15   sam year2004
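Here, Reduce repeatedly feeds its arguments (the separate vectors of years for each name, which split supplies as a list) to intersect. Each step drops the years that do not match, until only the years available for every name remain.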
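To make the folding step concrete, here is a minimal sketch on a small made-up example. The third person, "alex", and the vectors `years` and `person` are assumptions for illustration and are not part of the original data.

```r
# Hypothetical data: three people with partly overlapping years.
years  <- c("year2000", "year2001", "year2002",   # cody
            "year2001", "year2002", "year2003",   # sam
            "year2002", "year2001")               # alex (hypothetical)
person <- c(rep("cody", 3), rep("sam", 3), rep("alex", 2))

# split() returns a list with one character vector of years per person.
by_person <- split(years, person)

# Reduce() folds intersect() over that list from left to right, i.e.
# intersect(intersect(by_person[[1]], by_person[[2]]), by_person[[3]]).
common <- Reduce(intersect, by_person)

# Sort for display, since the order follows the first list element.
sort(common)
```

Only year2001 and year2002 occur for all three people, so those are the two values left in `common`.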

Note that the year variable must be a character vector, not a factor.
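If the column did end up as a factor (data.frame() converted strings to factors by default before R 4.0.0), one possible fix is to convert it first. The sketch below uses a small hypothetical data frame, `dat_f`, that is not part of the original post.

```r
# Hypothetical data frame whose columns are deliberately stored as factors.
dat_f <- data.frame(
  names = c("cody", "cody", "sam"),
  year  = c("year2000", "year2001", "year2000"),
  stringsAsFactors = TRUE
)
is.factor(dat_f$year)

# Convert the factor to a character vector before splitting and intersecting.
dat_f$year <- as.character(dat_f$year)

common   <- Reduce(intersect, split(dat_f$year, dat_f$names))
filtered <- dat_f[dat_f$year %in% common, ]
filtered
```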

As a small simplification, you can use with to cut down on the dat$ references:

dat[with(dat, year %in% Reduce(intersect, split(year, names))), ]
Data

dat <- 
structure(list(names = c("cody", "cody", "cody", "cody", "cody", 
"cody", "cody", "cody", "cody", "cody", "sam", "sam", "sam", 
"sam", "sam"), year = c("year2000", "year2001", "year2002", "year2003", 
"year2004", "year2005", "year2006", "year2007", "year2008", "year2009", 
"year2000", "year2001", "year2002", "year2003", "year2004")),
.Names = c("names", "year"), row.names = c(NA, -15L), class = "data.frame")

You can group by year and then keep the years that appear twice (or as many times as the number of unique names you require):

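library(dplyr)
df %>%
  group_by(year) %>%
  mutate(name_count = n()) %>%
  ungroup() %>%
  filter(name_count == 2) %>%
  select(-name_count)
   names     year
1   cody year2000
2   cody year2001
3   cody year2002
4   cody year2003
5   cody year2004
6    sam year2000
7    sam year2001
8    sam year2002
9    sam year2003
10   sam year2004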
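The count-based idea above does not have to hard-code the number of names. As a rough base R analogue (a sketch, not the answer's own code), the required count can be derived from the data. The helper variables `n_names`, `names_per_year`, `keep_years` and `result` are names chosen here for illustration.

```r
df <- data.frame(
  names = c(rep("cody", 10), rep("sam", 5)),
  year  = c(paste0("year", 2000:2009), paste0("year", 2000:2004)),
  stringsAsFactors = FALSE
)

# Number of distinct names that a year has to cover.
n_names <- length(unique(df$names))

# For each year, count how many distinct names it occurs with.
names_per_year <- tapply(df$names, df$year, function(x) length(unique(x)))

# Keep the rows whose year is present for every name.
keep_years <- names(names_per_year)[names_per_year == n_names]
result <- df[df$year %in% keep_years, ]
result
```

Counting distinct names rather than rows also guards against a name-year pair that is accidentally listed twice.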

Here is an option that finds all duplicates in the year column:

df[duplicated(df$year) | duplicated(df$year, fromLast = TRUE), ]
#    names     year
# 1   cody year2000
# 2   cody year2001
# 3   cody year2002
# 4   cody year2003
# 5   cody year2004
# 11   sam year2000
# 12   sam year2001
# 13   sam year2002
# 14   sam year2003
# 15   sam year2004
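One caveat worth checking: flagging duplicated years matches the intersection in this example because there are exactly two names. With more names the two can differ, as the following sketch on a hypothetical three-name data frame (`df3`, not from the original post) suggests.

```r
# Hypothetical data: year2000 is shared by two names, year2001 by all three.
df3 <- data.frame(
  names = c("cody", "cody", "sam", "sam", "alex"),
  year  = c("year2000", "year2001", "year2000", "year2001", "year2001"),
  stringsAsFactors = FALSE
)

# Years that occur more than once, regardless of how many names share them.
dup_years <- unique(df3$year[duplicated(df3$year)])

# Years that are present for every name.
common_years <- Reduce(intersect, split(df3$year, df3$names))

dup_years
common_years
```

Here `dup_years` contains both years, while `common_years` contains only year2001.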

When creating the data frame, setting stringsAsFactors = FALSE is recommended.