Subset a data.frame based on a threshold count of value combinations

r dataframe

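I want to remove from a data.frame every row whose combination of values does not occur at least 4 times in the data frame. In this example I only need rows 1, 2, 6, and 7, because the combination IR, IR_OSR, 2, hello is repeated 4 times.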

> DB[1:7,c("MegaSite","General.location","ID","call.type")]
  MegaSite General.location ID call.type
1       IR           IR_OSR  2     hello
2       IR           IR_OSR  2     hello
3       IR           IR_OSR  M         x
4       IR           IR_OSR  M         x
5       IR           IR_OSR  M         z
6       IR           IR_OSR  2     hello
7       IR           IR_OSR  2     hello
> dim(DB)
[1] 25434    76

I tried the code suggested in another recent question, but I got this error:

Error in drop && !has.j : invalid 'x' type in 'x && y'

Here is a link to a larger sample dataset that contains only the relevant columns from my actual dataset:

Try the following code:

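> library(plyr)
> # count the rows in each unique combination of the four columns (the count lands in column V1)
> result <- ddply(r, .(MegaSite, General.location, ID, call.type), nrow)
> # keep only the combinations that occur at least 4 times
> result <- result[result$V1 >= 4, ]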
> result
  MegaSite General.location ID call.type V1
1       IR           IR_OSR  2     hello  4

Thanks, this works! Though I should note that I needed to assign the result of the merge to a new object, e.g. DB2.

> merge(r, result, all.y=TRUE, by=c("MegaSite", "General.location", "ID", "call.type"))
  MegaSite General.location ID call.type V1
1       IR           IR_OSR  2     hello  4
2       IR           IR_OSR  2     hello  4
3       IR           IR_OSR  2     hello  4
4       IR           IR_OSR  2     hello  4
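
For comparison, here is a minimal base R sketch of the same filter that skips the count-then-merge step and keeps the original row numbers. It is one possible alternative, not the approach from the answer above. The toy DB below only mirrors the seven example rows from the question, and the names cols, key, n and DB2 are chosen here for illustration.

```r
# Toy data mirroring the seven example rows from the question (an assumption,
# not the real 25434 x 76 data set).
DB <- data.frame(
  MegaSite         = rep("IR", 7),
  General.location = rep("IR_OSR", 7),
  ID               = c("2", "2", "M", "M", "M", "2", "2"),
  call.type        = c("hello", "hello", "x", "x", "z", "hello", "hello"),
  stringsAsFactors = FALSE
)

# Columns whose combined values define a group.
cols <- c("MegaSite", "General.location", "ID", "call.type")

# One key string per row; "\r" is used as a separator that is unlikely
# to occur inside the values themselves.
key <- do.call(paste, c(DB[cols], sep = "\r"))

# ave() returns, for every row, the size of the group that row belongs to.
n <- ave(seq_len(nrow(DB)), key, FUN = length)

# Keep only rows whose combination occurs at least 4 times.
DB2 <- DB[n >= 4, ]
print(DB2)
```

Because the rows are selected by a logical index rather than rebuilt by merge(), DB2 keeps every column of DB along with the original row order and row names.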