Removing duplicates (if not unique) in R
For an example data frame:
df <- structure(list(postcode = c("ne34rt", "ne34rt", "ne34rt", "ne34rt",
"cb12sd", "cb23ef", "cb23ef", "cb23ef", "cb46tf"), name = c("katie",
"katie", "katie", "john", "lucie", "amy", "amy", "amy", "dawn"
), score = c(5L, 5L, 4L, 3L, 6L, 4L, 4L, 1L, 2L)), .Names = c("postcode",
"name", "score"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-9L), spec = structure(list(cols = structure(list(postcode = structure(list(), class = c("collector_character",
"collector")), name = structure(list(), class = c("collector_character",
"collector")), score = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("postcode", "name", "score")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
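Sometimes you only want to check for duplicate rows across a few columns. In that case you can use data.table's unique, whose by argument takes the combination of columns that should be unique: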
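library(data.table)
# setDT() converts df to a data.table by reference, so df itself is modified
dt <- setDT(df)
# Keep the first row of each postcode/name/score combination
unique(dt, by = c("postcode", "name", "score"))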
postcode name score
1: ne34rt katie 5
2: ne34rt katie 4
3: ne34rt john 3
4: cb12sd lucie 6
5: cb23ef amy 4
6: cb23ef amy 1
7: cb46tf dawn 2
unique( dt, by = c("postcode","name") )
postcode name score
1: ne34rt katie 5
2: ne34rt john 3
3: cb12sd lucie 6
4: cb23ef amy 4
5: cb46tf dawn 2
unique( dt, by = c("postcode") )
postcode name score
1: ne34rt katie 5
2: cb12sd lucie 6
3: cb23ef amy 4
4: cb46tf dawn 2
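Note that unique keeps the first row of every group. If the goal is instead to drop every row whose key occurs more than once, one possible data.table sketch is to group by the key columns and keep only groups of size one. This is an illustration rather than part of the answer above; the names kept_first and only_singletons are made up for the example, and the data is rebuilt with data.table() so the block runs on its own.

```r
library(data.table)

# Same values as the example data, rebuilt so the block is self-contained
dt <- data.table(
  postcode = c("ne34rt", "ne34rt", "ne34rt", "ne34rt", "cb12sd",
               "cb23ef", "cb23ef", "cb23ef", "cb46tf"),
  name     = c("katie", "katie", "katie", "john", "lucie",
               "amy", "amy", "amy", "dawn"),
  score    = c(5L, 5L, 4L, 3L, 6L, 4L, 4L, 1L, 2L)
)

# unique() keeps the first row of each postcode/name group
kept_first <- unique(dt, by = c("postcode", "name"))

# Keep only groups with exactly one row, so every copy of a repeated key is dropped
only_singletons <- dt[, if (.N == 1L) .SD, by = .(postcode, name)]

print(kept_first)
print(only_singletons)
```

With this data, kept_first has one row per postcode/name pair, while only_singletons retains just the pairs that appear once.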
We can use duplicated from base R to create a logical condition:
df[!(duplicated(df)|duplicated(df, fromLast = TRUE)), ]
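To see why both calls are needed, here is a small sketch on a plain character vector (the vector x and the intermediate names are made up for illustration). duplicated on its own leaves the first occurrence unflagged, and with fromLast = TRUE it leaves the last occurrence unflagged, so combining the two with | flags every copy of a repeated value.

```r
# Small vector to show what each call flags
x <- c("a", "b", "a", "c", "b", "a")

# Flags repeats that come after the first occurrence
first_pass <- duplicated(x)

# Flags repeats that come before the last occurrence
last_pass <- duplicated(x, fromLast = TRUE)

# A value is non-unique if either pass flags it
non_unique <- first_pass | last_pass

# Only values that occur exactly once remain
x[!non_unique]
```

Here only "c" survives, because "a" and "b" each appear more than once.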
If we want to filter rows using a subset of columns, apply duplicated to the subsetted data:
nm1 <- colnames(df)[1:2]
df[!(duplicated(df[nm1])|duplicated(df[nm1], fromLast = TRUE)),]
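The two variants above could be wrapped in a small helper. This is only a sketch: drop_non_unique is a hypothetical name, not a base R function, and it assumes a plain data.frame (a data.table indexes columns differently, so it would need adapting if df was converted with setDT).

```r
# Hypothetical helper: keep only rows whose key combination occurs exactly once.
# Assumes `data` is a plain data.frame; `cols` defaults to all columns.
drop_non_unique <- function(data, cols = names(data)) {
  key <- data[, cols, drop = FALSE]
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data[!dup, , drop = FALSE]
}

# Same values as the example data, rebuilt as a plain data.frame
df <- data.frame(
  postcode = c("ne34rt", "ne34rt", "ne34rt", "ne34rt", "cb12sd",
               "cb23ef", "cb23ef", "cb23ef", "cb46tf"),
  name     = c("katie", "katie", "katie", "john", "lucie",
               "amy", "amy", "amy", "dawn"),
  score    = c(5L, 5L, 4L, 3L, 6L, 4L, 4L, 1L, 2L),
  stringsAsFactors = FALSE
)

res_all <- drop_non_unique(df)                         # compare whole rows
res_key <- drop_non_unique(df, c("postcode", "name"))  # compare two columns only

print(res_all)
print(res_key)
```

Comparing whole rows removes only the exact repeats (both katie/5 rows and both amy/4 rows), while comparing on postcode and name removes every katie and amy row.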
Try df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]