Removing duplicate rows in R (when rows are not unique)


For the example data frame:

df <- structure(list(postcode = c("ne34rt", "ne34rt", "ne34rt", "ne34rt", 
                                  "cb12sd", "cb23ef", "cb23ef", "cb23ef", "cb46tf"), name = c("katie", 
                                  "katie", "katie", "john", "lucie", "amy", "amy", "amy", "dawn"
                                  ), score = c(5L, 5L, 4L, 3L, 6L, 4L, 4L, 1L, 2L)), .Names = c("postcode", 
                                  "name", "score"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
                                  -9L), spec = structure(list(cols = structure(list(postcode = structure(list(), class = c("collector_character", 
                                    "collector")), name = structure(list(), class = c("collector_character", 
                                "collector")), score = structure(list(), class = c("collector_integer", 
                             "collector"))), .Names = c("postcode", "name", "score")), default = structure(list(), class = c("collector_guess", 
                          "collector"))), .Names = c("cols", "default"), class = "col_spec"))

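Sometimes you only want to check for duplicate rows in two columns. In that case, you can use unique from data.table, where the by argument takes the combination of columns that should be unique: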

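library(data.table)
dt <- setDT(df)  # converts df to a data.table by reference (no copy)
# keep the first row for each distinct combination of the by columns
unique( dt, by = c("postcode", "name", "score") )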

   postcode  name score
1:   ne34rt katie     5
2:   ne34rt katie     4
3:   ne34rt  john     3
4:   cb12sd lucie     6
5:   cb23ef   amy     4
6:   cb23ef   amy     1
7:   cb46tf  dawn     2

unique( dt, by = c("postcode","name") )

   postcode  name score
1:   ne34rt katie     5
2:   ne34rt  john     3
3:   cb12sd lucie     6
4:   cb23ef   amy     4
5:   cb46tf  dawn     2


unique( dt, by = c("postcode") )

   postcode  name score
1:   ne34rt katie     5
2:   cb12sd lucie     6
3:   cb23ef   amy     4
4:   cb46tf  dawn     2
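
For readers who would rather not depend on data.table, the same "keep the first row per column combination" result can be sketched in base R. This is only a minimal sketch: it assumes the example data is rebuilt as a plain data.frame, and the names key_cols and first_per_key are illustrative, not part of the original answer.

```r
# Example data rebuilt as a plain data.frame (same values as above)
df <- data.frame(
  postcode = c("ne34rt", "ne34rt", "ne34rt", "ne34rt", "cb12sd",
               "cb23ef", "cb23ef", "cb23ef", "cb46tf"),
  name = c("katie", "katie", "katie", "john", "lucie",
           "amy", "amy", "amy", "dawn"),
  score = c(5L, 5L, 4L, 3L, 6L, 4L, 4L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Columns whose combination should be unique (hypothetical name)
key_cols <- c("postcode", "name")

# duplicated() flags the second and later occurrences of each combination,
# so negating it keeps the first row of every (postcode, name) pair
first_per_key <- df[!duplicated(df[key_cols]), ]
first_per_key
```

The rows kept here should match the five rows returned by unique(dt, by = c("postcode", "name")) above, although the row names will be the original row positions rather than 1 to 5.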

We can use duplicated from base R to create a logical condition:

df[!(duplicated(df)|duplicated(df, fromLast = TRUE)), ]
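
It may help to see how this condition differs from plain de-duplication. The sketch below assumes the example data rebuilt as a plain data.frame; first_only, dup_any, and singletons are illustrative names. Using !duplicated(df) alone keeps one copy of every repeated row, whereas combining it with fromLast = TRUE drops every row that has a duplicate anywhere in the data.

```r
# Example data rebuilt as a plain data.frame (same values as above)
df <- data.frame(
  postcode = c("ne34rt", "ne34rt", "ne34rt", "ne34rt", "cb12sd",
               "cb23ef", "cb23ef", "cb23ef", "cb46tf"),
  name = c("katie", "katie", "katie", "john", "lucie",
           "amy", "amy", "amy", "dawn"),
  score = c(5L, 5L, 4L, 3L, 6L, 4L, 4L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each repeated row (plain de-duplication)
first_only <- df[!duplicated(df), ]

# Flag a row if it repeats an earlier row OR is repeated by a later one
dup_any <- duplicated(df) | duplicated(df, fromLast = TRUE)

# Keep only rows that occur exactly once in the whole data frame
singletons <- df[!dup_any, ]

nrow(first_only)  # 7
nrow(singletons)  # 5
```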
If we want to filter rows using only a subset of the columns, apply duplicated to that subset of the data:

nm1 <- colnames(df)[1:2]
df[!(duplicated(df[nm1])|duplicated(df[nm1], fromLast = TRUE)),]
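
If this pattern is needed in several places, it could be wrapped in a small helper. The function below is a hypothetical sketch, not part of base R or the original answer; drop_all_duplicated and its cols argument are made-up names, and it assumes a plain data.frame (a data.table indexes differently with data[cols]).

```r
# Hypothetical helper: drop every row whose values in `cols` occur more than once.
# Assumes `data` is a plain data.frame and `cols` is a character vector of column names.
drop_all_duplicated <- function(data, cols = names(data)) {
  key <- data[cols]
  has_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data[!has_dup, , drop = FALSE]
}

# Example data rebuilt as a plain data.frame (same values as above)
df <- data.frame(
  postcode = c("ne34rt", "ne34rt", "ne34rt", "ne34rt", "cb12sd",
               "cb23ef", "cb23ef", "cb23ef", "cb46tf"),
  name = c("katie", "katie", "katie", "john", "lucie",
           "amy", "amy", "amy", "dawn"),
  score = c(5L, 5L, 4L, 3L, 6L, 4L, 4L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Rows whose (postcode, name) pair appears exactly once
by_pair <- drop_all_duplicated(df, c("postcode", "name"))
by_pair$name  # "john" "lucie" "dawn"
```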

Try:

df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]