R: removing duplicate records that are not exact duplicates

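I have a list of records that needs to be deduplicated. The records look like combinations of the same set of values, but the usual deduplication functions do not work because the rows are not exact duplicates (the same pair can appear with A and B swapped). Here is a reproducible example: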

df <- data.frame(
    A = c("2","2","2","43","43","43","331","391","481","490","501","501","501","502","502","502"),
    B = c("43","501","502","2","501","502","491","496","490","481","2","43","502","2","43","501"))

Here is the output I want:

df_Final <- data.frame(
    A = c("2","2","2","331","391","481"),
    B = c("43","501","502","491","496","490"))

You can remove every row that becomes a duplicate once the values within the row are sorted:

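library(dplyr)
df %>%
    apply(1, sort) %>%   # sort the two values within each row (gives a 2 x n matrix)
    t() %>%              # transpose back to one row per record
    data.frame() %>%     # columns are now named X1 and X2
    distinct()           # keep the first occurrence of each sorted pair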
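
For comparison, here is a base R sketch of the same idea: build an order-independent key for each row and keep the first row per key. The names pair_key and df_pairs are made up for illustration, and the integer conversion assumes every value is a whole number. On the sample data this keeps 9 rows, because pairs such as 43/501 are distinct unordered pairs, so on its own it does not reproduce the 6-row df_Final.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps the ids as strings.
df <- data.frame(
    A = c("2","2","2","43","43","43","331","391","481","490","501","501","501","502","502","502"),
    B = c("43","501","502","2","501","502","491","496","490","481","2","43","502","2","43","501"),
    stringsAsFactors = FALSE)

# Order-independent key: smaller id first, larger id second.
# Assumption: every value parses as an integer.
a <- as.integer(df$A)
b <- as.integer(df$B)
pair_key <- paste(pmin(a, b), pmax(a, b), sep = "-")

# Keep the first row seen for each unordered pair (original columns are preserved).
df_pairs <- df[!duplicated(pair_key), ]
nrow(df_pairs)  # 9
```

Unlike the pipeline above, this keeps the original A and B columns rather than returning the sorted values.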

I think you want to find where each element of column A first occurs in column B.

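With idx defined as match(df$A, df$B), keep a row if its element of A is not in B at all (is.na(idx)), or if the row comes before that element's first occurrence in B (seq_along(idx) < idx).

A more or less literal tidyverse approach is to create and then remove a temporary column: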

library(tidyverse)
df %>% mutate(idx = match(A, B)) %>%
    filter(is.na(idx) | seq_along(idx) < idx) %>%
    select(-idx)
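
To check the index-based rule against the sample data, the following base R sketch recomputes idx, applies the same condition, and compares the result with df_Final. The names keep and result are illustrative only.

```r
df <- data.frame(
    A = c("2","2","2","43","43","43","331","391","481","490","501","501","501","502","502","502"),
    B = c("43","501","502","2","501","502","491","496","490","481","2","43","502","2","43","501"),
    stringsAsFactors = FALSE)
df_Final <- data.frame(
    A = c("2","2","2","331","391","481"),
    B = c("43","501","502","491","496","490"),
    stringsAsFactors = FALSE)

# Position in B where each element of A first occurs (NA if it never does).
idx <- match(df$A, df$B)

# Keep rows whose A value never appears in B, or that come before its first appearance in B.
keep <- is.na(idx) | seq_along(idx) < idx
result <- df[keep, ]

# Compare column by column with the desired output.
identical(result$A, df_Final$A) && identical(result$B, df_Final$B)  # TRUE
```

For example, "2" first appears in B at row 4, so rows 1 to 3 (where A is "2") are kept, while "43" first appears in B at row 1, so rows 4 to 6 are dropped.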

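There is no obvious connection between your input and your expected output. For example, what happens to the A = 43 entries?

It is clear that you want to deduplicate, but the logic behind it is not intuitive and cannot easily be inferred from the data. If there is no clearly defined rule, you could go through the input row by row and explain why each row is kept or dropped.

What rule decides which vector keeps a value? Why does 2 belong to A and 43 to B?
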
# Base R form of the index-based filter
idx <- match(df$A, df$B)
df[is.na(idx) | seq_along(idx) < idx, ]