Removing erroneous codes and their associated records from an R data.table
I have a data.table in R, say dt, that looks like this:
> dt <- data.table(adr = c("A", "A", "A","A","A","A","A","B", "B", "C", "C", "C", "D", "E", "E"),
code=c("0001","0001","0001","0001","0001","0001","0001","0001","0001", "0002", "0002", "0002", "0003", "0003", "0003"),
num = c(1,67,875,467,986,34,987,876,785, 67,9078,45,907,451,987))
> dt
adr code num
1: A 0001 1
2: A 0001 67
3: A 0001 875
4: A 0001 467
5: A 0001 986
6: A 0001 34
7: A 0001 987
8: B 0001 876
9: B 0001 785
10: C 0002 67
11: C 0002 9078
12: C 0002 45
13: D 0003 907
14: E 0003 451
15: E 0003 987
How can I achieve this in R using data.table?

I built your dt with data.frame() instead of data.table() so that I would not have to load another package, but you can do it as follows:
require(dplyr)
dt <- dt %>% group_by(code, adr) %>% mutate(count = n()) %>% group_by(code) %>% filter(count == max(count)) %>% select(-count)
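dt=dt[dt$adr!="B",
What happens if there is a tie for the most frequent adr, or if the most frequent adr accounts for no more than 50%?
If there is a tie, the first one is correct.
data.table version: dt2 … 0.5*.N],by=code][,N:=NULL]
That gives a result, but the erroneous records are still there.
@Jack If what you want is the mode (the most frequent value), this will do it: dt[,{uadr=unique(adr);.SD[which(adr==uadr[which.max(tabulate(match(adr,uadr)))]),]},by=c('code')]. It may help if there are more than two alternative addresses.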
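The mode-based idea from the comments can also be written as a count-then-join in data.table. The following is only a sketch under assumptions: the helper names counts, keep and dt2 are made up for illustration, and it assumes that a tie should go to the adr that appears first within a code, as stated in the comments above.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  adr  = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "C", "C", "C", "D", "E", "E"),
  code = c("0001", "0001", "0001", "0001", "0001", "0001", "0001", "0001", "0001",
           "0002", "0002", "0002", "0003", "0003", "0003"),
  num  = c(1, 67, 875, 467, 986, 34, 987, 876, 785, 67, 9078, 45, 907, 451, 987)
)

# Count rows per (code, adr) pair; groups come out in order of first appearance
counts <- dt[, .N, by = .(code, adr)]

# Per code, keep the adr with the highest count.
# which.max() returns the first maximum, so a tie goes to the adr seen first.
keep <- counts[, .(adr = adr[which.max(N)]), by = code]

# Join back so that only rows with a retained (code, adr) pair remain
dt2 <- dt[keep, on = .(code, adr)]
```

On the sample data this drops the two B rows under code 0001 and the single D row under code 0003, leaving 12 rows. Unlike the dplyr filter on count == max(count), which keeps every tied adr, this version keeps exactly one adr per code.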