Removing duplicates across columns in an R dataframe
I have a dataframe that contains duplicates across three columns:
Name Year Job1 Job2 Job3
Bob 2011 director director chair
Bob 2012 director chair
Wendy 2011 advisor chair advisor
Henry 2010 CEO president president
I want to remove the duplicates within each row across the "Job1", "Job2", and "Job3" columns:
Name Year Job1 Job2 Job3
Bob 2011 director NA chair
Bob 2012 director chair
Wendy 2011 advisor chair NA
Henry 2010 CEO president NA
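Basically, if there is a duplicate, the value in the earlier column is kept and the value in the later column is removed (for example, if "Job1" and "Job2" hold the same value, the value in "Job1" is kept).
We can loop over the rows of the "Job" columns and replace the duplicates with NA: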
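# Indices of the columns named Job1, Job2, ...
nm1 <- grep('^Job\\d+$', names(df1))
# For each row, replace repeated values with NA; apply() returns rows as columns, so transpose back
df1[nm1] <- t(apply(df1[nm1], 1, function(x) replace(x, duplicated(x), NA)))
df1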
# Name Year Job1 Job2 Job3
#1 Bob 2011 director <NA> chair
#2 Bob 2012 director chair
#3 Wendy 2011 advisor chair <NA>
#4 Henry 2010 CEO president <NA>
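To see why this works, it may help to look at a single row in isolation. The sketch below takes the job values from the first row of the example as a plain named vector (the names `x`, `dup`, and `cleaned` are only illustrative). `duplicated()` flags each element that has already appeared earlier in the vector, so the first occurrence is always kept, and `replace()` overwrites the flagged positions with NA.

```r
# Job values from the first row of the example, as a named character vector
x <- c(Job1 = "director", Job2 = "director", Job3 = "chair")

# duplicated() marks every element that already appeared earlier in the vector
dup <- duplicated(x)

# replace() sets the flagged positions to NA and leaves the rest untouched
cleaned <- replace(x, dup, NA)
print(cleaned)
```

In the full solution, `apply(..., 1, ...)` runs this on every row and returns one column per input row, which is why the result is passed through `t()` before being assigned back.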
Data:
df1 <- structure(list(
    Name = c("Bob", "Bob", "Wendy", "Henry"),
    Year = c(2011L, 2012L, 2011L, 2010L),
    Job1 = c("director", "director", "advisor", "CEO"),
    Job2 = c("director", "chair", "chair", "president"),
    Job3 = c("chair", "", "advisor", "president")),
  class = "data.frame", row.names = c(NA, -4L))
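One thing to watch for: the data above stores a missing job as an empty string rather than NA. If a row ever had two empty job fields, `duplicated()` would treat the second blank as a repeat and turn it into NA. The following is a minimal sketch of one way to avoid that, assuming blanks should be left as they are. The data frame `df2` and the helper `drop_dups` are hypothetical and not part of the original question.

```r
# Hypothetical data with more than one blank job field per row
df2 <- data.frame(
  Name = c("Bob", "Ann"),
  Job1 = c("director", "chair"),
  Job2 = c("", ""),
  Job3 = c("", "chair"),
  stringsAsFactors = FALSE
)

# Indices of the columns named Job1, Job2, ...
job_cols <- grep("^Job\\d+$", names(df2))

# Replace repeated values with NA, but never flag an empty string
drop_dups <- function(x) {
  replace(x, duplicated(x) & nzchar(x), NA)
}

# Apply row by row and transpose back to the original orientation
df2[job_cols] <- t(apply(df2[job_cols], 1, drop_dups))
print(df2)
```

Here the repeated "chair" in the second row becomes NA, while the blank fields in both rows stay blank.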