R: Finding unique values in a row of a data frame


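I have two data frames, df1 and df2, containing the serial numbers of products that people have bought, ordered by purchase quantity. The first column is custId (named id in the dput output below), and the next 5 columns are serial numbers, sorted from left to right by the number of items purchased.

I am trying to merge them into a single set of 5 serial numbers per customer, like this: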

  id col1 col2 col3 col4 col5
1  1 4742  927 7889 1816 6686   
2  2 4964 9295 9174  228 9470
3  3 5834 7758 5022 4361 9264
4  4 2802 9984  323 7543 7757
5  5  179  198 3996 6801 7561
6  6 7755 1252 9684 9940 3451

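I have a couple of questions:

1) How do I find the unique values within a row?

2) How do I preserve the order of the values across the whole row?

Any suggestions?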

> dput(df1)
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), col1 = c(4742, 
4964, 5834, 2802, 179, 7755, 6467, 8671, 2910, 150), col2 = c(927, 
9295, 7758, 9984, 198, 1252, 1664, 5242, 6995, 3875), col3 = c(7889, 
9174, NA, 323, 3996, 9684, 1150, 2973, 9948, 8598), col4 = c(NA, 
228, NA, NA, 6801, 9940, 854, 4744, 4006, 3196), col5 = c(NA, 
9470, NA, NA, 7561, NA, 4342, 1791, 286, 7425)), .Names = c("id", 
"col1", "col2", "col3", "col4", "col5"), row.names = c(NA, -10L
), class = "data.frame")
> dput(df2)
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), col6 = c(1816, 
6141, 5659, 3210, 9213, 3451, 2440, 5706, 5281, 7110), col7 = c(6686, 
9728, 3931, 2488, 1372, 5646, 2641, 7851, 5581, 5775), col8 = c(NA, 
6981, 5022, 9939, 4374, 6069, 7525, 4927, 9767, 1331), col9 = c(NA, 
3089, 4361, 7543, 7962, NA, 7526, 4215, 9923, 9887), col10 = c(NA, 
5674, 9264, 7757, 4983, NA, 9996, 5886, 9546, 9419)), .Names = c("id", 
"col6", "col7", "col8", "col9", "col10"), row.names = c(NA, -10L
), class = "data.frame")

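I think this will work:

# Put the serial-number columns of both data frames side by side (id dropped).
x <- cbind(df1[, -1], df2[, -1])
# Keep only the first occurrence of each value, preserving order.
dups <- function(x) x[!duplicated(x)]
# For each row: drop NAs, drop repeated values, keep the first five.
new.df <- data.frame(df1[, 1, drop=FALSE],
    t(apply(x, 1, function(x) dups(na.omit(x))[1:5])))

# Reuse df1's column names for the five result columns.
colnames(new.df)[-1] <- colnames(df1[, -1])
new.df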
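
To make the per-row step easier to follow, here is a minimal sketch of the same three operations applied to a single row. The vector one_row is hypothetical (it is not taken from df1 or df2) and deliberately contains a repeated serial number so the effect of duplicated() is visible.

```r
# Hypothetical row: serial numbers from both tables side by side,
# with missing values and one repeated serial number (927).
one_row <- c(4742, 927, NA, NA, NA, 927, 6686, 1816, NA, NA)

# Step 1: drop the missing values.
vals <- na.omit(one_row)

# Step 2: keep only the first occurrence of each value (order is preserved).
vals <- vals[!duplicated(vals)]

# Step 3: take the first five; positions past the end come back as NA.
top5 <- vals[1:5]
print(top5)
```

Because only four distinct serial numbers remain in this row, the fifth position is NA, which is also how a customer with fewer than five products would appear in the merged result.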

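This works:

df3 <- cbind(df1, df2[, -1])

# Shift the non-NA values of row i to the left and pad the rest with NA.
# Note: this does not remove repeated serial numbers within a row.
subs <- function(i) {
  temp <- df3[i, ][!is.na(df3[i, ])]
  c(temp, rep(NA, ncol(df3) - length(temp)))
}

for (i in seq_len(nrow(df3))) {
  df3[i, ] <- subs(i)
}

# Keep id plus the first five serial numbers.
final.df <- df3[, 1:6]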

> final.df
   id col1 col2 col3 col4 col5
1   1 4742  927 7889 1816 6686
2   2 4964 9295 9174  228 9470
3   3 5834 7758 5659 3931 5022
4   4 2802 9984  323 3210 2488
5   5  179  198 3996 6801 7561
6   6 7755 1252 9684 9940 3451
7   7 6467 1664 1150  854 4342
8   8 8671 5242 2973 4744 1791
9   9 2910 6995 9948 4006  286
10 10  150 3875 8598 3196 7425
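
The loop above moves the non-missing values to the left but does not remove repeated serial numbers, and the sample data happens to contain none. As a rough sketch of how the two behaviours differ, the example below uses two small hypothetical data frames (a and b, not part of the question) in which customer 1 has serial 927 in both tables. The value of n_keep is likewise an assumption chosen to keep the example small.

```r
# Hypothetical two-customer example; customer 1 has serial 927 in both tables.
a <- data.frame(id = 1:2, col1 = c(4742, 4964), col2 = c(927, NA))
b <- data.frame(id = 1:2, col3 = c(927, 6141), col4 = c(1816, 9728))

# Serial-number columns side by side, as a numeric matrix.
wide <- as.matrix(cbind(a[, -1], b[, -1]))
n_keep <- 3  # assumed number of serial numbers to keep per customer

# Left-shift only: repeated values survive.
shift_only <- t(apply(wide, 1, function(r) r[!is.na(r)][1:n_keep]))

# Left-shift and keep the first occurrence of each value.
shift_unique <- t(apply(wide, 1, function(r) unique(r[!is.na(r)])[1:n_keep]))

print(shift_only)
print(shift_unique)
```

For customer 1 the left-shift-only version keeps 927 twice, while the version using unique() moves on to the next distinct serial number. For customer 2, who has no repeats, both versions agree.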

For comparison, printing new.df from the first approach gives the same result:

   id col1 col2 col3 col4 col5
1   1 4742  927 7889 1816 6686
2   2 4964 9295 9174  228 9470
3   3 5834 7758 5659 3931 5022
4   4 2802 9984  323 3210 2488
5   5  179  198 3996 6801 7561
6   6 7755 1252 9684 9940 3451
7   7 6467 1664 1150  854 4342
8   8 8671 5242 2973 4744 1791
9   9 2910 6995 9948 4006  286
10 10  150 3875 8598 3196 7425
