Removing duplicates from each column in R (r, duplicates)

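I have CSV files with hundreds of different columns and would like to output a new file with the duplicate values removed from each column. Everything I have seen and tried works on one specific column. I just need every column to contain only its unique values.

For example, my data: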

df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = c(1, 0, 1, 0, 0, 1), C = c("Mr.","Mr.","Mrs.","Miss","Mr.","Mrs."))
df
    A B    C
  1 1 1  Mr.
  2 2 0  Mr.
  3 3 1 Mrs.
  4 4 0 Miss
  5 5 0  Mr.
  6 6 1 Mrs.

Then I could:

write.csv(df, file = "df_No_Dupes.csv", na="")

so that I can use it as a reference for my next task.

df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = c(1, 0, 1, 0, 0, 1), C = c("Mr.","Mr.","Mrs.","Miss","Mr.","Mrs."))


for(i in 1:ncol(df)){
  assign(paste("df_",i,sep=""), unique(df[,i]))
}

require(rowr)
df <- cbind.fill(df_1,df_2,df_3, fill = NA)

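If you want to avoid typing the name of each intermediate data frame, you can just use

ls(pattern="df_")

and

get()

to fetch the objects named in that vector, or use another loop.
If you want to change the column names back to their original values, you can use:
colnames(output_df) <- colnames(input_df)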
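
The ls()/get() idea above can be sketched roughly as follows. This is only an illustration: it uses mget(), which fetches several objects by name in one call, and it builds the object names with paste0() instead of ls(), on the assumption that ls() returns names sorted alphabetically (so df_10 would come before df_2 once there are ten or more columns). The names obj_names and cols are made up for the example.

```r
# Example data from the question
df <- data.frame(A = c(1, 2, 3, 4, 5, 6),
                 B = c(1, 0, 1, 0, 0, 1),
                 C = c("Mr.", "Mr.", "Mrs.", "Miss", "Mr.", "Mrs."))

# One object per column, named df_1, df_2, ... (same idea as the loop above)
for (i in seq_len(ncol(df))) {
  assign(paste0("df_", i), unique(df[, i]))
}

# Build the object names in column order (hypothetical helper vector)
obj_names <- paste0("df_", seq_len(ncol(df)))

# mget() returns the named objects as a list, in the order requested
cols <- mget(obj_names)

# Put the original column names back
names(cols) <- colnames(df)
```

The result is a list with one vector of unique values per column, which sidesteps the need to type df_1, df_2, ... by hand.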

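read.csv and write.csv work best with tabular data. Your desired output is not a good example of that (not every row has the same number of columns).
You can get the unique values of each column with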
vals <- sapply(df, unique)
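
Because the columns have different numbers of unique values, sapply() cannot simplify the result and returns a list here. If a rectangular CSV is still wanted, one possible approach in base R (a sketch, not the only way) is to pad each vector with NA up to the longest length and write the padding as empty cells. The names n_max, padded, df_unique and out_file are made up for the example, and the output path is just a temporary location.

```r
# Example data from the question
df <- data.frame(A = c(1, 2, 3, 4, 5, 6),
                 B = c(1, 0, 1, 0, 0, 1),
                 C = c("Mr.", "Mr.", "Mrs.", "Miss", "Mr.", "Mrs."))

# lapply() always returns a list: one vector of unique values per column
vals <- lapply(df, unique)

# Pad every vector to the length of the longest one.
# `length<-` extends a vector with NA, so no extra package is needed.
n_max  <- max(lengths(vals))
padded <- lapply(vals, `length<-`, n_max)

# Rebuild a data frame; the list names become the column names
df_unique <- as.data.frame(padded, stringsAsFactors = FALSE)

# na = "" writes the NA padding as empty cells
out_file <- file.path(tempdir(), "df_No_Dupes.csv")  # hypothetical output path
write.csv(df_unique, out_file, na = "", row.names = FALSE)
```

This keeps the original column names without a separate renaming step, since lapply() carries the names of df through to the list.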

A snippet that works with any number of columns, removes the duplicates from each column, and keeps the column names:
require(rowr)

df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = c(1, 0, 1, 0, 0, 1), C = c("Mr.","Mr.","Mrs.","Miss","Mr.","Mrs."))

#get the number of columns in the dataframe
n <- ncol(df)

#loop once per original column; each pass moves the current first column to the end
for(i in seq_len(n)){

  #append the first column without duplicates, fill blanks with NAs
  df <-  cbind.fill(df,unique(df[,1]), fill = NA)
  #rename the new column
  colnames(df)[n+1] <- colnames(df)[1]
  #delete the old column
  df[,1] <- NULL
}

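This works for the current data set, but I sometimes have 100 or more columns, so typing df_1, df_2, ... is not practical. After the for loop has produced one object per column, can I run another loop that picks up every object starting with df_ and merges them into one file? It would also be perfect if the headers kept their original names.
@Trigs Yes, of course. You can also use ls() to list the objects in the environment that match a pattern, e.g. ls(pattern="df_"). To change the column names, just use colnames(output_df) <- colnames(input_df).
It looks like an extra column of NAs gets added, so everything is shifted to the right and column A is all NA. This has been a huge help, though, so thank you!
@Trigs I see what you mean. It is an odd side effect of how I created the object; I will add a line to fix it now. You're welcome. If it helped, feel free to upvote.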
  V1 V1   V1
1  1  1  Mr.
2  2  0 Mrs.
3  3    Miss
4  4        
5  5        
6  6

colnames(output_df) <- colnames(input_df)

df <- data.frame(A = c(1, 2, 3, 4, 5, 6), B = c(1, 0, 1, 0, 0, 1), C = c("Mr.","Mr.","Mrs.","Miss","Mr.","Mrs."))


for(i in 1:ncol(df)){
  assign(paste("df_",i,sep=""), unique(df[,i]))
}

require(rowr)
# build the names in column order; ls(pattern="df_") sorts df_10 before df_2
files     <- paste0("df_", seq_len(ncol(df)))

df_output <- data.frame()
for(i in files){
  df_output <- cbind.fill(df_output, get(i), fill = "")
}

df_output <- df_output[, -1, drop = FALSE] # drop the extra first column from initialization
colnames(df_output) <- colnames(df)
write.csv(df_output, "df_out.csv",row.names = FALSE)

verify_it_worked <- read.csv("df_out.csv")
verify_it_worked

  A  B    C
1 1  1  Mr.
2 2  0 Mrs.
3 3    Miss
4 4      
5 5      
6 6 
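
If the file is meant as a reference for a later task, the blank cells have to be dropped again after reading it back. A minimal sketch of that round trip, assuming the padding was written as empty cells and that no real value is an empty string; csv_path, back and unique_vals are made-up names, and the padded data frame is typed in by hand to match the output above.

```r
# Padded result in the same shape as the output above
df_output <- data.frame(A = c(1, 2, 3, 4, 5, 6),
                        B = c(1, 0, NA, NA, NA, NA),
                        C = c("Mr.", "Mrs.", "Miss", NA, NA, NA),
                        stringsAsFactors = FALSE)

csv_path <- tempfile(fileext = ".csv")  # hypothetical location
write.csv(df_output, csv_path, na = "", row.names = FALSE)

# Treat empty cells as NA for every column type, including text columns
back <- read.csv(csv_path, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Drop the padding to recover the unique values of each column as a list
unique_vals <- lapply(back, function(x) x[!is.na(x)])
```

Setting na.strings explicitly matters mainly for text columns, where an empty cell would otherwise come back as an empty string rather than NA.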
