R: How to remove duplicated characters from each row of a column?

How can I use R to remove duplicated characters from the strings in a column? For example, this is my column:

df <- data.frame(name = c(A = "a,a,b,c,d,d,d",
                          B = "a,b,b,b,f",
                          C = "d,d,d,d",
                          D = "a,a"))

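Using tidyverse, we can first add the row names as a column, split the comma-separated strings into separate rows with separate_rows, group_by rowname, remove the duplicated values, and then convert the result back into a comma-separated string with toString.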

library(tidyverse)

df %>%
  rownames_to_column() %>%
  separate_rows(name, sep = ",") %>%
  group_by(rowname) %>%
  filter(!duplicated(name)) %>%
  summarise(name = toString(name)) %>%
  column_to_rownames()

#        name
#A a, b, c, d
#B    a, b, f
#C          d
#D          a
A base R approach using sapply, with exactly the same logic as @tmfmnk's answer:

sapply(strsplit(as.character(df$name), ","), function(x) toString(unique(x)))
#[1] "a, b, c, d" "a, b, f"    "d"          "a" 
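
The split-then-unique idea can also be wrapped in a small reusable helper. This is only a minimal sketch: the function name dedupe_tokens is hypothetical (it does not appear in the original answers), and it assumes that whitespace around each token should be trimmed and that the result should be joined with the same separator rather than with toString's ", ".

```r
# Hypothetical helper (the name is illustrative, not from the original answers):
# split each string on `sep`, drop repeated tokens, and join them again.
dedupe_tokens <- function(x, sep = ",") {
  # as.character() guards against factor columns (the default before R 4.0)
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(parts,
         function(p) paste(unique(trimws(p)), collapse = sep),
         character(1))
}

df <- data.frame(name = c(A = "a,a,b,c,d,d,d",
                          B = "a,b,b,b,f",
                          C = "d,d,d,d",
                          D = "a,a"))

df$name <- dedupe_tokens(df$name)
df$name
#[1] "a,b,c,d" "a,b,f"   "d"       "a"
```

vapply is used instead of sapply here so that the result is always a character vector, even for zero-length input.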

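One possibility is:

library(dplyr)

# as.character() keeps this working when name is a factor (the default before R 4.0)
df %>%
  rowwise() %>%
  mutate(name = toString(unique(unlist(strsplit(as.character(name), ",")))))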

  name      
  <chr>     
1 a, b, c, d
2 a, b, f   
3 d         
4 a 

With map and strsplit:

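library(tidyverse)

# map_chr() returns a character vector, so name stays an ordinary
# character column instead of becoming a list column as with map()
df %>%
  mutate(name = strsplit(as.character(name), ",") %>%
           map_chr(~ toString(unique(.x))))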
#        name
#1 a, b, c, d
#2    a, b, f
#3          d
#4          a

Or in base R with a regular expression:

sub(",$", "", gsub("([a-z],)\\1+", "\\1", paste0(df$name, ",")))
#[1] "a,b,c,d" "a,b,f"   "d"       "a" 
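
One caveat is worth illustrating: the pattern above collapses only adjacent repeats of a single lowercase letter, which is enough for the example data but not for every input. The sketch below uses a made-up vector x (hypothetical, not from the question) to compare it with the split-and-unique approach.

```r
# Hypothetical input with repeats that are not adjacent
x <- c("a,b,a", "d,d,c,d")

# Regex approach: only runs of the same letter are collapsed
regex_result <- sub(",$", "", gsub("([a-z],)\\1+", "\\1", paste0(x, ",")))

# Split/unique approach: every repeat is dropped, adjacent or not
unique_result <- sapply(strsplit(x, ","),
                        function(p) paste(unique(p), collapse = ","))

regex_result
#[1] "a,b,a" "d,c,d"
unique_result
#[1] "a,b" "d,c"
```

If the values in the column can repeat non-adjacently, or are longer than one character, the strsplit-based answers are the safer choice.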