Remove duplicate country names within each row in R_R_Duplicates

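I have a dataset (data) with an example column, LocationCountry.
I need to remove the repeated country names within each row (main request).
Then I need to create one column per country (supplementary request).

You can try the following approach: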
library(dplyr)
library(tidyr)
#Code1
data %>%
  mutate(id=row_number()) %>%
  separate_rows(LocationCountry,sep=',') %>%
  mutate(LocationCountry=trimws(LocationCountry)) %>%
  group_by(id) %>%
  filter(!duplicated(LocationCountry)) %>%
  summarise(LocationCountry=paste0(LocationCountry,collapse = ', ')) %>%
  select(-id)

Output:
# A tibble: 12 x 1
   LocationCountry                                             
   <chr>                                                       
 1 United States, Belgium, France, Ireland, Netherlands, Sweden
 2 Spain                                                       
 3 Korea, Republic of                                          
 4 Korea, Republic of                                          
 5 Austria                                                     
 6 United States                                               
 7 Italy                                                       
 8 Korea, Republic of                                          
 9 India, Iran, Islamic Republic of                            
10 Spain                                                       
11 Korea, Republic of                                          
12 Turkey                                                      

In base R, we can use strsplit to split the column into a list, get the unique elements of each list item, and paste them back together:

data$LocationCountry <- sapply(strsplit(data$LocationCountry, ",\\s*"), 
       function(x) toString(unique(x)))
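
As a minimal sketch of the same split / unique / paste steps, here is a self-contained run on made-up data. The data frame `toy` and its three rows are hypothetical, since the original `data` is not reproduced here. The column is deliberately created as a factor to show why wrapping it in as.character() may be needed on R versions before 4.0.

```r
# Hypothetical toy data; not the data from the question.
toy <- data.frame(
  LocationCountry = c("Spain, Spain",
                      "Korea, Republic of, Korea, Republic of",
                      "United States, Belgium, United States"),
  stringsAsFactors = TRUE  # mimics the pre-4.0 default, giving a factor column
)

# strsplit() only accepts character input, so convert the factor first.
parts <- strsplit(as.character(toy$LocationCountry), ",\\s*")

# Keep the first occurrence of each piece and join the pieces with ", ".
toy$LocationCountry <- sapply(parts, function(x) toString(unique(x)))

print(toy$LocationCountry)
```

Note that "Korea, Republic of" survives only because its two pieces, "Korea" and "Republic of", are each deduplicated and then pasted back in their original order.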


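For the supplementary part, if we need to create a binary column for each element in 'LocationCountry', take the updated 'LocationCountry' column (which now holds unique names), split it, and apply mtabulate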

library(qdapTools)
cbind(data, mtabulate(strsplit(data$LocationCountry, ",\\s+")))

-output
data
#                                                LocationCountry
#1  United States, Belgium, France, Ireland, Netherlands, Sweden
#2                                                         Spain
#3                                            Korea, Republic of
#4                                            Korea, Republic of
#5                                                       Austria
#6                                                 United States
#7                                                         Italy
#8                                            Korea, Republic of
#9                              India, Iran, Islamic Republic of
#10                                                        Spain
#11                                           Korea, Republic of
#12                                                       Turkey

                                             LocationCountry Austria Belgium France India Iran Ireland Islamic Republic of Italy
1  United States, Belgium, France, Ireland, Netherlands, Sweden       0       1      1     0    0       1                   0     0
2                                                         Spain       0       0      0     0    0       0                   0     0
3                                            Korea, Republic of       0       0      0     0    0       0                   0     0
4                                            Korea, Republic of       0       0      0     0    0       0                   0     0
5                                                       Austria       1       0      0     0    0       0                   0     0
6                                                 United States       0       0      0     0    0       0                   0     0
7                                                         Italy       0       0      0     0    0       0                   0     1
8                                            Korea, Republic of       0       0      0     0    0       0                   0     0
9                              India, Iran, Islamic Republic of       0       0      0     1    1       0                   1     0
10                                                        Spain       0       0      0     0    0       0                   0     0
11                                           Korea, Republic of       0       0      0     0    0       0                   0     0
12                                                       Turkey       0       0      0     0    0       0                   0     0
   Korea Netherlands Republic of Spain Sweden Turkey United States
1      0           1           0     0      1      0             1
2      0           0           0     1      0      0             0
3      1           0           1     0      0      0             0
4      1           0           1     0      0      0             0
5      0           0           0     0      0      0             0
6      0           0           0     0      0      0             1
7      0           0           0     0      0      0             0
8      1           0           1     0      0      0             0
9      0           0           0     0      0      0             0
10     0           0           0     1      0      0             0
11     1           0           1     0      0      0             0
12     0           0           0     0      0      1             0
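
If installing qdapTools is not an option, the same kind of 0/1 table can be sketched in base R. This is an alternative technique, not the mtabulate call used above, and the data frame `toy` and the names `countries`, `indicators` and `result` are hypothetical. Like the output above, it splits on every comma, so a name such as "Korea, Republic of" would still end up as two columns.

```r
# Hypothetical toy data; not the data from the question.
toy <- data.frame(
  LocationCountry = c("United States, Belgium", "Spain", "India, Iran"),
  stringsAsFactors = FALSE
)

# Split each row into its pieces and collect every distinct piece.
parts <- strsplit(toy$LocationCountry, ",\\s*")
countries <- sort(unique(unlist(parts)))

# Build one row of 0/1 flags per original row, one column per piece.
indicators <- do.call(
  rbind,
  lapply(parts, function(x) as.integer(countries %in% x))
)
colnames(indicators) <- countries

# Attach the flags to the original data frame.
result <- cbind(toy, indicators)
print(result)
```

Because the flags are bound onto the original data frame, the new columns stay in the same dataset, as requested in the comments.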


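OK, the point @akrun raised, that some country names contain a comma, makes this more complicated. Still, here is my solution. It is not very different from @akrun's.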
RemoveRedundants <- function(Row){
  Split_Countries <- unlist(strsplit(Row, ", "))
  Unique_Countries <- paste(unique(Split_Countries, fromLast = TRUE), collapse = ", ")
  return(Unique_Countries)
}

data$UniqueCountries <- apply(data,1,RemoveRedundants)
View(data)
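
One caveat, offered as a sketch rather than a correction: apply(data, 1, ...) hands the function a whole row, so it behaves as intended only while `data` has the single LocationCountry column. If the data frame also carries other columns, such as the serial number mentioned in the comments, it may be safer to run the function over that one column. The data frame `toy` and its `id` column below are hypothetical.

```r
# Same function as above, repeated so the example is self-contained.
RemoveRedundants <- function(Row){
  Split_Countries <- unlist(strsplit(Row, ", "))
  Unique_Countries <- paste(unique(Split_Countries, fromLast = TRUE), collapse = ", ")
  return(Unique_Countries)
}

# Hypothetical two-column data; `id` stands in for a serial number.
toy <- data.frame(
  id = 1:2,
  LocationCountry = c("Spain, Spain", "India, India, Iran"),
  stringsAsFactors = FALSE
)

# Apply the function element by element to one column, not to whole rows.
toy$UniqueCountries <- vapply(toy$LocationCountry, RemoveRedundants,
                              character(1), USE.NAMES = FALSE)
print(toy)
```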


The first part is very simple. The second part is confusing. Do you mean creating a column for each country in a different data frame?
@Amit Thanks. If possible, I need it in the same dataset. Otherwise, every row in my large dataset has a serial number, so I can merge if needed.
Some countries have a comma in between, i.e. Korea, Republic of
Many thanks for your valuable input on the code for data$UniqueCountries.
1 is the margin. If it is 1, the function is applied to each row. If it is 2, the function is applied to each column.
I'd like to confirm what you need: for each unique country you want a column, so that each cell in these new columns is later filled in by some logic? Right?
Thanks a lot. Yes, I need a column named India that is 1 if the row contains India and 0 if it does not, and so on for the other countries.
Thx for your feedback. I tried it, but it gave me this error: Error in strsplit(data$LocationCountry, ",\\s*") : non-character argument. Also, if possible, could you help with the supplementary request above? Upvoted.
@MohamedRahouma OK, the reason is that you have a factor column. Convert it to character, i.e. sapply(strsplit(as.character(data$LocationCountry), ",\\s*"), function(x) toString(unique(x)))
From R 4.0, stringsAsFactors=FALSE is the default. Maybe you have an older R version.
Yes, it works now. I have R version 3.6.2. Could you help with the supplementary request? Upvoted.
@MohamedRahouma Can you try the updated solution