Regex R - split a character vector so that each unique element is added to a new character vector


I have a character vector in which each element contains several comma-separated strings. I extracted it from a data frame, and it looks like this:

 [1] "Acworth, Crescent Lake, East Acworth, Lynn, South Acworth"                                                                              
 [2] "Ferncroft, Passaconaway, Paugus Mill"                                                                                                   
 [3] "Alexandria, South Alexandria"                                                                                                           
 [4] "Allenstown, Blodgett, Kenison Corner, Suncook (part)"                                                                                   
 [5] "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow"                                                                 
 [6] "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands"
 [7] "Amherst, Baboosic Lake, Cricket Corner, Ponemah"                                                                                        
 [8] "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover"                                                        
 [9] "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch"                                                                    
[10] "Ashland" 
I would like to get a new character vector in which each of those strings is its own element, i.e.:

 [1] "Acworth", "Crescent Lake", "East Acworth", "Lynn", "South Acworth"                                                                              
 [6] "Ferncroft", "Passaconaway", "Paugus Mill", "Alexandria", "South Alexandria"
I used strsplit(). When I try to convert the result to a character vector, it reverts to its old state.


I am sure this is a very simple problem. Any help would be much appreciated! Thanks.

Your post title suggests that you want unique strings, so

unique(unlist(strsplit(myvec, split=", ")))


works if the comma is always followed by a space.
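
A comment on this answer suggests adding fixed = TRUE. The sketch below shows what that could look like; myvec here is a small hypothetical vector, not the asker's full data, and it assumes every comma is followed by exactly one space.

```r
# Hypothetical sample input (assumption: every comma is followed by one space).
myvec <- c("Acworth, Crescent Lake, Lynn", "Lynn, Ashland")

# fixed = TRUE treats ", " as a literal string rather than a regular expression.
parts <- unlist(strsplit(myvec, split = ", ", fixed = TRUE))

# unique() keeps the first occurrence of each string and drops later repeats.
result <- unique(parts)
print(result)
```

If duplicates should be kept, the unique() call can simply be left out.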

You can get rid of the spaces by splitting the character vector with the "\\s*,\\s*" regex and then unlisting the result:

v <- c("Acworth, Crescent Lake, East Acworth, Lynn, South Acworth", "Ferncroft, Passaconaway, Paugus Mill", "Alexandria, South Alexandria",  "Allenstown, Blodgett, Kenison Corner, Suncook (part)", "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow", "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands", "Amherst, Baboosic Lake, Cricket Corner, Ponemah",  "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover",  "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch",  "Ashland" )
s <- unlist(strsplit(v, "\\s*,\\s*"))
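
To illustrate why the regex helps, here is a minimal sketch on a hypothetical vector (messy, not part of the question) whose spacing around the commas is inconsistent. The trimws() and unique() steps are additions for illustration, not part of the answer above.

```r
# Hypothetical input with irregular whitespace around the commas.
messy <- c("Alton,Alton Bay ,  East Alton", " Ashland ", "Alton")

# The regex consumes any whitespace on either side of each comma.
pieces <- unlist(strsplit(messy, "\\s*,\\s*"))

# Whitespace at the very start or end of an element is not adjacent to a comma,
# so the split leaves it in place; trimws() removes it.
pieces <- trimws(pieces)

# Optional: keep only the distinct names.
distinct <- unique(pieces)
print(distinct)
```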

You can also use scan on v, like this:

unique(scan(what = "", text = v, sep = ",", strip.white = TRUE))
The strip.white = TRUE part takes care of any leading or trailing whitespace you may have.
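
A minimal sketch of the scan approach on a hypothetical vector (towns, not the asker's data). The quote = "" and quiet = TRUE arguments are additions for illustration: quote = "" is a precaution so that an apostrophe is read as an ordinary character, and quiet = TRUE suppresses the "Read N items" message.

```r
# Hypothetical input; one name contains an apostrophe.
towns <- c("Acworth, Crescent Lake", "Hart's Location,  Ashland ")

parts <- scan(
  text = towns,        # read from the character vector instead of a file
  what = "",           # return a character vector
  sep = ",",           # fields are separated by commas
  strip.white = TRUE,  # drop leading and trailing whitespace in each field
  quote = "",          # assumption: no field is quoted, so disable quote handling
  quiet = TRUE         # do not print the item count
)

result <- unique(parts)
print(result)
```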


Note: v is defined as in the strsplit answer above.

After running strsplit, run unlist instead of as.character.
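
The "reverts to its old state" symptom in the question is probably explained by the fact that strsplit returns a list, and as.character applied to a list produces one string per list element. A minimal sketch on a hypothetical two-element vector x:

```r
# Hypothetical input.
x <- c("Alexandria, South Alexandria", "Ashland")

# strsplit() returns a list with one character vector per input element.
pieces <- strsplit(x, ", ", fixed = TRUE)

# as.character() on a list yields one string per list element,
# so the result still has length 2 rather than 3.
collapsed <- as.character(pieces)

# unlist() flattens the list into a single character vector.
flat <- unlist(pieces)

print(length(collapsed))
print(flat)
```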
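
Comments:

How simple! Awesome, thank you very much!
See the linked demo. By the way, there are spaces; do you want to strip them?
@WiktorStribiżew: ", " instead of ",", and you could add fixed=TRUE.
@DavidArenburg: I updated the demo at the link above.
Actually, I did not see a requirement to keep only unique strings, but yes, that can be done.
@DavidArenburg, I guess I usually don't assume that automatically. I would run a sub() call afterwards to clean up the leading whitespace.
@WiktorStribiżew, I took it from "unique element" in the post title, so if the list contains "Crescent Lake" twice, only one is wanted in the output. If that is not the desired result, drop unique().
@DavidArenburg, fixed=TRUE? Is that a Stack Overflow "thing", an answer that differs from how the OP tagged the question?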