R: Select strings that share the same general pattern

I have a list of strings like the following:

> with(providers, head(Provider.Name, 30))
 [1] 1st Care (UK) Limited                 
 [2] 1st Care Limited                      
 [3] 229 Mitcham Lane Limited              
 [4] 24-7 Care Ltd                         
 [5] 3 Dimensions Care Limited             
 [6] 3 Trees Community Support Limited     
 [7] 365 Care Homes Limited                
 [8] 3A Care (Solihull) Limited            
 [9] 3L Care Limited                       
[10] 5 Star TLC Limited                    
[11] 92 Higher Drive Limited               
[12] A & I Care Home Ltd                   
[13] A & L Care Homes Limited              
[14] A & N Kachra                          
[15] A & R Care Limited                    
[16] A Better Carehome Ltd                 
[17] A.G.E. Nursing Homes Limited          
[18] A.R.M. Healthcare Limited             
[19] AAA Elderly Care Limited              
[20] AAA Medics Ltd                        
[21] Aadams Residential Care Home Limited  
[22] Abacus Quality Care Ltd               
[23] Abberdale Limited                     
[24] Abbeville RCH Limited                 
[25] Abbey Care Centre Limited             
[26] Abbey Care Direct Ltd                 
[27] Abbey Care Home Limited               
[28] Abbey Healthcare (Aaron Court) Limited
[29] Abbey Healthcare (Kendal) Limited     
[30] Abbey Healthcare (Knebworth) Ltd  
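My goal is to identify the observations that follow a similar pattern and then rename them accordingly with that pattern. Ideally, the output should look something like the following (note in particular how observations 1:2 and 25:30 change):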

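My problem is how to write something like a "general pattern" that can pick out the observations that effectively share the same pattern. I tried str_extract, but I think I am missing something in how I wrote the general pattern: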

library(stringr)
home = "[a-zA-Z]{2,}" # General pattern intended to select names whose first 2 words are similar
test = with(providers, str_extract(Provider.Name, home))
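For reference, "[a-zA-Z]{2,}" matches only the first run of two or more letters anywhere in the string, so it returns a single fragment (e.g. "st" from "1st Care Limited") rather than the first two words. Below is a minimal sketch of a pattern anchored at the start of the string that takes the first two whitespace-separated tokens instead. The vector names_vec is hypothetical stand-in data copied from the listing; with the real data you would pass providers$Provider.Name.

```r
library(stringr)

# Hypothetical sample: a few names copied from the listing in the question
names_vec <- c("1st Care (UK) Limited", "1st Care Limited",
               "A & I Care Home Ltd", "Abbey Care Home Limited")

# The original pattern: first run of 2+ ASCII letters, wherever it occurs
single_run <- str_extract(names_vec, "[a-zA-Z]{2,}")
single_run  # "st" "st" "Care" "Abbey"

# Anchored at the start: first non-space token, whitespace, second token
first_two <- str_extract(names_vec, "^\\S+\\s+\\S+")
first_two   # "1st Care" "1st Care" "A &" "Abbey Care"
```

Using \\S+ rather than a letter class keeps tokens such as "1st", "24-7" or "&", which a letters-only pattern would skip or split.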

Does anyone know whether there is a function in R that can identify general patterns? Many thanks.

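Regular expressions may be the tool that does the heavy lifting for you, but you need to define the specific pattern you want to match and what to replace it with. Without that, you would need some kind of AI solution that could somehow work out what to replace; that is beyond the scope of an SO question.

Thanks @Tim Biegeleisen.

If you know the exact pattern you want, then someone (e.g. @akrun) will gladly give you an answer.

Thanks @Tim Biegeleisen. The pattern should be based on the first two groups of characters (e.g. 1st-Care, A&, Abbey-Care and Abbey-Healthcare for observations 1:2, 12:15, 25:27 and 28:30 respectively). I think the pattern (home) should include something like strsplit(Provider.Name, " ").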
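The strsplit() idea can be sketched in base R as follows. This assumes that words are separated by whitespace and that the key is simply the first two words joined with "-" (matching the 1st-Care / Abbey-Care notation); provider_names is a hypothetical stand-in for providers$Provider.Name.

```r
# Hypothetical sample names taken from the listing in the question
provider_names <- c("1st Care (UK) Limited", "1st Care Limited",
                    "A & I Care Home Ltd", "A & L Care Homes Limited",
                    "Abbey Care Centre Limited",
                    "Abbey Healthcare (Kendal) Limited")

# Split every name on runs of whitespace; as.character() also covers
# the case where Provider.Name is stored as a factor
words <- strsplit(as.character(provider_names), "\\s+")

# Key = first two words joined by "-"; head() copes with one-word names
key <- vapply(words, function(w) paste(head(w, 2), collapse = "-"),
              character(1))
key
```

Here the key separates the 1st Care, A &, Abbey Care and Abbey Healthcare groups from the question.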
A solution that does the job is to use gsub() to extract the first two groups of characters:
providers$group <- with(providers, gsub("^(\\S+)\\s+(\\S+).*$", "\\1 \\2", Provider.Name))
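To go from a grouping key to the renaming described in the question, one option is to replace a name by its two-word key only when at least one other name shares that key. That rule is an assumption about the intended behaviour; the thread does not state it. A minimal base R sketch with hypothetical sample data:

```r
# Hypothetical sample names taken from the listing in the question
provider_names <- c("1st Care (UK) Limited", "1st Care Limited",
                    "229 Mitcham Lane Limited", "24-7 Care Ltd",
                    "Abbey Care Centre Limited", "Abbey Care Direct Ltd",
                    "Abbey Care Home Limited")

# Capture the first two whitespace-separated tokens and drop the rest
group <- gsub("^(\\S+)\\s+(\\S+).*$", "\\1 \\2", provider_names)

# For each name, count how many names share its key
n_in_group <- ave(seq_along(group), group, FUN = length)

# Assumed rule: rename only names whose key is shared with another name
renamed <- ifelse(n_in_group > 1, group, provider_names)
renamed
```

With this sample, the two "1st Care" names and the three "Abbey Care" names collapse to their shared key, while "229 Mitcham Lane Limited" and "24-7 Care Ltd" keep their full names because no other entry shares their first two words.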