R: Remove all words from a character vector except specific words

I have a character vector like this:

[70] "CSF  5896-6133"                                                           
[71] "CRT  16"                                                                  
[72] "SEEF  54-55"                                                              
[73] "CIF  190-195"                                                             
[74] "DE & /ON CIF  196-222"                                                    
[75] " CRT  17 "                                                                
[76] " SEEF  56-57"                                                             
[77] "DE & /ON CSF  6134-6725 "                                                 
[78] " SEEF  58-60"                                                             
[79] "CRT 18"                                                                   
[80] " CSF 6726-6837"                                                           
[81] "SEEF 61"                                                                  
[82] " CSF 6840-6926"                                                           
[83] " CIF 223-226"                                                             
[84] "SEEF 62-63"                                                               
[85] " CSF 6927-7065"                                                           
[86] " CIF 226-228"                                                             
[87] "CSF 7066-7185"                                                            
[88] "CSF 7186-7311"                                                            
[89] " CIF 229"                                                                 
[90] " SEEF 66"                                                                 
[91] "CSF 7312-7561"                                                            
[92] " CRT 19"                                                                  
[93] " SEEF 67-68"                                                              
[94] "Final data QAQC done on CSF  1-7561"                                      
[95] " CIF  1-229"                                                              
[96] " SEEF  1-68 "                                                             
[97] " CRT  1-19"                                                               
[98] "082015-HOBA-G17-1 changed to offPlot based on GIS review of searched     area"
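As you can see, this is only part of it.

I want to remove every word that is not a number or one of these keywords:

CSF, CIF, SEEF, CRT
For example, elements 94-98 would become: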

[94] "CSF  1-7561"                                      
[95] " CIF  1-229"                                                              
[96] " SEEF  1-68 "                                                             
[97] " CRT  1-19"                                                               

As you can see, element 98 is removed entirely because it contains none of the keywords I want. Some words are also removed from element 94.

Consider the following vector:

v <- c("Final data QAQC done on CSF  1-7561", 
       "CIF  1-229", 
       "SEEF  1-68", 
       "CRT  1-19",
       "082015-HOBA-G17-1 changed to offPlot based on GIS review of searched     area")
Extracting the keywords and number ranges from each element gives:

#[[1]]
#[1] "CSF"    "1-7561"
#
#[[2]]
#[1] "CIF"   "1-229"
#
#[[3]]
#[1] "SEEF" "1-68"
#
#[[4]]
#[1] "CRT"  "1-19"
#
#[[5]]
#[1] NA
As @akrun mentioned, you could also use:

regmatches(v, gregexpr(pattern, v))
which gives:

#[[1]]
#[1] "CSF"    "1-7561"
#
#[[2]]
#[1] "CIF"   "1-229"
#
#[[3]]
#[1] "SEEF" "1-68"
#
#[[4]]
#[1] "CRT"  "1-19"
#
#[[5]]
#character(0)
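The `regmatches()` call above relies on a `pattern` object whose definition is not shown here. The following is a minimal sketch under an assumed pattern (not necessarily the one the answer used): it matches a keyword or a number/number range only when it stands alone as a whitespace-delimited token, so fragments of "082015-HOBA-G17-1" are not picked up. The names `keywords`, `has_keyword` and `cleaned` are illustrative.

```r
# Same sample vector as above.
v <- c("Final data QAQC done on CSF  1-7561",
       "CIF  1-229",
       "SEEF  1-68",
       "CRT  1-19",
       "082015-HOBA-G17-1 changed to offPlot based on GIS review of searched     area")

# Assumed pattern: a keyword, or a number / number range, with no
# non-space character directly before or after it.
pattern <- "(?<!\\S)(?:CSF|CIF|SEEF|CRT|\\d+(?:-\\d+)?)(?!\\S)"

# perl = TRUE is required for the lookarounds.
matches <- regmatches(v, gregexpr(pattern, v, perl = TRUE))

# Keep only elements that contain at least one keyword,
# then paste the surviving tokens back into one string each.
keywords <- c("CSF", "CIF", "SEEF", "CRT")
has_keyword <- vapply(matches, function(m) any(m %in% keywords), logical(1))
cleaned <- vapply(matches[has_keyword], paste, character(1), collapse = " ")
cleaned
```

With this pattern the fifth element yields `character(0)` and is dropped by the `has_keyword` filter, which corresponds to removing element 98 in the question. An element that contains a stand-alone number but no keyword would also be dropped by that filter.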
Using stringr:

library(stringr)
testString <- c("Final data QAQC done on CSF  1-7561",
                " CIF  1-229",
                " SEEF  1-68 ",
                " CRT  1-19",
                "082015-HOBA-G17-1 changed to offPlot based on GIS review of searched     area")

str_extract(testString, "(CSF|CIF|SEEF|CRT)\\s+\\d+-\\d+")
# [1] "CSF  1-7561" "CIF  1-229"  "SEEF  1-68"  "CRT  1-19"   NA

I would use the stringr library.

Here is a subset of your data:

x <- c("CSF  5896-6133",
       "CRT  16",
       "SEEF  54-55",
       "CIF  190-195",
       "Final data QAQC done on CSF  1-7561",
       "082015-HOBA-G17-1 changed to offPlot based on GIS review of searched     area")

library(stringr)

str_extract(x, '(CSF|CIF|SEEF|CRT)[:space:]+([0-9]|-)+')
# [1] "CSF  5896-6133" "CRT  16"        "SEEF  54-55"    "CIF  190-195"   "CSF  1-7561"
# [6] NA

If nothing matches the pattern, str_extract() returns a missing value (NA).

Please see the answer I posted 5 minutes ago; I admit it is similar. Try a slightly different regex.
Isn't this the same as @Psidom's?
Yes, very similar! I just posted a little before he replied.
Actually he posted at 16:54:18 and you posted at 16:54:44; anyway, it is also a slightly different regex, so the OP can try all the solutions. Cheers.
A base R option is regmatches(v, gregexpr(pattern, v)). Plus one.
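Note that the first stringr pattern, which ends in `\\d+-\\d+`, requires a hyphenated range, so an element such as "CRT  16" would come back as NA. Below is a minimal sketch of an assumed variant that makes the range part optional, then drops the NA entries and collapses repeated spaces to get the cleaned vector asked for in the question. The sample vector and the names `res` and `cleaned` are illustrative.

```r
library(stringr)

x <- c("CSF  5896-6133",
       "CRT  16",
       " SEEF  54-55",
       "Final data QAQC done on CSF  1-7561",
       "082015-HOBA-G17-1 changed to offPlot based on GIS review of searched     area")

# Assumed variant of the patterns above: the "-digits" part is optional,
# so single numbers such as "CRT  16" are matched as well.
res <- str_extract(x, "(CSF|CIF|SEEF|CRT)\\s+\\d+(-\\d+)?")

# Drop elements without a match (NA) and collapse runs of whitespace.
cleaned <- gsub("[[:space:]]+", " ", res[!is.na(res)])
cleaned
```

Since `str_extract()` returns only the first match per element, an element that mentions two keywords would need `str_extract_all()` instead.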