Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/r/76.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181

Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/regex/18.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
R 提取字符串中的字符序列_R_Regex - Fatal编程技术网

R 提取字符串中的字符序列

R 提取字符串中的字符序列,r,regex,R,Regex,我想从此字符串中提取所有非大写字母的文本: a <- "NAME Agricola, Johannes ALTERNATIVNAMEN Schneider, Johann; Schnitter, Johannes; Eisleben, Johannes; Agricola Eisleben, Johannes; Bauer, Hans KURZBESCHREIBUNG deutscher Reformator GEBURTSDATUM 20. April 1494 GEBURTSORT E

我想从此字符串中提取所有非大写字母的文本:

a <- "NAME
Agricola, Johannes
ALTERNATIVNAMEN
Schneider, Johann; Schnitter, Johannes; Eisleben, Johannes; Agricola Eisleben, Johannes; Bauer, Hans
KURZBESCHREIBUNG
deutscher Reformator
GEBURTSDATUM
20. April 1494
GEBURTSORT
Eisleben
STERBEDATUM
22. September 1566
STERBEORT
Berlin"

a您可以使用
strsplit
并按两个或多个大写字母的任意顺序拆分

strsplit(a,"[A-Z]{2,}")

[[1]]
[1] ""                                                                                                        
[2] "\nAgricola, Johannes\n"                                                                                  
[3] "\nSchneider, Johann; Schnitter, Johannes; Eisleben, Johannes; Agricola Eisleben, Johannes; Bauer, Hans\n"
[4] "\ndeutscher Reformator\n"                                                                                
[5] "\n20. April 1494\n"                                                                                      
[6] "\nEisleben\n"                                                                                            
[7] "\n22. September 1566\n"                                                                                  
[8] "\nBerlin"

然后你可以选择你想要的专栏

太好了!但是如何将每个子字符串存储到变量中?在这种情况下,我怎样才能去掉“\n”?
gsub(“\n,”,a)
处理后面的问题类似于
as.data.frame(gsub(“\\n,”,do.call)(rbind,strsplit(a,“[a-Z]{2,}”))
将把
a
s的向量转换为所需变量的数据帧。完美-这正是我所寻找的!尝试
read.table(text=gsub(([:upper:])\\n“,“\\1:”,a),sep=“:”
太好了,这似乎是另一个选项!但是为什么read.table函数的分隔符是冒号呢?gsub函数用冒号替换每个大写单词末尾的换行符。这就成了你的分水岭……现在我完全明白你的答案了!
strsplit(a,"[A-Z]{2,}")

[[1]]
[1] ""                                                                                                        
[2] "\nAgricola, Johannes\n"                                                                                  
[3] "\nSchneider, Johann; Schnitter, Johannes; Eisleben, Johannes; Agricola Eisleben, Johannes; Bauer, Hans\n"
[4] "\ndeutscher Reformator\n"                                                                                
[5] "\n20. April 1494\n"                                                                                      
[6] "\nEisleben\n"                                                                                            
[7] "\n22. September 1566\n"                                                                                  
[8] "\nBerlin"
read.table(text=gsub("([[:upper:]])\\n","\\1:",a),sep=":")
                V1                                                                                                   V2
1             NAME                                                                                   Agricola, Johannes
2  ALTERNATIVNAMEN Schneider, Johann; Schnitter, Johannes; Eisleben, Johannes; Agricola Eisleben, Johannes; Bauer, Hans
3 KURZBESCHREIBUNG                                                                                 deutscher Reformator
4     GEBURTSDATUM                                                                                       20. April 1494
5       GEBURTSORT                                                                                             Eisleben
6      STERBEDATUM                                                                                   22. September 1566
7        STERBEORT                                                                                               Berlin