
R: Extracting titles that span multiple lines


I have multiple files, each with different titles, and I want to extract the title names from each file. Here is a sample from one file:

[1] "<START"                        "ID=\"CMP-001\""                  "NO=\"1\">"                         
[4] "<NAME>Plasma-derived"          "vaccine"                         "(PDV)"                             
[7] "versus"                        "placebo"                         "by"                                
[10] "intramuscular"                "route</NAME>"                    "<DIC"                     
[13] "CHI2=\"3.6385\""              "CI_END=\"0.6042\""               "CI_START=\"0.3425\""   
[16] "CI_STUDY=\"95\""                "CI_TOTAL=\"95\""               "DF=\"3.0\""                        
[19] "TOTAL_1=\"0.6648\""           "TOTAL_2=\"0.50487622\""           "BLE=\"YES\"" 
.
.
.
 [789] "TOTAL_2=\"39\""             "WEIGHT=\"300.0\""              "Z=\"1.5443\">"    
 [792] "<NAME>Local"                "adverse"                       "events" 
 [795] "after"                      "each"                          "injection"
 [798] "of"                         "vaccine</NAME>"               "<GROUP_LABEL_1>PDV</GROUP_LABEL_1>"
 [801] "</GROUP_LABEL_2>"           "<GRAPH_LABEL_1>"              "PDV</GRAPH_LABEL_1>"

Note that the title length differs from file to file.

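Here is a solution using stringr. It first collapses the vector into one long string, then captures every word/character that is not a newline between each pair of <NAME> and </NAME> tags. In future, if you provide a reproducible example (e.g., using dput()), people will be able to help you more easily. Hope this helps.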

Note: if you only need the first title, you can use str_match() instead of str_match_all().
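library(stringr)
str_match_all(paste0(string, collapse = " "), "<NAME>(.*?)</NAME>")[[1]][, 2]
[1] "Plasma-derived vaccine (PDV) versus placebo by intramuscular route"
[2] "Local adverse events after each injection of vaccine"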
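The difference between str_match() and str_match_all() mentioned in the note can be sketched as follows. This is a minimal illustration, assuming the titles are delimited by <NAME> and </NAME>; the vector tokens is a shortened, hypothetical stand-in for the question's data.

```r
library(stringr)

# Shortened, hypothetical sample in the same shape as the question's data
tokens <- c("<NAME>Plasma-derived", "vaccine", "(PDV)</NAME>", "<DIC",
            "Z=\"1.5443\">", "<NAME>Local", "adverse", "events</NAME>")

# Collapse the tokens into one string so a title can span several elements
text <- paste(tokens, collapse = " ")

# str_match() returns a one-row matrix for a single string: first match only
first_title <- str_match(text, "<NAME>(.*?)</NAME>")[, 2]

# str_match_all() returns a list with one matrix per input string;
# column 2 holds the capture group for every match
all_titles <- str_match_all(text, "<NAME>(.*?)</NAME>")[[1]][, 2]
```

The lazy quantifier (.*?) matters here: a greedy (.*) would run from the first <NAME> to the last </NAME> and return a single, overlong match.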
Data

string <- c("<START", "ID=\"CMP-001\"", "NO=\"1\">", "<NAME>Plasma-derived", "vaccine", "(PDV)", "versus", "placebo", "by", "intramuscular", "route</NAME>", "<DIC", "CHI2=\"3.6385\"", "CI_END=\"0.6042\"", "CI_START=\"0.3425\"", "CI_STUDY=\"95\"", "CI_TOTAL=\"95\"", "DF=\"3.0\"", "TOTAL_1=\"0.6648\"", "TOTAL_2=\"0.50487622\"", "BLE=\"YES\"",
            "TOTAL_2=\"39\"", "WEIGHT=\"300.0\"", "Z=\"1.5443\">", "<NAME>Local", "adverse", "events", "after", "each", "injection", "of", "vaccine</NAME>", "<GROUP_LABEL_1>PDV</GROUP_LABEL_1>", "</GROUP_LABEL_2>", "<GRAPH_LABEL_1>", "PDV</GRAPH_LABEL_1>")

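Since the question mentions several files, the same pattern could be wrapped in a function and applied to each file in turn. The following is only a sketch: extract_titles() is a hypothetical helper, the two temporary files are made-up stand-ins so the example runs on its own, and reading with readLines() is an assumption about how the files are loaded.

```r
library(stringr)

# Hypothetical helper: return every <NAME>...</NAME> title found in one file
extract_titles <- function(path) {
  # Join all lines with spaces so a title split across lines is still matched
  text <- paste(readLines(path, warn = FALSE), collapse = " ")
  str_match_all(text, "<NAME>(.*?)</NAME>")[[1]][, 2]
}

# Two small stand-in files (assumed content, not the asker's real data)
tmp_dir <- tempfile("titles_")
dir.create(tmp_dir)
writeLines(c("<START ID=\"CMP-001\" NO=\"1\">",
             "<NAME>Plasma-derived vaccine (PDV)",
             "versus placebo</NAME>"),
           file.path(tmp_dir, "a.txt"))
writeLines(c("<NAME>Local adverse events</NAME>",
             "<DIC Z=\"1.5443\">",
             "<NAME>Systemic events</NAME>"),
           file.path(tmp_dir, "b.txt"))

# Apply the helper to every file and label each result with its file name
paths <- list.files(tmp_dir, full.names = TRUE)
titles <- lapply(paths, extract_titles)
names(titles) <- basename(paths)
```

The result is a named list, so files with different numbers of titles, or titles of different lengths, are handled without padding.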
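Please make this a reproducible R question that people can help with. That includes a workable sample of data, all necessary code, and a clear explanation of what you are trying to do and what has not worked. If you are working with text that represents XML, the safest approach is probably to use functions designed for that purpose rather than regular expressions.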
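As a rough illustration of the parser-based approach suggested in this comment, the sketch below uses the xml2 package. It assumes the file is well-formed XML; the small document here is hypothetical, because the sample in the question is truncated and could not be parsed as it stands.

```r
library(xml2)

# Hypothetical, well-formed document shaped like the question's data
doc <- read_xml('<ROOT>
  <START ID="CMP-001" NO="1">
    <NAME>Plasma-derived vaccine (PDV) versus placebo</NAME>
    <DIC Z="1.5443">
      <NAME>Local adverse events</NAME>
    </DIC>
  </START>
</ROOT>')

# Find every NAME element at any depth and return its text content
titles <- xml_text(xml_find_all(doc, ".//NAME"))
```

A parser copes with attributes, nesting, and line breaks inside a title without any changes to the pattern, which is where a regular expression tends to become fragile.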