R: extracting titles from multiple lines
I have multiple files, each with different titles, and I want to extract the title names from each file. Below is an example of one file:
[1] "<START" "ID=\"CMP-001\"" "NO=\"1\">"
[4] "<NAME>Plasma-derived" "vaccine" "(PDV)"
[7] "versus" "placebo" "by"
[10] "intramuscular" "route</NAME>" "<DIC"
[13] "CHI2=\"3.6385\"" "CI_END=\"0.6042\"" "CI_START=\"0.3425\""
[16] "CI_STUDY=\"95\"" "CI_TOTAL=\"95\"" "DF=\"3.0\""
[19] "TOTAL_1=\"0.6648\"" "TOTAL_2=\"0.50487622\"" "BLE=\"YES\""
.
.
.
[789] "TOTAL_2=\"39\"" "WEIGHT=\"300.0\"" "Z=\"1.5443\">"
[792] "<NAME>Local" "adverse" "events"
[795] "after" "each" "injection"
[798] "of" "vaccine</NAME>" "<GROUP_LABEL_1>PDV</GROUP_LABEL_1>"
[801] "</GROUP_LABEL_2>" "<GRAPH_LABEL_1>" "PDV</GRAPH_LABEL_1>"
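Note that the titles differ in length from file to file.

Here is a solution using stringr. It first collapses the vector into one long string, then captures every run of characters (anything except a newline) between each pair of <NAME> and </NAME>. In future, if you provide a minimal reproducible example (for instance, with dput()), people will be able to help you more easily. Hope this helps.
Note: if you only want the first title, you can use str_match() instead of str_match_all().
library(stringr)
str_match_all(paste0(string, collapse = " "), "<NAME>(.*?)</NAME>")[[1]][, 2]
[1] "Plasma-derived vaccine (PDV) versus placebo by intramuscular route"
[2] "Local adverse events after each injection of vaccine"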
Data:
string <- c("<START", "ID=\"CMP-001\"", "NO=\"1\">", "<NAME>Plasma-derived", "vaccine", "(PDV)", "versus", "placebo", "by", "intramuscular", "route</NAME>", "<DIC", "CHI2=\"3.6385\"", "CI_END=\"0.6042\"", "CI_START=\"0.3425\"", "CI_STUDY=\"95\"", "CI_TOTAL=\"95\"", "DF=\"3.0\"", "TOTAL_1=\"0.6648\"", "TOTAL_2=\"0.50487622\"", "BLE=\"YES\"",
"TOTAL_2=\"39\"", "WEIGHT=\"300.0\"", "Z=\"1.5443\">", "<NAME>Local", "adverse", "events", "after", "each", "injection", "of", "vaccine</NAME>", "<GROUP_LABEL_1>PDV</GROUP_LABEL_1>", "</GROUP_LABEL_2>", "<GRAPH_LABEL_1>", "PDV</GRAPH_LABEL_1>")
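Since the question mentions several files, the same idea can be wrapped in a function and applied to each file's tokens. The following is only a sketch in base R rather than stringr; the helper name `extract_titles` and the shortened `files` list are made up for illustration and stand in for whatever you get from reading your real files.

```r
# Hypothetical helper: return every <NAME>...</NAME> title found in a token vector.
extract_titles <- function(tokens) {
  # Rejoin the whitespace-split tokens into one string.
  text <- paste(tokens, collapse = " ")
  # Non-greedy match so each <NAME>...</NAME> pair is captured separately.
  hits <- regmatches(text, gregexpr("<NAME>.*?</NAME>", text, perl = TRUE))[[1]]
  # Strip the surrounding tags, leaving only the title text.
  gsub("</?NAME>", "", hits)
}

# Stand-ins for the contents of two files (made-up names, shortened data).
files <- list(
  file_a = c("<START", "NO=\"1\">", "<NAME>Plasma-derived", "vaccine", "(PDV)</NAME>", "<DIC"),
  file_b = c("<NAME>Local", "adverse", "events</NAME>", "<GROUP_LABEL_1>PDV</GROUP_LABEL_1>")
)

# One character vector of titles per file.
titles <- lapply(files, extract_titles)
print(titles)
```

A vector with no <NAME> pair yields an empty character vector, so files without titles do not cause an error.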
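See how to ask an R question that people can help with: include a workable data sample, all necessary code, and a clear explanation of what you are trying to do and what is not working. If you are working with text that represents XML, the safest approach is probably to use functions designed specifically for that purpose rather than regular expressions.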
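As a rough illustration of that suggestion, here is a sketch using the xml2 package. It assumes the original files are well-formed XML; the `raw_xml` string below is a made-up, simplified stand-in, since the token vector in the question is only a fragment and would not parse as it stands.

```r
library(xml2)

# Hypothetical, well-formed stand-in for one of the real files.
raw_xml <- '<ROOT>
  <CMP ID="CMP-001" NO="1"><NAME>Plasma-derived vaccine (PDV) versus placebo</NAME></CMP>
  <CMP ID="CMP-002" NO="2"><NAME>Local adverse events</NAME></CMP>
</ROOT>'

# Parse the document, then select every NAME element anywhere in the tree.
doc <- read_xml(raw_xml)
name_nodes <- xml_find_all(doc, "//NAME")

# Pull out the text content of each NAME element.
titles <- xml_text(name_nodes)
print(titles)
```

With real files you would pass the file path to read_xml() instead of a string, which avoids splitting the text into tokens in the first place.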