Extracting a substring from text using R
I have string data as shown below:
a<- "\n Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n Uploaded on May 3, 2020 at 10:56 in Research\n View Forum\n \n"
My attempt gives me the following output:
"Social Media Learning and behaviour\n\n"
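I can't match the exact pattern. What is the exact pattern to extract "Social Media Learning and behaviour" without the trailing "\n\n"?
You can extract the text that sits between
"Update Your Profile to Dissolve This Message"
and "Uploaded on".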
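A minimal base R sketch of that idea, assuming the two marker phrases each occur once in the string. The variable name `title` and the use of `perl = TRUE` with the `(?s)` flag are choices made for this illustration, not something stated in the answer:

```r
a <- "\n Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n Uploaded on May 3, 2020 at 10:56 in Research\n View Forum\n \n"

# (?s) lets "." match newlines, so the leading and trailing ".*" consume
# everything outside the capture group; "(.*?)" lazily captures the text
# between the first marker line and the line starting with "Uploaded on".
title <- sub(
  "(?s).*Update Your Profile to Dissolve This Message\\n(.*?)\\n\\s+Uploaded on.*",
  "\\1",
  a,
  perl = TRUE
)
print(title)
```

Note that `sub` returns the input unchanged when the pattern does not match, so it may be worth checking with `grepl` first if the markers are not guaranteed to be present.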
str_match from stringr can do this; the captured group is in the second column of the result:
stringr::str_match(a, "Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on")[, 2]
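Alternatively, you can capture the previous line in a group and then match a following line that starts with horizontal whitespace and "Uploaded on":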
(.*)\r?\n[^\S\r\n]+Uploaded on
a<- "\n Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n Uploaded on May 3, 2020 at 10:56 in Research\n View Forum\n \n"
# Column 1 holds the full match; column 2 holds the captured line
stringr::str_match(a, "(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on")[, 2]
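If you would rather not depend on stringr, the same pattern can be tried in base R. This is a sketch under the assumption that `regexec` is called with `perl = TRUE`, since the `[^\S\r\n]` class relies on PCRE-style escapes inside brackets; the names `m` and `title` are illustrative only:

```r
a <- "\n Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n Uploaded on May 3, 2020 at 10:56 in Research\n View Forum\n \n"

# regexec() returns the positions of the full match and of each capture group;
# regmatches() turns those positions into the matched strings.
m <- regmatches(a, regexec("(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on", a, perl = TRUE))

# Element 1 is the full match, element 2 is the first capture group.
# trimws() guards against stray leading or trailing whitespace on the line.
title <- trimws(m[[1]][2])
print(title)
```

Because `.` does not match a newline under PCRE by default, `(.*)` stays on a single line, which is why only the line directly above "Uploaded on" is captured.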