How to remove text patterns ending with a colon in R?
I have the following sentence:
review <- c("1a. How long did it take for you to receive a personalized response to an internet or email inquiry made to THIS dealership?: Approx. It was very prompt however. 2f. Consideration of your time and responsiveness to your requests.: Were a little bit pushy but excellent otherwise 2g. Your satisfaction with the process of coming to an agreement on pricing.: Were willing to try to bring the price to a level that was acceptable to me. Please provide any additional comments regarding your recent sales experience.: Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! ")
However, my attempt removed only the first sentence ending with a colon.
Expected output:
Approx. It was very prompt however. Were a little bit pushy but excellent otherwise Were willing to try to bring the price to a level that was acceptable to me. Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)!
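Any help or suggestions would be greatly appreciated. Thank you.

If the sentences are not complex and contain no abbreviations, you can use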
gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)
Note that you can generalize it further by changing \\d+[a-zA-Z] to [0-9a-zA-Z]+ or [[:alnum:]]+ so that the prefix matches 1+ digits or letters.
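As a minimal sketch of that generalization, the snippet below swaps the prefix for [[:alnum:]]+ and runs it on a made-up string. The sample text and its labels ("10bc.", "Q7.") are hypothetical, and perl = TRUE is an assumption added here to pin the regex engine to PCRE; the call in the answer does not pass it.

```r
# Hypothetical sample: the labels "10bc." and "Q7." are invented for illustration.
x <- "10bc. Was the staff friendly?: Yes, very. Q7. Would you come back?: Definitely."

# Same pattern as in the answer, but the label prefix accepts any run of
# letters and digits followed by a dot.
pattern <- "(?:[[:alnum:]]+\\.)?[^.?!:]*[?!.]:\\s*"

# perl = TRUE is an assumption of this sketch (PCRE engine).
result <- gsub(pattern, "", x, perl = TRUE)
print(result)
```

One thing to watch with the broader prefix: it accepts plain words too, so when a question has no label, the last word of the preceding answer (for example "Fine." directly before an unlabeled question) can be consumed as if it were a label.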
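Details
- (?:\d+[a-zA-Z]\.)? - an optional sequence of:
  - \d+ - 1+ digits
  - [a-zA-Z] - an ASCII letter
  - \. - a dot
- [^.?!:]* - 0 or more characters other than ., ?, ! and :
- [?!.] - a ?, ! or .
- : - a colon
- \s* - 0+ whitespace characters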
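The breakdown above can be sketched in code by assembling the pattern from its named parts and then inspecting what it matches. The variable names and the short sample string are hypothetical, and perl = TRUE is an assumption of this sketch rather than part of the original call.

```r
# Each piece mirrors one item of the breakdown; the names are illustrative only.
label    <- "(?:\\d+[a-zA-Z]\\.)?"  # optional label such as "1a."
question <- "[^.?!:]*"              # question text without ., ?, ! or :
end_mark <- "[?!.]"                 # punctuation that closes the question
colon    <- ":"                     # the colon that follows it
spaces   <- "\\s*"                  # any whitespace after the colon

pattern <- paste0(label, question, end_mark, colon, spaces)

# Hypothetical one-question sample.
x <- "2f. Was it quick?: Yes."

# Show the substring the pattern matches, then the text with it removed.
matched <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
cleaned <- gsub(pattern, "", x, perl = TRUE)
print(matched)
print(cleaned)
```

Looking at the matched substrings with regmatches before deleting anything is a cheap way to check that only the questions are being caught.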
> gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)
[1] "Approx. It was very prompt however. Were a little bit pushy but excellent otherwise Were willing to try to bring the price to a level that was acceptable to me.Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! "
Extension to handle abbreviations
If you add an alternation, you can enumerate the exceptions:
gsub("(?:\\d+[a-zA-Z]\\.)?(?:i\\.?e\\.|[^.?!:])*[?!.]:\\s*", "", review)
                          ^^^^^^^^^^^^^^^^^^^^^^
Here, (?:i\.?e\.|[^.?!:])* matches 0 or more occurrences of either an i.e. or ie. substring, or any character other than ., ?, ! and :.
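To see why the alternation matters, here is a small comparison of the two patterns on a question that contains "i.e.". The sample string is hypothetical, and perl = TRUE is an assumption of this sketch.

```r
# Hypothetical sample: a question with an abbreviation inside it.
x <- "4c. Rate the condition (i.e. cleanliness).: Thanks for the wash!"

basic    <- "(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*"
extended <- "(?:\\d+[a-zA-Z]\\.)?(?:i\\.?e\\.|[^.?!:])*[?!.]:\\s*"

# The basic pattern cannot step over the dots in "i.e.", so it only removes
# the tail of the question and leaves the rest behind.
basic_result <- gsub(basic, "", x, perl = TRUE)

# The extended pattern consumes "i.e." as one unit and removes the whole question.
extended_result <- gsub(extended, "", x, perl = TRUE)

print(basic_result)
print(extended_result)
```

The order of the alternatives is deliberate: i\.?e\. is tried first, so the abbreviation is consumed whole before the single-character class gets a chance.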
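Your question is unclear. "Everything before the colon" can contain any character. Is it a sentence? So you only want to remove 1a., 2f., 2g. and :? Are the characters the same on every line?
Sorry for the confusion. Basically, I want to remove all the questions from the text and keep only the answers. In my example the questions end with a colon, which is why I said everything before the colon.
Try gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review).
It would be great if you could explain the regex to me.
For "4c. Please rate the condition of your vehicle upon return (i.e. cleanliness, undamaged): Thank you very much for the wash!", the regex does not return the expected result. How can I handle that?
@gamyanaidu I stated at the very beginning: if there are no abbreviations. If there are, you can add them manually, as in (?:\d+[a-zA-Z]\.)?(?:i\.?e\.|[^.?!:])*[?!.]:\s*
Perfect answer. Thank you very much.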
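If several abbreviations need to be added by hand, one possible approach is to keep them in a vector and paste them into the alternation. This is only a sketch: the abbreviation list, the variable names, and the sample text are hypothetical, the abbreviations are written as already-escaped regex fragments, and perl = TRUE is an assumption.

```r
# Hand-escaped regex fragments for the abbreviations to tolerate (illustrative list).
abbreviations <- c("i\\.?e\\.", "e\\.?g\\.", "etc\\.")

# Build the alternation: any listed abbreviation, or a single character
# other than ., ?, ! and :
pattern <- paste0(
  "(?:\\d+[a-zA-Z]\\.)?",
  "(?:", paste(abbreviations, collapse = "|"), "|[^.?!:])*",
  "[?!.]:\\s*"
)

# Hypothetical sample with abbreviations inside both questions.
x <- "3b. Any extras (e.g. mats, etc.) offered?: No. 3c. Other notes, i.e. anything else.: None."

result <- gsub(pattern, "", x, perl = TRUE)
print(result)
```

Keeping the list in one place makes it easier to extend when a new abbreviation turns up in the survey text.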