Cleaning tweets with gsub in R


I am trying to clean up a batch of tweets using gsub:

V3
1  Well: Getting Insurance to Pay for Midwives http://xxxxxxxxx
2  Lightning may be giving you a headache http://xxxxxxxx
3  New York City is requiring flu shots for kids under 5 in city preschools and day care. Do your kids get the flu shot? http://xxxxxxxx
4  VIDEO: Can we erase memories entirely? http://xxxxxxxx
5  Artificial sweeteners are a $1.5-billion-a-year market @kchangnyt reported last year. http://xxxxxxxx
I tried to remove all the links with the following code (taken from an earlier question):


Can someone explain why the code in the first case does not produce the desired result?

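This is because \w only matches word characters (letters, digits, and the underscore). In these tweets "http" is always followed by "://", and ":" is not a word character, so \w+ finds nothing to match right after "http" and the pattern as a whole fails to match.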
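To see this concretely, here is a minimal sketch. The first string is one of the tweets from the question; the second, with an "https" link, is a made-up example added only to show what \w+ does when a word character does follow "http".

```r
# First string is from the question; the second is a hypothetical example.
x <- c("Lightning may be giving you a headache http://xxxxxxxx",
       "Secure link https://xxxxxxxx")

# \\w+ needs at least one word character (letter, digit, or underscore)
# immediately after "http". In the first string the next character is ":",
# so nothing matches. In the second, only "https" is matched and removed.
res <- gsub("http\\w+", "", x)
print(res)
```

The first element comes back unchanged, and the second keeps its "://xxxxxxxx" tail, so in neither case is the whole link removed.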

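By contrast, * combined with . (as in http.*) matches whatever follows "http", all the way to the end of the string, so that version works.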
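A short sketch of that behaviour, assuming the working pattern was something like http.* (the exact code is not shown above). The second tweet is a hypothetical example with the link in the middle, included to show that the match is greedy and runs to the end of the string.

```r
# First tweet is from the question; the second is hypothetical.
tweets <- c("VIDEO: Can we erase memories entirely? http://xxxxxxxx",
            "Read http://xxxxxxxx before you comment")

# "." matches any character and "*" repeats it zero or more times,
# so "http.*" consumes everything from "http" to the end of the string.
cleaned <- gsub("http.*", "", tweets)
print(cleaned)
```

For tweets that end with the link, as in the question, this gives the desired result (apart from a trailing space). For the second tweet, the text after the link is dropped as well, which may or may not be what you want.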

The first attempt from the question:
newdf1$V3 <- gsub("http\\w+", "", newdf1$V3)
Desired result, with the links removed:
V3
1  Well: Getting Insurance to Pay for Midwives 
2  Lightning may be giving you a headache 
3  New York City is requiring flu shots for kids under 5 in city preschools and day care. Do your kids get the flu shot? 
4  VIDEO: Can we erase memories entirely? 
5  Artificial sweeteners are a $1.5-billion-a-year market @kchangnyt reported last year.
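
If links can also appear in the middle of a tweet, a somewhat tighter pattern may be worth considering. The following is only a sketch, not the code from the thread: it rebuilds a small newdf1 with one tweet from the question and one hypothetical tweet, and assumes that a link is "http" or "https", then "://", then a run of non-whitespace characters.

```r
# Small stand-in for newdf1; the second tweet is hypothetical.
newdf1 <- data.frame(
  V3 = c("Well: Getting Insurance to Pay for Midwives http://xxxxxxxxx",
         "Read http://xxxxxxxx before you comment"),
  stringsAsFactors = FALSE
)

# \\s*  : any whitespace just before the link
# s?    : optional "s", so both http and https are covered
# \\S+  : the rest of the link, up to the next whitespace
# trimws() then drops any leftover leading or trailing whitespace.
newdf1$V3 <- trimws(gsub("\\s*https?://\\S+", "", newdf1$V3))
print(newdf1$V3)
```

Unlike http.*, this stops at the first whitespace after the link, so "before you comment" in the second tweet is kept.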