Cleaning tweets with gsub in R
I am trying to clean up a bunch of tweets using gsub:
V3
1 Well: Getting Insurance to Pay for Midwives http://xxxxxxxxx
2 Lightning may be giving you a headache http://xxxxxxxx
3 New York City is requiring flu shots for kids under 5 in city preschools and day care. Do your kids get the flu shot? http://xxxxxxxx
4 VIDEO: Can we erase memories entirely? http://xxxxxxxx
5 Artificial sweeteners are a $1.5-billion-a-year market @kchangnyt reported last year. http://xxxxxxxx
I tried to remove all the links with code taken from a previous question (the gsub call shown further down).
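Can someone explain why the code in the first case does not produce the desired result?

This is because \w only matches word characters (letters, digits, and the underscore). Since "http" is always followed by "://", \w+ has nothing to match at that position, so the pattern as a whole fails and nothing is replaced.
By contrast, .* matches whatever follows "http", up to the end of the string, so it works.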
First attempt, which leaves the links in place:
newdf1$V3 <- gsub("http\\w+", "", newdf1$V3)
Desired result, with the links removed:
V3
1 Well: Getting Insurance to Pay for Midwives
2 Lightning may be giving you a headache
3 New York City is requiring flu shots for kids under 5 in city preschools and day care. Do your kids get the flu shot?
4 VIDEO: Can we erase memories entirely?
5 Artificial sweeteners are a $1.5-billion-a-year market @kchangnyt reported last year.
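
The difference between the two patterns can be checked on a small sample. This is only a sketch: the `tweets` vector is a hypothetical stand-in for `newdf1$V3`, and the last pattern, `\\s*http\\S+`, is an assumed alternative that does not appear in the original question. It may be useful when a link sits in the middle of a tweet, because `http.*` also deletes any text after the link.

```r
# Hypothetical sample mirroring the tweets above; the URLs are placeholders.
tweets <- c(
  "Lightning may be giving you a headache http://xxxxxxxx",
  "VIDEO: Can we erase memories entirely? http://xxxxxxxx"
)

# "http\\w+" needs at least one word character right after "http".
# The next character is ":", so nothing matches and the text is unchanged.
unchanged <- gsub("http\\w+", "", tweets)

# "http.*" matches "http" and everything after it, to the end of the string.
greedy <- gsub("http.*", "", tweets)

# Assumed alternative (not from the question): "\\s*http\\S+" removes only
# the URL itself (a run of non-whitespace) plus any spaces before it.
url_only <- gsub("\\s*http\\S+", "", "Read http://xxxxxxxx before lunch")

print(identical(unchanged, tweets))  # TRUE: the first pattern changed nothing
print(greedy)
print(url_only)
```

Note that `greedy` keeps the space that preceded each link; wrapping the result in `trimws()` would remove it.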