
Separating Twitter status / hyperlink / date with R

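I want to automatically split the following tweets so that the tweet itself, the hyperlink, and the date end up in three separate columns. Can anyone help? My dataset is named DB_YS and it is a .txt file.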

Here are some of the tweets in my data frame:

Thank you, everyone!  indyref http://t.co/1kTzqjyGE7 Sep 18, 2014 
As the polls close, total likes on the @YesScotland Facebook page have passed David Cameron s one.  indyref  voteYes http://t.co/x7IoB1EtfY Sep 18, 2014 
We can be proud of  indyref, which has seen a flourishing of Scotland’s self-confidence as a nation  VoteYes http://t.co/1OqxvbpoS9 Sep 18, 2014 
We can afford world-class public services. A Yes vote means we can strengthen our NHS.  VoteYes  indyref http://t.co/D9Vn5OqStV Sep 18, 2014 
This is a once in a lifetime opportunity to choose a new and better path for Scotland  VoteYes  indyref http://t.co/9knT6Mx4vZ Sep 18, 2014 
Our young people shouldn t have to leave to find decent jobs.  VoteYes  indyref http://t.co/vAE164f0Oy Sep 18, 2014 
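The answers below assume the tweets are already in a data frame `df` with one tweet per row in a character column `txt`. A minimal sketch of getting there from DB_YS.txt, assuming the file stores exactly one tweet per line; `read_tweets()` is a hypothetical helper name:

```r
# Hypothetical helper: read a plain-text file with one tweet per line
# into a data frame with a single character column `txt`.
read_tweets <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  lines <- trimws(lines)           # drop leading/trailing spaces
  lines <- lines[nzchar(lines)]    # skip blank lines
  data.frame(txt = lines, stringsAsFactors = FALSE)
}

# Usage (assumes DB_YS.txt is in the working directory):
# df <- read_tweets("DB_YS.txt")
```

If a single tweet can span several lines in the file, this one-line-per-row assumption breaks and the rows would need to be rejoined first.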

Here is a base R solution using a series of regular expressions:

# Assume df is your data frame with a character column called txt

# Match text until the beginning of the URL
tweet.regex <- regexpr("^.*(?=http)", df$txt, perl = TRUE)

# Extract tweet text (the match always starts at position 1)
tweet <- substr(df$txt, tweet.regex, attr(tweet.regex, "match.length"))

# Match text from the beginning of the URL to the next space
url.regex <- regexpr("http[^ ]+(?= )", df$txt, perl = TRUE)

# Extract URL (stop = start + length - 1, so the trailing space is excluded)
url <- substr(df$txt, url.regex, url.regex + attr(url.regex, "match.length") - 1)

# Match the date at the end of the string (trailing spaces allowed)
date.regex <- regexpr("[A-Za-z]+ \\d+, \\d{4} *$", df$txt, perl = TRUE)

# Extract date
date <- substr(df$txt, date.regex, date.regex + attr(date.regex, "match.length") - 1)

# Combine results
tweet.df <- data.frame(tweet, url, date, stringsAsFactors = FALSE)
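The `substr()` start/stop arithmetic is easy to get off by one. A sketch of the same three extractions using base R's `regmatches()`, which returns the matched text directly; `first_match()` and `split_tweets()` are hypothetical helper names, and rows with no match get `NA`:

```r
# Hypothetical helper: first match of `pattern` in each string, NA if none.
first_match <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)   # regmatches() drops non-matching rows
  out
}

# Hypothetical helper: tweet text, first URL, and trailing date per row.
split_tweets <- function(txt) {
  data.frame(
    tweet = trimws(first_match(txt, "^.*?(?=\\s*http)")),  # text before first URL
    url   = first_match(txt, "http\\S+"),                  # URL up to next whitespace
    date  = first_match(txt, "[A-Z][a-z]{2} \\d{1,2}, \\d{4}(?=\\s*$)"),  # date at end
    stringsAsFactors = FALSE
  )
}

# Usage: tweet.df <- split_tweets(df$txt)
```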

Here is a solution using the stringr package:

library("stringr")
dat <- c("Thank you, everyone!  indyref http://t.co/1kTzqjyGE7 Sep 18, 2014 ",
"As the polls close, total likes on the @YesScotland Facebook page have passed David Cameron s one.  indyref  voteYes http://t.co/x7IoB1EtfY Sep 18, 2014 ",
"We can be proud of  indyref, which has seen a flourishing of Scotland’s self-confidence as a nation  VoteYes http://t.co/1OqxvbpoS9 Sep 18, 2014 ",
"We can afford world-class public services. A Yes vote means we can strengthen our NHS.  VoteYes  indyref http://t.co/D9Vn5OqStV Sep 18, 2014 ",
"This is a once in a lifetime opportunity to choose a new and better path for Scotland  VoteYes  indyref http://t.co/9knT6Mx4vZ Sep 18, 2014 ",
"Our young people shouldn t have to leave to find decent jobs.  VoteYes  indyref http://t.co/vAE164f0Oy Sep 18, 2014 ")

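# Date at the end of each tweet, e.g. "Sep 18, 2014"
dates <- str_extract(dat, "[A-Z][a-z]{2} [0-9]{1,2}, [0-9]{4}")
# Shortened t.co link ("\\." matches a literal dot)
url <- str_extract(dat, "http://t\\.co/[0-9A-Za-z]{10}")
# Tweet text: everything before the link, with surrounding spaces trimmed
text <- str_trim(str_replace(dat, "\\s*http://t\\.co/.*$", ""))
df <- data.frame(dates, text, url, stringsAsFactors = FALSE)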
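A single pattern with capture groups can also pull all three columns out in one pass, and rows that do not fit the expected "text, link, date" layout come back as `NA` instead of being silently mis-split. A base R sketch using `regexec()` (stringr's `str_match()` gives the same full-match-plus-groups result as a matrix); `split_tweets_once()` is a hypothetical name, and the pattern assumes t.co links followed by "Mon D, YYYY" dates:

```r
# Hypothetical helper: split each tweet with one anchored pattern.
# Group 1 = tweet text, group 2 = t.co link, group 3 = trailing date.
split_tweets_once <- function(x) {
  pattern <- "^(.*?)\\s*(http://t\\.co/\\S+)\\s+([A-Z][a-z]{2} \\d{1,2}, \\d{4})\\s*$"
  parts <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each element is c(full match, group 1, group 2, group 3),
  # or character(0) when the row does not fit the pattern.
  rows <- lapply(parts, function(p) {
    if (length(p) == 4) p[2:4] else rep(NA_character_, 3)
  })
  m <- do.call(rbind, rows)
  data.frame(tweet = m[, 1], url = m[, 2], date = m[, 3],
             stringsAsFactors = FALSE)
}

# Usage: split_tweets_once(dat)
```

Because the text group stops at the link rather than at a particular hashtag, a tweet such as "We can be proud of  indyref, which has seen..." keeps its full text.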

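Here is a solution using the "stringr" package. It is based on Corey's answer, but it corrects some errors that come up if you have unconventional tweets. It assumes you have a .txt file named DB_YS.txt containing all the tweets in raw text format, and that you have already installed the "stringr" library; otherwise, run install.packages("stringr") first.

library(stringr)
# Load the data into R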

RawData
Your regex for extracting the date will also match dates that are part of the tweet text. I suggest adding $ at the end to make sure only a date at the end of the string is matched.

Many thanks! :) I do have some cases where the date doesn't seem to go into the right column, like this tweet, where everything ends up in the "tweet" column: "Huge congratulations to everyone at @Team_Scotland for smashing the medal target! And there's still more time... GoScotland Jul 29, 2014". Any idea how to fix this?
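The quoted @Team_Scotland tweet has no URL, so any step that trims the text at the link (or at a fixed tag such as "indyref") leaves the whole string, date included, in the tweet column. A sketch that follows the `$` suggestion and treats the URL as optional; it assumes each row ends with a "Mon D, YYYY" date, and `first_match()` / `split_tweet_parts()` are hypothetical helper names:

```r
# Hypothetical helper: first match per string, NA when absent.
first_match <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

split_tweet_parts <- function(x) {
  x <- trimws(x)
  # Date: only accepted at the very end of the string (the `$` anchor),
  # so dates mentioned inside the tweet text are ignored.
  date_pat <- "[A-Z][a-z]{2} \\d{1,2}, \\d{4}$"
  date <- first_match(x, date_pat)
  # Drop the trailing date, then look for a URL in what is left.
  rest <- sub(paste0("\\s*", date_pat), "", x, perl = TRUE)
  url <- first_match(rest, "http\\S+")   # NA when the tweet has no link
  # Tweet text = remainder with any URLs removed.
  tweet <- trimws(gsub("\\s*http\\S+", "", rest, perl = TRUE))
  data.frame(tweet = tweet, url = url, date = date, stringsAsFactors = FALSE)
}

# Usage: tweet.df <- split_tweet_parts(df$txt)
```

Whether this fixes every problem row depends on why those rows failed; rows whose date is not in the "Mon D, YYYY" form will still get `NA` in the date column.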