Substring extraction in R
I have a string that looks like this:
{"created_at":"Tue May 12 09:45:33 +0000 2015","id":598061439090196480,"id_str":"598061439090196480","text":"I've collected 72,455 gold coins! http:\/\/t.co\/eTEbfxpAr0 #iphone"}
I want the result to be:
"Tue May 12 09:45:33 +0000 2015"
598061439090196480
"598061439090196480"
"I've collected 72,455 gold coins! http:\/\/t.co\/eTEbfxpAr0 #iphone"
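Splitting on a delimiter works, but for some strings it breaks a value apart and starts a new line.
Please suggest a function that lets me give a start and an end pattern for a substring; a different approach would also be very helpful. Thanks.
Since your data is in JSON format, use a JSON parser. For example: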
string <- '{"created_at":"Tue May 12 09:45:33 +0000 2015","id":598061439090196480,"id_str":"598061439090196480","text":"I\'ve collected 72,455 gold coins! http://example.com/eTEbfxpAr0 #iphone"}'
library(jsonlite)
fromJSON(string)
# $created_at
# [1] "Tue May 12 09:45:33 +0000 2015"
#
# $id
# [1] 5.980614e+17
#
# $id_str
# [1] "598061439090196480"
#
# $text
# [1] "I've collected 72,455 gold coins! http://example.com/eTEbfxpAr0 #iphone"
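Once the string is parsed, each value can be pulled out by name. The following is a minimal sketch, not part of the original answer. The variable names `parsed` and `parsed_chr` are illustrative, and the `bigint_as_char` argument is assumed to be available in the installed jsonlite version.

```r
library(jsonlite)

# Same JSON as in the example above.
string <- '{"created_at":"Tue May 12 09:45:33 +0000 2015","id":598061439090196480,"id_str":"598061439090196480","text":"I\'ve collected 72,455 gold coins! http://example.com/eTEbfxpAr0 #iphone"}'

parsed <- fromJSON(string)

# Individual fields are ordinary list elements.
parsed$created_at
parsed$text

# The numeric id is stored as a double, which is why it prints in
# scientific notation. The id_str field carries the same value as text.
parsed$id_str

# Assumption: this jsonlite version supports bigint_as_char, which
# returns large integers as character strings instead of doubles.
parsed_chr <- fromJSON(string, bigint_as_char = TRUE)
parsed_chr$id
```

If the exact digits of the id matter, reading `id_str` is the safer route, since it does not depend on how the parser handles large numbers.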
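You can also use the regmatches function. That said, Ananda's answer is the better choice, because a parser written specifically for JSON is the most reliable option.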
> string <- '{"created_at":"Tue May 12 09:45:33 +0000 2015","id":598061439090196480,"id_str":"598061439090196480","text":"I\'ve collected 72,455 gold coins! http://t.co/eTEbfxpAr0 #iphone"}'
> regmatches(string, gregexpr("(?<=:)(?:\"[^\"]*\"|[^,}]*)", string, perl=TRUE))[[1]]
[1] "\"Tue May 12 09:45:33 +0000 2015\""
[2] "598061439090196480"
[3] "\"598061439090196480\""
[4] "\"I've collected 72,455 gold coins! http://t.co/eTEbfxpAr0 #iphone\""
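The question also asked for a function that takes a start and an end pattern. One way to sketch that with the same regmatches approach is shown below. The helper name `extract_between` is hypothetical, and it treats both delimiters as literal text (via `\Q...\E`), so it assumes neither delimiter contains `\E`.

```r
# Hypothetical helper: return the text between the first occurrence of
# `start` and the next occurrence of `end`, both taken as literal strings.
extract_between <- function(x, start, end) {
  # Lookbehind for the start delimiter, lazy match, lookahead for the end.
  pattern <- paste0("(?<=\\Q", start, "\\E).*?(?=\\Q", end, "\\E)")
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

string <- '{"created_at":"Tue May 12 09:45:33 +0000 2015","id":598061439090196480,"id_str":"598061439090196480","text":"I\'ve collected 72,455 gold coins! http://t.co/eTEbfxpAr0 #iphone"}'

extract_between(string, '"created_at":"', '","id"')
extract_between(string, '"id_str":"', '"')
```

When nothing matches, regmatches returns an empty character vector, so callers may want to check the length of the result. Like any regex applied to JSON, this is fragile for values that contain escaped quotes, which is another reason to prefer a parser.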