How to parse nested key-value pairs in R
I am trying to parse a log file that contains a structure in the form of key-value pairs:
log <- c("name:praveen,age:23,place:UP,address:,dob:, site: {site_name:something , site_url: http://something.com, description:}")
Please help me parse this file. Thanks.
I have updated the question with the actual log file format:
{
"username": "lavita",
"host": "10.105.22.32",
"event_source": "server",
"event_type": "/courses/IITB/CS101/2014_T1/xblock/i4x:;_;_IITB;_CS101;_video;_d333fa637a074b41996dc2fd5e675818/handler/xmodule_handler/save_user_state",
"context": {
"course_id": "IITB/CS101/2014_T1",
"course_user_tags": {},
"user_id": 42,
"org_id": "IITB"
},
"time": "2014-06-20T05:49:10.468638+00:00",
"ip": "127.0.0.1",
"event": "{\"POST\": {\"saved_video_position\": [\"00:02:10\"]}, \"GET\": {}}",
"agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0",
"page": null
}
{
"username": "raeha",
"host": "10.105.22.32",
"event_source": "server",
"event_type": "problem_check",
"context": {
"course_id": "IITB/CS101/2014_T1",
"course_user_tags": {},
"user_id": 40,
"org_id": "IITB",
"module": {
"display_name": ""
}
},
"time": "2014-06-20T06:43:52.716455+00:00",
"ip": "127.0.0.1",
"event": {
"submission": {
"i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
"input_type": "choicegroup",
"question": "",
"response_type": "multiplechoiceresponse",
"answer": "MenuInflater.inflate()",
"variant": "",
"correct": true
}
},
"success": "correct",
"grade": 1,
"correct_map": {
"i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
"hint": "",
"hintmode": null,
"correctness": "correct",
"npoints": null,
"msg": "",
"queuestate": null
}
},
"state": {
"student_answers": {},
"seed": 1,
"done": null,
"correct_map": {},
"input_state": {
"i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {}
}
},
"answers": {
"i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": "choice_0"
},
"attempts": 1,
"max_grade": 1,
"problem_id": "i4x://IITB/CS101/problem/33e4aac93dc84f368c93b1d08fa984fc"
},
"agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0",
"page": "x_module"
}
{
"username": "tushars",
"host": "localhost",
"event_source": "server",
"event_type": "/courses/IITB/CS101/2014_T1/instructor_dashboard/api/list_instructor_tasks",
"context": {
"course_id": "IITB/CS101/2014_T1",
"course_user_tags": {},
"user_id": 6,
"org_id": "IITB"
},
"time": "2014-06-20T05:49:26.780244+00:00",
"ip": "127.0.0.1",
"event": "{\"POST\": {}, \"GET\": {}}",
"agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0",
"page": null
}
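That is a pretty ugly format. True JSON would quote strings and have non-null values, so this is not really a standard format. Here is an equally ugly method, but it can handle multiple nested elements.
I will use this as a test case: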
log <- paste0("name:{first:praveen,last:smith},age:23,place:UP,address:,",
"dob:, site: {site_name:something , site_url: http://something.com, ",
"description:{english:woot,spanish:wooto}}")
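If your data is in JSON format, take a look at the rjson package.
I tried using rjson, but I don't know how to handle the nested key-value pairs.
I would be careful about using names as a variable name; you may want to rename the names vector. Also, you could combine split_by_comma/colon into a single strsplit(log, ",|:") call.
Thanks for the reply, Mr. Flick. I will try it and get back to you if there are any problems.
I have added the actual file format. Please check it if you have time. It is slightly less ugly :)
I don't see anything added. Is there a reason you can't check it?
I put together the sample from what you posted. I hope it represents your actual data.
Please check again; I had not saved the edit at the time.
Well, that's not cool. I spent a lot of time writing a parser for a format that is JSON-like but not really JSON, and now here is real JSON. You only need to split the data at the blank space between records and parse each JSON block. Your sample data was very misleading.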
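The suggestion in the last comment (split the stream into separate JSON blocks, then parse each block) could be sketched roughly as follows. This is only an illustration under some assumptions: `split_json_objects` is a hypothetical helper name, not part of any package; it assumes each record is a top-level `{...}` object and tracks brace depth while skipping braces inside quoted strings. The sample text is an abridged version of the records in the question. Parsing uses `rjson::fromJSON`, guarded so the splitting part still runs if rjson is not installed.

```r
# Hypothetical helper: split concatenated JSON objects into one string each.
# Tracks brace depth and ignores braces that appear inside quoted strings.
split_json_objects <- function(txt) {
  chars <- strsplit(txt, "")[[1]]
  depth <- 0L
  in_string <- FALSE
  escaped <- FALSE
  start <- NA_integer_
  out <- character(0)
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (in_string) {
      if (escaped) {
        escaped <- FALSE           # character after a backslash is literal
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == "\"") {
        in_string <- FALSE
      }
    } else if (ch == "\"") {
      in_string <- TRUE
    } else if (ch == "{") {
      if (depth == 0L) start <- i  # a new top-level record begins here
      depth <- depth + 1L
    } else if (ch == "}") {
      depth <- depth - 1L
      if (depth == 0L) {
        out <- c(out, paste(chars[start:i], collapse = ""))
      }
    }
  }
  out
}

# Abridged sample based on the records in the question
txt <- paste(
  '{"username": "lavita", "context": {"user_id": 42}, "event": "{\\"POST\\": {}, \\"GET\\": {}}", "page": null}',
  '{"username": "raeha", "context": {"user_id": 40}, "event": {"grade": 1}, "page": "x_module"}',
  sep = "\n"
)

blocks <- split_json_objects(txt)
length(blocks)  # 2

if (requireNamespace("rjson", quietly = TRUE)) {
  records <- lapply(blocks, rjson::fromJSON)
  # Nested objects come back as nested lists
  records[[1]]$context$user_id
  # In some records "event" is itself a JSON string, so parse it a second time
  ev <- records[[1]]$event
  if (is.character(ev)) ev <- rjson::fromJSON(ev)
}
```

For a real log file, the text could be read with something like `paste(readLines(path), collapse = "\n")`, where `path` stands in for the actual file location.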
log <- paste0("name:{first:praveen,last:smith},age:23,place:UP,address:,",
              "dob:, site: {site_name:something , site_url: http://something.com, ",
              "description:{english:woot,spanish:wooto}}")

parseString <- function(log) {
  nested <- c()
  # Find innermost {} blocks one at a time and replace each with a ~~n~~ placeholder.
  # regexpr() returns only the first match, so every stored block gets its own index.
  m <- regexpr("\\{[^}{]+\\}", log)
  while (m[1] != -1) {
    s <- gsub("^\\{|\\}$", "", regmatches(log, m))
    regmatches(log, m) <- paste0("~~", length(nested) + 1, "~~")
    nested <- c(nested, s)
    m <- regexpr("\\{[^}{]+\\}", log)
  }
  nested <- c(nested, log)

  # Turn each block into a named list, innermost blocks first
  nestedl <- vector("list", length(nested))
  for (i in seq_along(nested)) {
    kv <- strsplit(nested[i], "\\s*,\\s*")[[1]]
    # Split on the first colon only, so values such as URLs keep their colons
    kv <- lapply(strsplit(kv, ":"), function(x)
      c(x[1], paste(x[-1], collapse = ":")))
    keys <- gsub("\\s+", "", sapply(kv, `[`, 1))
    vals <- gsub("\\s+", "", sapply(kv, `[`, 2))
    valsl <- setNames(as.list(vals), keys)
    # Swap each placeholder for the nested list that was parsed earlier
    m <- regexec("~~(\\d+)~~", vals)
    for (j in which(sapply(m, `[`, 1) != -1)) {
      valsl[[j]] <- nestedl[[as.numeric(regmatches(vals[j], m[j])[[1]][2])]]
    }
    nestedl[[i]] <- valsl
  }
  nestedl[[length(nestedl)]]
}

parseString(log)
$name
$name$first
[1] "praveen"
$name$last
[1] "smith"
$age
[1] "23"
$place
[1] "UP"
$address
[1] ""
$dob
[1] ""
$site
$site$site_name
[1] "something"
$site$site_url
[1] "http://something.com"
$site$description
$site$description$english
[1] "woot"
$site$description$spanish
[1] "wooto"
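Once the data is in a nested list like the one printed above, individual values can be reached with chained `$`, and `unlist()` flattens the structure into a named character vector whose names join the nesting levels with dots. The snippet below is a small sketch that rebuilds part of that output by hand (the variable name `parsed` is only illustrative) so that it runs on its own.

```r
# Hand-built copy of part of the parsed structure shown above
parsed <- list(
  name = list(first = "praveen", last = "smith"),
  age  = "23",
  site = list(
    site_name   = "something",
    description = list(english = "woot", spanish = "wooto")
  )
)

# Access a deeply nested value directly
parsed$site$description$english  # "woot"

# Flatten to a named character vector; names become "site.description.english" etc.
flat <- unlist(parsed)
names(flat)
flat[["site.description.english"]]  # "woot"
```

A flat vector like this may be easier to turn into one row of a data frame when many log records share the same keys.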