How do I parse a file containing multiple stacked JSON objects in R?
I have the following "stacked JSON" object in R, example1.JSON:
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
"Code":[{"event1":"A","result":"1"},…]}
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
"Code":[{"event1":"B","result":"1"},…]}
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
"Code":[{"event1":"B","result":"0"},…]}
The records are not comma-separated. The basic goal is to parse some (or all) of the fields into an R data.frame or data.table:
Timestamp Usefulness
0 20140101 Yes
1 20140102 No
2 20140103 No
Normally, I would read JSON in R as follows:
library(jsonlite)
jsonfile = "example1.json"
foobar = fromJSON(jsonfile)
However, this throws a parse error:
Error: lexical error: invalid char in json text.
[{"event1":"A","result":"1"},…]} {"ID":"1A35B","Timestamp"
(right here) ------^
This is similar to an existing question, but for R.
Edit: this file format is called "newline delimited JSON" (NDJSON).
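The … makes the JSON invalid, hence the lexical error. You can use jsonlite::stream_in() to "stream in" lines of JSON, reading one record per line.
I have cleaned up your example data to make it valid JSON and saved it as ~/desktop/examples1.JSON: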
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes","Code":[{"event1":"A","result":"1"}]}
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No","Code":[{"event1":"B","result":"1"}]}
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No","Code":[{"event1":"B","result":"0"}]}
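The stream_in() approach described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code from the answer: it assumes the three cleaned records shown above, and writes them to a temporary file (the `path` variable is a stand-in for ~/desktop/examples1.JSON) so that the snippet is self-contained.

```r
library(jsonlite)

# Assumption: a temporary file stands in for ~/desktop/examples1.JSON.
path <- tempfile(fileext = ".json")
writeLines(c(
  '{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes","Code":[{"event1":"A","result":"1"}]}',
  '{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No","Code":[{"event1":"B","result":"1"}]}',
  '{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No","Code":[{"event1":"B","result":"0"}]}'
), path)

# stream_in() parses one JSON record per line and combines them into a data frame.
# The nested "Code" array ends up as a list column.
records <- stream_in(file(path), verbose = FALSE)

# Keep only the fields asked for in the question.
result <- records[, c("Timestamp", "Usefulness")]
print(result)
```

Because every value in the sample is a quoted string, both columns come back as character vectors; converting Timestamp to a date would be a separate step.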
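Is there really a newline before "Code", or is that just for readability? I am also assuming the … is yours and not part of the JSON. If these are files with one compact JSON record per line, they are "ndjson" files, and you can use ndjson::stream_in(), which is faster than its jsonlite counterpart and always produces a "flat" data frame. If so, this is a duplicate, and we need to know that so it can be flagged as such.
@hrbrmstr Yes, please flag it as a duplicate.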