Data frame from multiple JSON files in R or Python
I have a mydata.txt/.json file that contains data like the following:
[{"num":1,"name":"Swab Summer: Transformation At the United States Coast Guard Academy","link":"http:\/\/www.amazon.com\/dp\/0982168594\/ref=wl_it_dp_v_nS_ttl\/176-1400914-4673658?_encoding=UTF8&colid=1GM97SGAP8NLI&coliid=I1ELS7DSQ6QV5C","old-price":"N\/A","new-price":"","date-added":"January 10, 2014","priority":"","rating":"N\/A","total-ratings":"","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/51MtOOm493L._SL500_SL135_.jpg","page":1}]
[{"num":1,"name":"Vibomex","link":"http:\/\/www.amazon.com\/dp\/B00BR1CUFY\/ref=wl_it_dp_v_S_ttl\/175-5687209-2417046?_encoding=UTF8&colid=C0XVZ38E5WD9&coliid=I1EPDGRY73N5Q2","old-price":"N\/A","new-price":"","date-added":"July 20, 2014","priority":"","rating":"N\/A","total-ratings":"","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/31GBqOHskyL._SL500_SL135_.jpg","page":1}]
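Basically, it is several JSON documents in one file; here there are two, each on its own line. When I try to import the data into R and turn it into a data frame, only the row corresponding to the first line is read. Here is my code: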
library(rjson)
json_file <- fromJSON(file="mydata.txt")
json_file <- lapply(json_file, function(x) {
x[sapply(x, is.null)] <- NA
unlist(x)
})
do.call("rbind", json_file)
Here is one way, using fromJSON from the jsonlite package:
do.call(rbind, lapply(readLines('mydata.json'), jsonlite::fromJSON))
# num name link
# 1 1 Swab Summer: Transformation At the United States Coast Guard Academy http://www.amazon.com/dp/0982168594/ref=wl_it_dp_v_nS_ttl/176-1400914-4673658?_encoding=UTF8&colid=1GM97SGAP8NLI&coliid=I1ELS7DSQ6QV5C
# 2 1 Vibomex http://www.amazon.com/dp/B00BR1CUFY/ref=wl_it_dp_v_S_ttl/175-5687209-2417046?_encoding=UTF8&colid=C0XVZ38E5WD9&coliid=I1EPDGRY73N5Q2
# old-price new-price date-added priority rating total-ratings comment picture page
# 1 N/A January 10, 2014 N/A http://ecx.images-amazon.com/images/I/51MtOOm493L._SL500_SL135_.jpg 1
# 2 N/A July 20, 2014 N/A http://ecx.images-amazon.com/images/I/31GBqOHskyL._SL500_SL135_.jpg 1
If the set of column names differs between the JSON documents, you can use:
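library(dplyr)
# bind_rows() is the current replacement for the deprecated rbind_all();
# columns missing from a given line are filled with NA
bind_rows(lapply(readLines('mydata.json'), jsonlite::fromJSON))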
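Since the question also asks about Python, the same line-by-line idea could be sketched roughly as follows. This is only a minimal sketch using the standard json module; the function names read_json_lines and to_table are made up for illustration, and it assumes each non-empty line holds one complete JSON array of objects (or a single object), as in the sample data.

```python
import json


def read_json_lines(path):
    """Parse a file with one JSON document per line into a flat list of dicts."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parsed = json.loads(line)
            # Each line is expected to be a JSON array of objects;
            # a bare object is wrapped so both shapes are handled.
            if isinstance(parsed, dict):
                parsed = [parsed]
            records.extend(parsed)
    return records


def to_table(records):
    """Return (columns, rows); keys missing from a record become None."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [[record.get(column) for column in columns] for record in records]
    return columns, rows
```

If pandas is installed, passing the list straight to pandas.DataFrame(read_json_lines('mydata.json')) should give a data frame directly, with missing keys filled as NaN.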
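Thanks for your help, but I get this error: Error in parseJSON(txt) : parse error: premature EOF. Do you know what causes this kind of error? Cheers
@warwick12 You are probably missing a closing ] in one of the lines. You can run the following to see which lines throw an error, and then check those lines in the .json file:
which(sapply(readLines('mydata.json'), function(x) tryCatch({jsonlite::fromJSON(x); 1}, error = function(e) 0)) == 0)
Thanks for your help! It works fine now. Cheers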
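For anyone taking the Python route, the check suggested in the comment above (find the lines that fail to parse) could be sketched like this. The function name find_bad_lines and the sample lines are hypothetical, and only the standard json module is assumed.

```python
import json


def find_bad_lines(lines):
    """Return the 1-based numbers of lines that are not valid JSON."""
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
    return bad


# Hypothetical sample: the second line is missing its closing bracket.
sample = ['[{"num": 1}]', '[{"num": 2}', '[{"num": 3}]']
print(find_bad_lines(sample))
```

With a real file, the lines can be supplied by opening it and passing the file object, for example find_bad_lines(open('mydata.json', encoding='utf-8')). Note that blank lines are also reported, since an empty string is not valid JSON.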