Error parsing a JSON file in R
I have Yelp business data with 100 instances, in the following format:
{
"_id" : ObjectId("5aab338ffc08b46adb7a2320"),
"business_id" : "Pd52CjgyEU3Rb8co6QfTPw",
"name" : "Flight Deck Bar & Grill",
"neighborhood" : "Southeast",
"address" : "6730 S Las Vegas Blvd",
"city" : "Las Vegas",
"state" : "NV",
"postal_code" : "89119",
"latitude" : 36.0669136,
"longitude" : -115.1708484,
"stars" : 4.0,
"review_count" : NumberInt(13),
"is_open" : NumberInt(1),
"attributes" : {
"Alcohol" : "full_bar",
"HasTV" : true,
"NoiseLevel" : "average",
"RestaurantsAttire" : "casual",
"BusinessAcceptsCreditCards" : true,
"Music" : {
"dj" : false,
"background_music" : true,
"no_music" : false,
"karaoke" : false,
"live" : false,
"video" : false,
"jukebox" : false
},
"Ambience" : {
"romantic" : false,
"intimate" : false,
"classy" : false,
"hipster" : false,
"divey" : false,
"touristy" : false,
"trendy" : false,
"upscale" : false,
"casual" : true
},
"RestaurantsGoodForGroups" : true,
"Caters" : true,
"WiFi" : "free",
"RestaurantsReservations" : false,
"RestaurantsTableService" : true,
"RestaurantsTakeOut" : true,
"GoodForKids" : true,
"HappyHour" : true,
"GoodForDancing" : false,
"BikeParking" : true,
"OutdoorSeating" : false,
"RestaurantsPriceRange2" : NumberInt(2),
"RestaurantsDelivery" : false,
"BestNights" : {
"monday" : false,
"tuesday" : false,
"friday" : false,
"wednesday" : true,
"thursday" : false,
"sunday" : false,
"saturday" : false
},
"GoodForMeal" : {
"dessert" : false,
"latenight" : false,
"lunch" : true,
"dinner" : false,
"breakfast" : false,
"brunch" : false
},
"BusinessParking" : {
"garage" : false,
"street" : false,
"validated" : false,
"lot" : true,
"valet" : false
},
"CoatCheck" : false,
"Smoking" : "no",
"WheelchairAccessible" : true
},
"categories" : [
"Nightlife",
"Bars",
"Barbeque",
"Sports Bars",
"American (New)",
"Restaurants"
],
"hours" : {
"Monday" : "8:30-22:30",
"Tuesday" : "8:30-22:30",
"Friday" : "8:30-22:30",
"Wednesday" : "8:30-22:30",
"Thursday" : "8:30-22:30",
"Sunday" : "8:30-22:30",
"Saturday" : "8:30-22:30"
}
}
I need to import this file into R. I have the following code:
library('jsonlite')
data<- stream_in(file("~/Desktop/business100.json"))
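I think something is wrong with the JSON format, but when I view the file in MongoDB it looks fine. What can I do about it? Thanks.

If mongolite is an option (as suggested in the comments), it is probably the best approach. If for some reason you cannot use it, you can strip out these non-JSON attributes and parse the result with a regular JSON parser.

To generalize, create a vector of (verbatim) strings. I am assuming each attribute has the form
DiscardableProperty(keep everything in here)
so, based on the data you provided, a good starting point is: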
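# Names of the MongoDB shell wrappers to strip; extend this vector as needed
ptns <- c('ObjectId', 'NumberInt')
# jsontxt is assumed to hold the raw file contents as a single string
str(jsontxt)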
# chr "{ \n \"_id\" : ObjectId(\"5aab338ffc08b46adb7a2320\"), \n \"business_id\" : \"Pd52CjgyEU3Rb8co6QfTPw\", \n \"name\" : "| __truncated__
jsontxt2 <- Reduce(function(txt, p) gsub(sprintf("%s\\(([^)]+)\\)", p), "\\1", txt),
ptns, init=jsontxt)
str(jsontxt2)
# chr "{ \n \"_id\" : \"5aab338ffc08b46adb7a2320\", \n \"business_id\" : \"Pd52CjgyEU3Rb8co6QfTPw\", \n \"name\" : \"Flight D"| __truncated__
Edit: a single-pass replacement would be:
jsontxt2 <- gsub(sprintf("(%s)\\(([^)]+)\\)", paste(ptns, collapse = "|")),
"\\2", jsontxt)
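Use mongolite to query MongoDB directly.

To expand on @symbolXau's comment: you have a MongoDB Extended JSON file, not a proper JSON file. You need to either import it into MongoDB to use it (the easiest approach) or re-export it in strict mode.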
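As a rough illustration of the mongolite route, the sketch below wraps the query in a small function. The database name `yelp`, the collection name `business`, and the local connection URL are assumptions for illustration only and do not come from the question; `load_businesses` is a hypothetical helper, not part of mongolite.

```r
# Hypothetical helper: fetch every document from a collection as a data frame.
# The default db, collection, and url values are assumed, not taken from the question.
load_businesses <- function(collection = "business",
                            db = "yelp",
                            url = "mongodb://localhost:27017") {
  # mongolite::mongo() opens a connection object for one collection
  con <- mongolite::mongo(collection = collection, db = db, url = url)
  # An empty query '{}' matches every document in the collection
  con$find('{}')
}
```

With a MongoDB server running and the data already imported, `businesses <- load_businesses()` should return the documents with nested fields such as `attributes` and `hours` as nested columns.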
Once the MongoDB-specific wrappers have been stripped, jsontxt2 parses with a regular JSON parser:
str(fromJSON(jsontxt2))
# List of 16
# $ _id : chr "5aab338ffc08b46adb7a2320"
# $ business_id : chr "Pd52CjgyEU3Rb8co6QfTPw"
# $ name : chr "Flight Deck Bar & Grill"
# $ neighborhood: chr "Southeast"
# $ address : chr "6730 S Las Vegas Blvd"
# ...
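Putting the replacement and parsing steps together, a minimal end-to-end sketch might look like the following. It uses a shortened stand-in for one document rather than the real file, and `strip_mongo_wrappers` is a hypothetical helper name, not a jsonlite function. It assumes the wrapped values never contain a closing parenthesis.

```r
library(jsonlite)

# Hypothetical helper: unwrap MongoDB shell constructors such as
# ObjectId("...") or NumberInt(13), keeping only the inner value.
strip_mongo_wrappers <- function(txt, wrappers = c("ObjectId", "NumberInt")) {
  pattern <- sprintf("(%s)\\(([^)]+)\\)", paste(wrappers, collapse = "|"))
  gsub(pattern, "\\2", txt)
}

# Shortened stand-in for a single document from the file
jsontxt <- '{
  "_id" : ObjectId("5aab338ffc08b46adb7a2320"),
  "name" : "Flight Deck Bar & Grill",
  "stars" : 4.0,
  "review_count" : NumberInt(13),
  "attributes" : { "HasTV" : true, "RestaurantsPriceRange2" : NumberInt(2) }
}'

clean <- strip_mongo_wrappers(jsontxt)  # now plain JSON
business <- fromJSON(clean)
str(business, max.level = 1)
```

For the real file, one option is to build `jsontxt` with `paste(readLines("~/Desktop/business100.json"), collapse = "\n")`. If the file holds several documents back to back rather than a single object or array, `fromJSON` will still reject it, so the documents would need to be split or wrapped in an array first.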