Unable to load a JSON file in Python

python json

I was given a file of Twitter streaming data in JSON format. I am now trying to load it in Python:

import json

tweets_data=[]
tweets_file=open('test1.txt',"r")
for line in tweets_file:
    try:
        tweet=json.load(line)
        tweets_data.append(tweet)
    except:
        continue 

print(len(tweets_data))

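The result is always 0. If I remove the "try" and "except", the error is "ValueError: Expecting value: line 2 column 1 (char 1)". However, according to an online validator, every line of the file is valid JSON.

Here is part of test1.txt: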

{"created_at":"Fri Jul 24 16:35:22 +0000 2015","id":624618886277640192,"id_str":"624618886277640192","text":"RT @nodenow: Essential Steps: Long Term Support for Node.js\nhttp:\/\/t.co\/MzPfvenwtT\n+1 micshasan #javascript","source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":3290861609,"id_str":"3290861609","name":"Rajiin","screen_name":"Rajiin_07","location":"Pokhara city","url":"http:\/\/www.pokharacity.com","description":null,"protected":false,"verified":false,"followers_count":1101,"friends_count":1119,"listed_count":155,"favourites_count":2048,"statuses_count":5498,"created_at":"Wed May 20 04:58:23 +0000 2015","utc_offset":-25200,"time_zone":"Pacific Time (US & Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"4A913C","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/617620457336893440\/3HTEKnMx_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/617620457336893440\/3HTEKnMx_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/3290861609\/1435854327","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Fri Jul 24 16:33:04 +0000 
2015","id":624618308050915328,"id_str":"624618308050915328","text":"Essential Steps: Long Term Support for Node.js\nhttp:\/\/t.co\/MzPfvenwtT\n+1 micshasan #javascript","source":"\u003ca href=\"http:\/\/ifttt.com\" rel=\"nofollow\"\u003eIFTTT\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":3243544179,"id_str":"3243544179","name":"Javascript Digest","screen_name":"nodenow","location":"","url":null,"description":null,"protected":false,"verified":false,"followers_count":1238,"friends_count":1,"listed_count":1148,"favourites_count":2,"statuses_count":130923,"created_at":"Sat May 09 15:45:13 +0000 2015","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/597066594334941184\/Xe4tTtU8_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/597066594334941184\/Xe4tTtU8_normal.jpg","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":1,"favorite_count":0,"entities":{"hashtags":[{"text":"javascript","indices":[83,94]}],"trends":[],"urls":[{"url":"http:\/\/t.co\/MzPfvenwtT","expanded_url":"http:\/\/bit.ly\/1LH81ly","display_url":"bit.ly\/1LH81ly","indices":[47,69]}],"user_mentions":[],"symbols":[]},"favorited":false,"retwe
eted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"javascript","indices":[96,107]}],"trends":[],"urls":[{"url":"http:\/\/t.co\/MzPfvenwtT","expanded_url":"http:\/\/bit.ly\/1LH81ly","display_url":"bit.ly\/1LH81ly","indices":[60,82]}],"user_mentions":[{"screen_name":"nodenow","name":"Javascript Digest","id":3243544179,"id_str":"3243544179","indices":[3,11]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1437755722003"}


{"created_at":"Fri Jul 24 16:35:22 +0000 2015","id":624618888387432449,"id_str":"624618888387432449","text":"python \u041c\u043e\u0441\u043a\u0432\u0430  http:\/\/t.co\/itYJmgVvgD","source":"\u003ca href=\"http:\/\/gdepraktika.ru\" rel=\"nofollow\"\u003egdepraktika-trfnslator\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":623605809,"id_str":"623605809","name":"\u0413\u0434\u0435 \u043f\u0440\u0430\u043a\u0442\u0438\u043a\u0430?","screen_name":"gdepraktika","location":"\u0420\u043e\u0441\u0441\u0438\u044f","url":"http:\/\/gdepraktika.ru","description":"\u041f\u0440\u0430\u043a\u0442\u0438\u043a\u0430, \u0441\u0442\u0430\u0436\u0438\u0440\u043e\u0432\u043a\u0430, \u0440\u0430\u0431\u043e\u0442\u0430 \u0434\u043b\u044f \u0441\u0442\u0443\u0434\u0435\u043d\u0442\u043e\u0432, \u043e\u0431\u0443\u0447\u0435\u043d\u0438\u0435 \u0432 \u043a\u043e\u043c\u043f\u0430\u043d\u0438\u044f\u0445","protected":false,"verified":false,"followers_count":17,"friends_count":9,"listed_count":0,"favourites_count":0,"statuses_count":902069,"created_at":"Sun Jul 01 07:53:36 +0000 
2012","utc_offset":10800,"time_zone":"Moscow","geo_enabled":false,"lang":"ru","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/378800000420815111\/bba61a6dcd4272794a4af41dd8a44cf5_normal.png","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/378800000420815111\/bba61a6dcd4272794a4af41dd8a44cf5_normal.png","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[{"url":"http:\/\/t.co\/itYJmgVvgD","expanded_url":"http:\/\/bit.ly\/1GqpqOg","display_url":"bit.ly\/1GqpqOg","indices":[15,37]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"und","timestamp_ms":"1437755722506"}

This is because there are two blank lines between the valid JSON lines. Just add a check for blank lines and you are good to go.

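import json

tweets_data = []
notParsed = []
with open('test1.txt', 'r') as tweets_file:
    for line in tweets_file:
        # Skip the blank lines that separate the records
        if line.strip():
            try:
                # json.loads parses a string; json.load expects a file object
                tweet = json.loads(line)
                tweets_data.append(tweet)
            except ValueError:
                # Keep lines that fail to parse so they can be inspected
                notParsed.append(line)
print(len(tweets_data))
print('Could not parse: ', len(notParsed))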
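
To make the failure modes concrete, here is a minimal sketch using a made-up one-line record (the sample string is an assumption, not taken from test1.txt). It shows that json.load rejects a plain string, that json.loads accepts a line with a trailing newline, and that a blank line raises the "Expecting value" error quoted in the question.

```python
import json

# Hypothetical one-line record standing in for a line of test1.txt
line = '{"id": 1, "text": "hello"}\n'

# json.load expects a file-like object with a .read() method; a str has none,
# so this raises AttributeError (which a bare "except" silently swallows)
try:
    json.load(line)
    load_error = None
except AttributeError as exc:
    load_error = type(exc).__name__

# json.loads parses a string; the trailing newline is ignored as whitespace
tweet = json.loads(line)

# A blank line contains no JSON value at all, so parsing it fails
try:
    json.loads("\n")
    blank_error = None
except ValueError as exc:
    blank_error = str(exc)

print(load_error)
print(tweet)
print(blank_error)
```

Catching specific exception types rather than using a bare "except" keeps errors like these visible instead of producing an empty result.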

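This isn't strictly necessary for the answer (I'm just brushing up on my Python), but you could also write the code as follows:

tweets_data = list(map(json.loads, [x for x in open('test1.txt').read().split('\n') if x.strip()]))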

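Two things. First, taken as a whole, your sample does not look like valid JSON at all. (Copying and pasting it into a validator confirms this.) Syntactically, there would need to be a comma between the two tweets, because they need to be elements of some higher-level data structure (the JSON equivalent of a Python list, an "array", or the JSON equivalent of a dict, an "object"). A JSON file can have only one root element. See this earlier question:

Second, if you just want ordinary access to the JSON data structure and are not trying to do anything special that depends on the notion of lines (or worrying about memory management, etc.), then you don't need to read it line by line like this. Instead, you can bind the whole shebang to a single variable, and it will convert the JSON into nested lists and dicts following the obvious syntax (i.e. JSON curly braces and square brackets work the same way as their Python counterparts).

So, once the JSON is valid, the code can be as simple as: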

import json
with open('test1.txt') as json_file:
  myjson = json.load(json_file)

Then access each element in the JSON via list index or dict key.
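
As an illustration of that kind of access, the sketch below parses a heavily trimmed, hypothetical tweet that reuses a few field names and values from the sample in the question. The trimmed record itself is an assumption made for brevity.

```python
import json

# Hypothetical, heavily trimmed tweet reusing a few fields from the sample data
raw = (
    '{"id": 624618886277640192,'
    ' "user": {"screen_name": "Rajiin_07"},'
    ' "entities": {"hashtags": [{"text": "javascript", "indices": [96, 107]}]}}'
)

tweet = json.loads(raw)

# JSON objects become dicts, so nested fields are reached with dict keys
screen_name = tweet["user"]["screen_name"]

# JSON arrays become lists, so elements are reached with list indices
first_hashtag = tweet["entities"]["hashtags"][0]["text"]

print(screen_name, first_hashtag)
```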

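This approach ignores whitespace, so newlines should not be a problem.

Ultimately, this suggests the solution is to wrap your tweets in a top-level list (an "array") and put commas between them (perhaps with a regex, as a string manipulation?), and then access them by list index rather than line by line.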
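
One possible way to do that wrapping is sketched below. It uses plain string joining instead of a regex and assumes every tweet sits on a single line. The helper name lines_to_array and the sample text are hypothetical.

```python
import json

def lines_to_array(text):
    """Wrap newline-separated JSON records in a single top-level array.

    Assumes each record occupies exactly one line; blank lines are dropped.
    """
    records = [line.strip() for line in text.splitlines() if line.strip()]
    return "[" + ",".join(records) + "]"

# Hypothetical sample: two records separated by blank lines
sample = '{"id": 1}\n\n\n{"id": 2}\n'

tweets = json.loads(lines_to_array(sample))

# Records are now reached by list index rather than by line
print(tweets[0]["id"], tweets[1]["id"])
```

For very large stream files this builds the whole document in memory, so parsing line by line may still be preferable there.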

A line may contain \r\n, spaces, and so on. You should use

json.loads()

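Which line is causing the problem?

Is an empty line valid JSON? Actually, what you posted looks valid. Is there no Python module smart enough to load this JSON the way a typical online validator does? This is not the first time a validator has returned success while the json module choked.

@JLPeyret, well, an empty line is not valid JSON, regardless of how smart the parser is. If the JSON cannot be parsed, it should not be silently ignored, although the OP was already ignoring such lines. I have added a notifier for that as well.

Great! I just tried "notParsed" and every line ended up in "notParsed"... and your solution rocks!

{"whitespaces": "are", "ignored": "in", "json": "data"}. But empty input is not valid JSON.

@张文辉, yes, I tried loads(), but it shows "ValueError: Expecting value: line 2 column 1 (char 1)"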

import json

tweets_data=[]
tweets_file=open('test1.txt',"r")
for line in tweets_file:
    try:
        tweet=json.loads(line.strip())
        tweets_data.append(tweet)
    except:
        continue

print(len(tweets_data))