Unable to load a JSON file in Python


I have a file of Twitter streaming data in JSON format, and I am trying to load it in Python:

import json

tweets_data=[]
tweets_file=open('test1.txt',"r")
for line in tweets_file:
    try:
        tweet=json.load(line)
        tweets_data.append(tweet)
    except:
        continue 

print(len(tweets_data))
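The result is always 0. If I remove the "try" and "except", the error is "ValueError: Expecting value: line 2 column 1 (char 1)". However, according to an online validator, every line of the file is valid JSON.

Here is part of test1.txt: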

{"created_at":"Fri Jul 24 16:35:22 +0000 2015","id":624618886277640192,"id_str":"624618886277640192","text":"RT @nodenow: Essential Steps: Long Term Support for Node.js\nhttp:\/\/t.co\/MzPfvenwtT\n+1 micshasan #javascript","source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":3290861609,"id_str":"3290861609","name":"Rajiin","screen_name":"Rajiin_07","location":"Pokhara city","url":"http:\/\/www.pokharacity.com","description":null,"protected":false,"verified":false,"followers_count":1101,"friends_count":1119,"listed_count":155,"favourites_count":2048,"statuses_count":5498,"created_at":"Wed May 20 04:58:23 +0000 2015","utc_offset":-25200,"time_zone":"Pacific Time (US & Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"4A913C","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/617620457336893440\/3HTEKnMx_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/617620457336893440\/3HTEKnMx_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/3290861609\/1435854327","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Fri Jul 24 16:33:04 +0000 
2015","id":624618308050915328,"id_str":"624618308050915328","text":"Essential Steps: Long Term Support for Node.js\nhttp:\/\/t.co\/MzPfvenwtT\n+1 micshasan #javascript","source":"\u003ca href=\"http:\/\/ifttt.com\" rel=\"nofollow\"\u003eIFTTT\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":3243544179,"id_str":"3243544179","name":"Javascript Digest","screen_name":"nodenow","location":"","url":null,"description":null,"protected":false,"verified":false,"followers_count":1238,"friends_count":1,"listed_count":1148,"favourites_count":2,"statuses_count":130923,"created_at":"Sat May 09 15:45:13 +0000 2015","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/597066594334941184\/Xe4tTtU8_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/597066594334941184\/Xe4tTtU8_normal.jpg","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":1,"favorite_count":0,"entities":{"hashtags":[{"text":"javascript","indices":[83,94]}],"trends":[],"urls":[{"url":"http:\/\/t.co\/MzPfvenwtT","expanded_url":"http:\/\/bit.ly\/1LH81ly","display_url":"bit.ly\/1LH81ly","indices":[47,69]}],"user_mentions":[],"symbols":[]},"favorited":false,"retwe
eted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"javascript","indices":[96,107]}],"trends":[],"urls":[{"url":"http:\/\/t.co\/MzPfvenwtT","expanded_url":"http:\/\/bit.ly\/1LH81ly","display_url":"bit.ly\/1LH81ly","indices":[60,82]}],"user_mentions":[{"screen_name":"nodenow","name":"Javascript Digest","id":3243544179,"id_str":"3243544179","indices":[3,11]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1437755722003"}


{"created_at":"Fri Jul 24 16:35:22 +0000 2015","id":624618888387432449,"id_str":"624618888387432449","text":"python \u041c\u043e\u0441\u043a\u0432\u0430  http:\/\/t.co\/itYJmgVvgD","source":"\u003ca href=\"http:\/\/gdepraktika.ru\" rel=\"nofollow\"\u003egdepraktika-trfnslator\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":623605809,"id_str":"623605809","name":"\u0413\u0434\u0435 \u043f\u0440\u0430\u043a\u0442\u0438\u043a\u0430?","screen_name":"gdepraktika","location":"\u0420\u043e\u0441\u0441\u0438\u044f","url":"http:\/\/gdepraktika.ru","description":"\u041f\u0440\u0430\u043a\u0442\u0438\u043a\u0430, \u0441\u0442\u0430\u0436\u0438\u0440\u043e\u0432\u043a\u0430, \u0440\u0430\u0431\u043e\u0442\u0430 \u0434\u043b\u044f \u0441\u0442\u0443\u0434\u0435\u043d\u0442\u043e\u0432, \u043e\u0431\u0443\u0447\u0435\u043d\u0438\u0435 \u0432 \u043a\u043e\u043c\u043f\u0430\u043d\u0438\u044f\u0445","protected":false,"verified":false,"followers_count":17,"friends_count":9,"listed_count":0,"favourites_count":0,"statuses_count":902069,"created_at":"Sun Jul 01 07:53:36 +0000 
2012","utc_offset":10800,"time_zone":"Moscow","geo_enabled":false,"lang":"ru","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/378800000420815111\/bba61a6dcd4272794a4af41dd8a44cf5_normal.png","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/378800000420815111\/bba61a6dcd4272794a4af41dd8a44cf5_normal.png","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[{"url":"http:\/\/t.co\/itYJmgVvgD","expanded_url":"http:\/\/bit.ly\/1GqpqOg","display_url":"bit.ly\/1GqpqOg","indices":[15,37]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"und","timestamp_ms":"1437755722506"}

That's because there are two empty lines between the valid JSON lines. Just add a check for blank lines and you're good to go.

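import json

tweets_data = []
notParsed = []
with open('test1.txt', "r") as tweets_file:
    for line in tweets_file:
        if line.strip():  # skip the blank lines between records
            try:
                # json.loads parses a string; json.load expects a file object
                tweet = json.loads(line)
                tweets_data.append(tweet)
            except ValueError:
                # keep lines that fail to parse instead of silently dropping them
                notParsed.append(line)
print(len(tweets_data))
print('Could not parse: ', len(notParsed))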
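If it helps to see which lines failed and why, the same idea can be sketched a little more verbosely. The helper name `parse_json_lines` and the in-memory sample below are made up for illustration and are not part of the original file; the sketch assumes one JSON document per non-blank line.

```python
import io
import json


def parse_json_lines(lines):
    """Parse an iterable of text lines, one JSON document per line.

    Returns (parsed, failures); failures holds (line_number, message) pairs.
    """
    parsed = []
    failures = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            # Blank separator lines are not JSON documents; skip them.
            continue
        try:
            parsed.append(json.loads(text))
        except ValueError as error:
            # json.JSONDecodeError is a subclass of ValueError.
            failures.append((number, str(error)))
    return parsed, failures


# Hypothetical sample: two records separated by blank lines, plus one bad line.
sample = io.StringIO('{"id": 1}\n\n\n{"id": 2}\n{"id": \n')
tweets, failures = parse_json_lines(sample)
print(len(tweets), "parsed;", len(failures), "failed")
for number, message in failures:
    print("line", number, "->", message)
```

For the real data you would pass the open file object instead of the `io.StringIO` sample.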
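This isn't strictly needed since you already have your answer, and I'm only brushing up on my Python, but you could also write the code like this:

# list() is needed on Python 3, where map returns a lazy iterator
list(map(json.loads, [x for x in open('test1.txt').read().split('\n') if x.strip()]))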
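Two things. First, your snippet doesn't look like valid JSON at all (pasting it into a validator confirms this). Syntactically, there needs to be a comma between the two tweets, because they have to be elements of some higher-level data structure: either a JSON array (the equivalent of a Python list) or a JSON object (the equivalent of a dict). A JSON file can have only one root element. See the earlier question on this topic.

Second, if you just want ordinary access to the JSON data structure, and aren't trying to do anything special that depends on the notion of lines (or worrying about memory management and so on), then you don't need to read it line by line like this. Instead, you can bind the whole shebang to a single variable, and the JSON is converted into nested lists and dicts following the obvious syntax (JSON curly braces and square brackets work the same way as their Python counterparts).

So, once the JSON is valid, the code can be as simple as: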

import json
with open('test1.txt') as json_file:
  myjson = json.load(json_file)
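You then access each element of the JSON through list indices and dict keys.

This method ignores whitespace, so newlines shouldn't be a problem.


Ultimately, this suggests the solution is to wrap your tweets in a top-level list (an "array") and put commas between them (perhaps with a regular expression, as a string operation?), then access them by list index rather than line by line.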
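The wrapping step described above could be sketched roughly as follows. This is only an illustration: the helper name `wrap_as_json_array` and the sample string are invented here, and the sketch assumes each non-blank line holds exactly one complete JSON document.

```python
import json


def wrap_as_json_array(text):
    """Turn newline-separated JSON documents into one JSON array string."""
    # Assumes each non-blank line holds exactly one complete JSON document.
    records = [line.strip() for line in text.split("\n") if line.strip()]
    return "[" + ",".join(records) + "]"


# Hypothetical stand-in for the contents of test1.txt.
raw = '{"id": 1, "text": "first"}\n\n\n{"id": 2, "text": "second"}\n'
tweets = json.loads(wrap_as_json_array(raw))
print(tweets[1]["text"])
```

Joining the stripped lines with plain string operations avoids a regular expression, which could otherwise match text inside a tweet body.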

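A line may contain \r\n, spaces, and so on. You should use json.loads():

import json

tweets_data = []
with open('test1.txt', "r") as tweets_file:
    for line in tweets_file:
        try:
            # strip() removes the trailing \r\n; blank lines raise ValueError and are skipped
            tweet = json.loads(line.strip())
            tweets_data.append(tweet)
        except ValueError:
            continue

print(len(tweets_data))

Which line is causing the problem? Is an empty line valid JSON?
Actually, what you posted looks valid. Isn't there any Python module smart enough to load this JSON the way a typical online validator does? This isn't the first time a validator has reported success only for the json module to choke.
@JLPeyret, well, an empty line isn't valid JSON, regardless of how smart the parser is.
If the JSON can't be parsed, it shouldn't be ignored, although the OP was already ignoring those lines. I've added a notifier for that as well.
Great! I just tried "notParsed", and every line ended up as "notParsed"... and your solution rocks!
{"whitespaces":"are","ignored":"in","json":"data"}. But empty input is not valid JSON.
@张文辉, yes, I tried loads(), but it shows "ValueError: Expecting value: line 2 column 1 (char 1)".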
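For reference, the difference between `json.load` and `json.loads` that runs through this thread can be checked with a small sketch. The sample line is made up; the point is only that `json.load` wants a file-like object while `json.loads` wants a string, and that a bare `except:` would hide the resulting `AttributeError`.

```python
import io
import json

line = '{"lang": "en"}\n'

# json.loads parses a string; surrounding whitespace such as "\n" is allowed.
from_string = json.loads(line)

# json.load expects a file-like object with a .read() method.
from_file = json.load(io.StringIO(line))

# Passing a str to json.load raises AttributeError, which a bare
# "except:" would silently swallow.
try:
    json.load(line)
    load_error = None
except AttributeError as error:
    load_error = error

# An empty string is not a JSON document, so json.loads rejects it.
try:
    json.loads("")
    empty_error = None
except ValueError as error:
    empty_error = error

print(from_string == from_file)
print(type(load_error).__name__, type(empty_error).__name__)
```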