Extracting text from a nested JSON file in Python where each JSON object has a variable number of entries


python json


I have a JSON file that contains multiple nested JSON objects, like this:

{
"coordinates": null,
"acoustic_features": {
    "instrumentalness": "0.00479",
    "liveness": "0.18",
    "speechiness": "0.0294",
    "danceability": "0.634",
    "valence": "0.342",
    "loudness": "-8.345",
    "tempo": "125.044",
    "acousticness": "0.00035",
    "energy": "0.697",
    "mode": "1",
    "key": "6"
},
"artist_id": "b2980c722a1ace7a30303718ce5491d8",
"place": null,
"geo": null,
"tweet_lang": "en",
"source": "Share.Radionomy.com",
"track_title": "8eeZ",
"track_id": "cd52b3e5b51da29e5893dba82a418a4b",
"artist_name": "Dominion",
"entities": {
    "hashtags": [{
        "text": "nowplaying",
        "indices": [0, 11]
    }, {
        "text": "goth",
        "indices": [51, 56]
    }, {
        "text": "deathrock",
        "indices": [57, 67]
    }, {
        "text": "postpunk",
        "indices": [68, 77]
    }],
    "symbols": [],
    "user_mentions": [],
    "urls": [{
        "indices": [28, 50],
        "expanded_url": "cathedral13.com/blog13",
        "display_url": "cathedral13.com/blog13",
        "url": "t.co/Tatf4hEVkv"
    }]
},
"created_at": "2014-01-01 05:54:21",
"text": "#nowplaying Dominion - 8eeZ Tatf4hEVkv #goth #deathrock #postpunk",
"user": {
    "location": "middle of nowhere",
    "lang": "en",
    "time_zone": "Central Time (US & Canada)",
    "name": "Cathedral 13",
    "entities": null,
    "id": 81496937,
    "description": "I\u2019m a music junkie who is currently responsible for Cathedral 13 internet radio (goth, deathrock, post-punk)which has been online since 06/20/02."
},
"id": 418243774842929150
}

Each object contains a variable number of hashtags. I want to get a CSV file containing the hashtag text. To do that, I wrote the following code:

import csv
import json
with open('jsonpart.json') as data_file:
    data = json.load(data_file)
    #print (data)
    header = ['hashtags']

# open a file for writing
data_csv = open('hashtags.csv', 'wb')
# create the csv writer object
csvwriter = csv.writer(data_csv)

# write the csv header
csvwriter.writerow(header)

for entry in data:
    csvwriter.writerow([entry['entities']['hashtags']])

data_csv.close()

I get the following output file:

"[{u'indices': [0, 11], u'text': u'nowplaying'}, {u'indices': [51, 56], 
 u'text': u'goth'}, {u'indices': [57, 67], u'text': u'deathrock'}, 
{u'indices': [68, 77], u'text': u'postpunk'}]"
"[{u'indices': [23, 34], u'text': u'NowPlaying'}, {u'indices': [75, 79], 
u'text': u'80s'}, {u'indices': [80, 86], u'text': u'Retro'}, {u'indices': 
[87, 91], u'text': u'Fun'}]"
"[{u'indices': [0, 11], u'text': u'nowplaying'}]"
"[{u'indices': [54, 65], u'text': u'nowplaying'}, {u'indices': [66, 77], 
u'text': u'listenlive'}]"

I'm stuck here. How can I get the target file to look like this:

nowplaying
goth
deathrock
postpunk
NowPlaying  
80's
Retro
Fun
nowplaying
nowplaying
listenlive

You can use a simple list comprehension. Assuming you have a JSON object named json_chunk, you can build a list like this:

text_list = [hashtag['text'] for hashtag in json_chunk['entities']['hashtags']]
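As a minimal sketch of that comprehension, the snippet below applies it to a trimmed-down copy of the object from the question, then flattens the hashtags of several objects with a nested comprehension. The `all_chunks` list and the use of `.get()` to tolerate a missing or null `entities` value are assumptions added for illustration; they are not part of the original answer.

```python
# Trimmed-down copy of the object shown in the question.
json_chunk = {
    "entities": {
        "hashtags": [
            {"text": "nowplaying", "indices": [0, 11]},
            {"text": "goth", "indices": [51, 56]},
        ]
    }
}

# One object: collect the text of each hashtag.
text_list = [hashtag['text'] for hashtag in json_chunk['entities']['hashtags']]

# Several objects (hypothetical list): flatten everything into one list.
# `or {}` guards against "entities": null, which is an assumed edge case.
all_chunks = [json_chunk, {"entities": {"hashtags": []}}, {"entities": None}]
all_texts = [
    hashtag['text']
    for chunk in all_chunks
    for hashtag in (chunk.get('entities') or {}).get('hashtags', [])
]
```

Objects with no hashtags simply contribute nothing to `all_texts`, which is how the variable number of entries per object is absorbed.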

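Now you have a list. Iterate over it. Some elements apparently end with a newline character and others don't, so strip every element, append a newline to each, and then write each element to a file like this: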

with open(r'C:\outputfile.csv', 'a', encoding='utf-8') as fd:
    for line in text_list:
        fd.write(line.strip()+'\n')
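Putting the two steps together, a possible end-to-end version might look like the sketch below. It assumes Python 3 and that the input file holds a JSON array of tweet objects (the question's own loop, `for entry in data`, suggests this). The function name `write_hashtags` and its parameters are hypothetical.

```python
import json


def write_hashtags(json_path, out_path):
    """Write every hashtag text found in json_path to out_path, one per line.

    Returns the number of hashtags written.
    """
    with open(json_path, encoding='utf-8') as data_file:
        data = json.load(data_file)  # assumed: a list of tweet objects

    count = 0
    with open(out_path, 'w', encoding='utf-8') as fd:
        for entry in data:
            # Tolerate a missing or null "entities" value (assumed edge case).
            entities = entry.get('entities') or {}
            for hashtag in entities.get('hashtags', []):
                fd.write(hashtag['text'].strip() + '\n')
                count += 1
    return count
```

Calling `write_hashtags('jsonpart.json', 'hashtags.csv')` with the file names from the question would then produce a single-column file with one hashtag per line.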

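I don't see NowPlaying, 80's, Retro, Fun, nowplaying, nowplaying, listenlive in your JSON.

@RomanPerekhrest These are the values of "entities" "hashtags" "text" in the next JSON objects. @Aaron Digulla, do you know how to solve this?

Thanks, @jlaur. The resulting file has this format: nowplaying, goth, deathrock, postpunk\n nowplaying, 80s, Retro, Fun\n nowplaying\n nowplaying, listenlive ... How can I get it in the format mentioned in the question?

I updated the code. But you should try to work it out yourself before asking how to do it. You have a list. How do you iterate over a list and write it to a file? Ask Google. There are plenty of similar tutorials on the internet.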
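The comma-separated rows described in the comments are what `csv.writer.writerow` produces when it is handed a whole list of hashtags at once. The sketch below, which assumes Python 3 and uses in-memory buffers and a made-up `rows` list instead of real files, contrasts that with writing each hashtag as its own one-element row.

```python
import csv
import io

# Hypothetical sample: hashtag texts grouped per JSON object.
rows = [['nowplaying', 'goth'], ['nowplaying']]

# writerow(list) puts the whole list on one comma-separated line.
per_object = io.StringIO()
writer = csv.writer(per_object, lineterminator='\n')
for texts in rows:
    writer.writerow(texts)

# Wrapping each text in its own one-element row gives one hashtag per line.
per_hashtag = io.StringIO()
writer = csv.writer(per_hashtag, lineterminator='\n')
for texts in rows:
    writer.writerows([text] for text in texts)
```

Here `per_object.getvalue()` holds one line per object, while `per_hashtag.getvalue()` holds one line per hashtag, which is the layout the question asks for.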