Converting complex JSON to CSV using Python and a DataFrame (Python, Pandas, Databricks)

I know this question has been asked many times, but I still cannot convert my JSON to CSV.

My JSON file looks like this:

{
    "itemCostPrices": {
        "Id": 1,
        "costPrices": [{
            "costPrice": 83.56,
            "currencyCode": "GBP",
            "startDateValid": "2010-09-06",
            "endDateValid": "2011-05-01",
            "postCalculatedCostPriceFlag": false,
            "promoCostPriceFlag": true
        }]
    },
    "eventId": null,
    "eventDateTime": null
}

Try the following code:

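import json
import pandas as pd

def flatten_dict(d, acc=None):
    """Recursively copy every scalar leaf of d into one flat dict."""
    if acc is None:  # avoid a shared mutable default argument
        acc = {}
    for k, v in d.items():
        if isinstance(v, dict):
            flatten_dict(v, acc)
        elif isinstance(v, list):
            # Assumes list elements are dicts; later elements overwrite earlier ones.
            for item in v:
                flatten_dict(item, acc)
        else:
            acc[k] = v
    return acc


# tmp.json must contain a JSON array of objects: [{...}, {...}]
with open('tmp.json') as f:
    data = json.load(f)

# One flat dict per top-level object becomes one DataFrame row.
df = pd.DataFrame([flatten_dict(d) for d in data])
df.to_csv('tmp.csv', index=False)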
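One caveat with flatten_dict: every nested key is written into the same flat dict, so a record whose costPrices list has more than one entry keeps only the last entry. The sketch below is one possible variant that returns one row per list element instead. The name flatten_rows and the sample record are made up for illustration, and the sketch assumes that list elements are dicts; a record with an empty list produces no rows.

```python
def flatten_rows(d, parent=None):
    """Return a list of flat dicts, one per element of any nested list."""
    rows = [dict(parent or {})]  # start from a copy of the fields seen so far
    for k, v in d.items():
        if isinstance(v, dict):
            # Merge the nested dict's fields into every row.
            rows = [r2 for r in rows for r2 in flatten_rows(v, r)]
        elif isinstance(v, list):
            # Fan out: each list element (assumed to be a dict) gets its own row.
            rows = [r2 for r in rows for item in v for r2 in flatten_rows(item, r)]
        else:
            for r in rows:
                r[k] = v
    return rows


# Hypothetical record with two cost prices for the same item.
record = {
    "itemCostPrices": {
        "Id": 1,
        "costPrices": [
            {"costPrice": 83.56, "currencyCode": "GBP"},
            {"costPrice": 90.00, "currencyCode": "EUR"},
        ],
    },
    "eventId": None,
}

rows = flatten_rows(record)
```

The resulting list can be passed to pd.DataFrame in the same way as the output of flatten_dict.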

Explanation of the code:
1) Read the JSON file and load it into a list of dicts. You get:

[{'eventDateTime': None,
  'eventId': None,
  'itemCostPrices': {'Id': 1,
                     'costPrices': [{'costPrice': 83.56,
                                     'currencyCode': 'GBP',
                                     'endDateValid': '2011-05-01',
                                     'postCalculatedCostPriceFlag': False,
                                     'promoCostPriceFlag': True,
                                     'startDateValid': '2010-09-06'}]}},
 {'eventDateTime': None,
  'eventId': None,
  'itemCostPrices': {'Id': 2,
                     'costPrices': [{'costPrice': 99.56,
                                     'currencyCode': 'EUR',
                                     'endDateValid': '2017-05-01',
                                     'postCalculatedCostPriceFlag': False,
                                     'promoCostPriceFlag': True,
                                     'startDateValid': '2018-09-06'}]}}]

2) Flatten the dicts. You get the following list of flattened dicts:

[{'Id': 1,
  'costPrice': 83.56,
  'currencyCode': 'GBP',
  'startDateValid': '2010-09-06',
  'endDateValid': '2011-05-01',
  'postCalculatedCostPriceFlag': False,
  'promoCostPriceFlag': True,
  'eventId': None,
  'eventDateTime': None},
 {'Id': 2,
  'costPrice': 99.56,
  'currencyCode': 'EUR',
  'startDateValid': '2018-09-06',
  'endDateValid': '2017-05-01',
  'postCalculatedCostPriceFlag': False,
  'promoCostPriceFlag': True,
  'eventId': None,
  'eventDateTime': None}]

3) Load the list of dicts into a DataFrame. You get:

   Id  costPrice currencyCode endDateValid eventDateTime eventId  postCalculatedCostPriceFlag  promoCostPriceFlag startDateValid
0   1      83.56          GBP   2011-05-01          None    None                        False                True     2010-09-06
1   2      99.56          EUR   2017-05-01          None    None                        False                True     2018-09-06

4) Save the DataFrame as CSV.
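As an alternative to a hand-written flattening function, steps 2 and 3 can probably be covered by pandas.json_normalize (available as pd.json_normalize since pandas 1.0). The following is a minimal sketch, not part of the original answer: it assumes the records have exactly the shape shown in step 1, and the in-memory records list stands in for the result of json.load.

```python
import pandas as pd

# Stand-in for the list returned by json.load(); same shape as in step 1.
records = [
    {"itemCostPrices": {"Id": 1,
                        "costPrices": [{"costPrice": 83.56,
                                        "currencyCode": "GBP",
                                        "startDateValid": "2010-09-06",
                                        "endDateValid": "2011-05-01",
                                        "postCalculatedCostPriceFlag": False,
                                        "promoCostPriceFlag": True}]},
     "eventId": None,
     "eventDateTime": None},
    {"itemCostPrices": {"Id": 2,
                        "costPrices": [{"costPrice": 99.56,
                                        "currencyCode": "EUR",
                                        "startDateValid": "2018-09-06",
                                        "endDateValid": "2017-05-01",
                                        "postCalculatedCostPriceFlag": False,
                                        "promoCostPriceFlag": True}]},
     "eventId": None,
     "eventDateTime": None},
]

# record_path points at the nested list: one output row per costPrices entry.
# meta lists the parent fields to repeat on every row.
df = pd.json_normalize(
    records,
    record_path=["itemCostPrices", "costPrices"],
    meta=[["itemCostPrices", "Id"], "eventId", "eventDateTime"],
)

# Nested meta fields are named with a dotted path; shorten it to match the answer.
df = df.rename(columns={"itemCostPrices.Id": "Id"})
```

Calling df.to_csv('tmp.csv', index=False) afterwards writes the same kind of file as step 4. Unlike flatten_dict, this keeps every entry of costPrices as its own row.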

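What have you tried? What output did you get, and how does it differ from the expected output? What do pandas and Databricks have to do with this?

I tried the code above in Azure Databricks, but it fails with the error "'str' object has no attribute 'items'", because my data format differs from the example.

How do you read the data? With json.load()? If you print type(data) and len(data), what do you get?

I am fairly sure your JSON is not in the expected format. The correct format, as in the example, is [{obj1}, {obj2}, ...]. In your file the square brackets are probably missing, so it contains just {obj1}, {obj2}, ... In that case you only need to add square brackets at the beginning and end of the file.

After adding the square brackets it works fine.

I'm glad I could help! Please upvote and mark my answer as the accepted answer. Thanks :)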
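The error discussed in these comments arises when the loaded data is not a list of dicts: iterating over a single dict yields its keys, which are strings, so flatten_dict ends up calling .items() on a str. A small loader can normalize the input before flattening. This is only a sketch; load_records is a hypothetical helper, and it assumes the file holds either a JSON array, a single object, or comma-separated objects without the enclosing brackets.

```python
import json


def load_records(text):
    """Parse JSON text and always return a list of records."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumed case: "{obj1}, {obj2}" without brackets, so wrap and retry.
        # Text that is invalid for any other reason still raises here.
        data = json.loads("[" + text + "]")
    if isinstance(data, dict):
        # A single top-level object becomes a one-element list.
        data = [data]
    return data


single = load_records('{"eventId": null}')
several = load_records('{"eventId": 1}, {"eventId": 2}')
```

With a file, the call would be load_records(f.read()) in place of json.load(f).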
