Parsing a JSON Lines file in Python

python json csv

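I need to find a way to parse the data in a JSON file into CSV or XLSX. However, every online JSON validator I have tried gives me an error saying the JSON file is invalid.

A sample of the JSON file looks like this: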

{"id": "someID1.docx",
 "language": {"detected": "cs"},
 "title": "Name - Title - FileName",
 "text": "Long string of text",
 "entities": [
 {"standardForm": "Svářečský průkaz", "type": "car"},
 {"standardForm": "email1@gmail.com", "type": "email"},
 {"standardForm": "english", "type": "languages"},
 {"standardForm": "Práce na PC", "type": "abilities"},
 {"standardForm": "MS Office", "type": "abilities"},
 {"standardForm": "Automechanik", "type": "education"},
 {"standardForm": "Střední průmyslová škola", "type": "education"},
 {"standardForm": "Angličtina-Němčina", "type": "languages"},
 {"standardForm": "mechanic", "type": "position"},
 {"standardForm": "Praha", "type": "region"},
 {"standardForm": "B2 - středně pokročilý", "type": "en_level"},
 {"standardForm": "Skupina B", "type": "drivinglicense"}
 ]}
{"id": "someID2.pdf",
 "language": {"detected": "cs"},
 "title": "Name - Title - FileName2",
 "text": "Long string of text2",
 "entities": [
 {"standardForm": "german", "type": "languages"},
 {"standardForm": "high school", "type": "education"},
 {"standardForm": "Angličtina-Němčina", "type": "languages"},
 {"standardForm": "driver", "type": "position"},
 {"standardForm": "english", "type": "languages"},
 {"standardForm": "university", "type": "education"},
 {"standardForm": "email2@seznam.cz", "type": "email"},
 {"standardForm": "Středočeský", "type": "region"},
 {"standardForm": "Střední", "type": "edulevel"},
 {"standardForm": "manager", "type": "lastposition"},
 {"standardForm": "? – nerozpoznáno", "type": "de_level"},
 {"standardForm": "? – nerozpoznáno", "type": "en_level"},
 {"standardForm": "Skupina C", "type": "drivinglicense"}
 ]}
 ...

I am able to load this JSON in Python as follows:

import json
import pandas as pd
# Parse one JSON object per line of the file
jsonfile = [json.loads(line) for line in open('jsonfile.json', 'r', encoding='utf-8')]

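But I cannot find any way to convert it to CSV. I need to be able to store all the entities associated with each ID, preferably as CSV. Is there a way to do this? Does the JSON need to be structured differently?

Thanks

Edit: for the sample above, I need the CSV output to look like this: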

ID;title;languages;education
someID1.docx;Name-Title-FileName;english,Angličtina-Němčina;Automechanik;Střední Prům. škola
someID2.pdf;Name-Title-FileName2; german,Angličtina-Němčina,english;high school, university
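
On the validator errors mentioned above: a file that holds several JSON objects one after another is not a single JSON document, which is probably why the validators reject it. Reading line by line works only when each object sits on exactly one line. If the objects span several lines, as in the sample as displayed here, one possible approach is to decode them one at a time with `json.JSONDecoder.raw_decode`. The following is a minimal sketch under that assumption; the helper name `iter_json_objects` and the sample string are made up for illustration.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample: two objects, the first one spread over two lines
sample = '{"id": "a",\n "n": 1}\n{"id": "b", "n": 2}\n'
records = list(iter_json_objects(sample))
print([r["id"] for r in records])
```

For a real file, the text could be read with `open('jsonfile.json', encoding='utf-8').read()` and passed to the helper.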

With Miller, you can simply run

mlr --j2c unsparsify then cut -x -r -f "entit" input.json >output.csv

and you get this CSV:

id,language:detected,title,text
someID1.docx,cs,Name - Title - FileName,Long string of text
someID2.pdf,cs,Name - Title - FileName2,Long string of text2

Some notes on the options:

`--j2c` converts JSON to CSV
`unsparsify` prints records with the union of field names over all input records
`cut -x -r -f` removes the `entities` object from the JSON
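
For readers without Miller, roughly the same result can be sketched in Python with the standard library. This is an approximation, not a reimplementation of Miller: it drops top-level keys matching a regex, flattens nested objects with `:` as the key separator (matching the header shown above), and fills missing fields with empty strings. The function names `flatten` and `records_to_csv` are hypothetical.

```python
import csv
import io
import re


def flatten(record, sep=":"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def records_to_csv(records, drop_pattern="entit"):
    """Return CSV text for `records`, dropping keys that match `drop_pattern`."""
    rows = []
    for record in records:
        kept = {k: v for k, v in record.items() if not re.search(drop_pattern, k)}
        rows.append(flatten(kept))
    # Union of field names over all rows, in first-seen order
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Shortened stand-ins for the two records in the question
records = [
    {"id": "someID1.docx", "language": {"detected": "cs"},
     "title": "Name - Title - FileName", "text": "Long string of text",
     "entities": [{"standardForm": "english", "type": "languages"}]},
    {"id": "someID2.pdf", "language": {"detected": "cs"},
     "title": "Name - Title - FileName2", "text": "Long string of text2",
     "entities": []},
]
print(records_to_csv(records))
```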

Using pandas.DataFrame:

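# One row per JSON record; jsonfile is the list of dicts loaded in the question
df = pd.DataFrame(jsonfile)
# Collect the standardForm of every entity whose type is 'languages'
df['languages'] = df.apply(lambda x: [item['standardForm']
                                      for item in x.entities
                                      if item['type'] == 'languages'],
                           axis=1)
# Do the same for entities whose type is 'education'
df['education'] = df.apply(lambda x: [item['standardForm']
                                      for item in x.entities
                                      if item['type'] == 'education'],
                           axis=1)

# 'output.csv' is a placeholder; substitute your own filename
df.to_csv('output.csv', columns=['id', 'title', 'languages', 'education'])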
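
The question asks for semicolon-separated output in which the values of each entity type are joined by commas. A minimal sketch of that layout follows. It uses the standard-library `csv` module instead of pandas so that it stays self-contained, and the helper names `entities_by_type` and `to_semicolon_csv` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def entities_by_type(record):
    """Group the standardForm values of a record's entities by their type."""
    grouped = defaultdict(list)
    for entity in record.get("entities", []):
        grouped[entity["type"]].append(entity["standardForm"])
    return grouped


def to_semicolon_csv(records, types=("languages", "education")):
    """Return ';'-delimited CSV text with one comma-joined column per type."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["ID", "title", *types])
    for record in records:
        grouped = entities_by_type(record)
        # A type with no entities yields an empty cell
        cells = [",".join(grouped[t]) for t in types]
        writer.writerow([record["id"], record["title"], *cells])
    return out.getvalue()


# Shortened stand-in for the first record in the question
records = [
    {"id": "someID1.docx", "title": "Name - Title - FileName",
     "entities": [
         {"standardForm": "english", "type": "languages"},
         {"standardForm": "Automechanik", "type": "education"},
         {"standardForm": "Angličtina-Němčina", "type": "languages"},
     ]},
]
print(to_semicolon_csv(records))
```

Other entity types (for example `region` or `position`) could be added through the `types` argument.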
