How to fix a Python function that iterates over a list of JSON files in a directory and merges them into a single JSON file


I have a device that continuously generates JSON files (a.json, b.json, c.json, and so on) and stores them in a folder, as shown below:

"Data/d/a.json"
"Data/d/b.json"
"Data/d/c.json"
...
"Data/d/g.json"

Sample data in each JSON file:

a.json

{"artist":null,"auth":"Logged In","firstName":"Walter","gender":"M","itemInSession":0,"lastName":"Frye","length":null,"level":"free","location":"San Francisco-Oakland-Hayward, CA","method":"GET","page":"Home","registration":1540919166796.0,"sessionId":38,"song":null,"status":200,"ts":1541105830796,"userAgent":"\"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/36.0.1985.143 Safari\/537.36\"","userId":"39"}
{"artist":null,"auth":"Logged In","firstName":"Kaylee","gender":"F","itemInSession":0,"lastName":"Summers","length":null,"level":"free","location":"Phoenix-Mesa-Scottsdale, AZ","method":"GET","page":"Home","registration":1540344794796.0,"sessionId":139,"song":null,"status":200,"ts":1541106106796,"userAgent":"\"Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/35.0.1916.153 Safari\/537.36\"","userId":"8"}

b.json

{"artist":"Des'ree","auth":"Logged In","firstName":"Kaylee","gender":"F","itemInSession":1,"lastName":"Summers","length":246.30812,"level":"free","location":"Phoenix-Mesa-Scottsdale, AZ","method":"PUT","page":"NextSong","registration":1540344794796.0,"sessionId":139,"song":"You Gotta Be","status":200,"ts":1541106106796,"userAgent":"\"Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/35.0.1916.153 Safari\/537.36\"","userId":"8"}
{"artist":null,"auth":"Logged In","firstName":"Kaylee","gender":"F","itemInSession":2,"lastName":"Summers","length":null,"level":"free","location":"Phoenix-Mesa-Scottsdale, AZ","method":"GET","page":"Upgrade","registration":1540344794796.0,"sessionId":139,"song":null,"status":200,"ts":1541106132796,"userAgent":"\"Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/35.0.1916.153 Safari\/537.36\"","userId":"8"}

c.json

{"artist":"Mr Oizo","auth":"Logged In","firstName":"Kaylee","gender":"F","itemInSession":3,"lastName":"Summers","length":144.03873,"level":"free","location":"Phoenix-Mesa-Scottsdale, AZ","method":"PUT","page":"NextSong","registration":1540344794796.0,"sessionId":139,"song":"Flat 55","status":200,"ts":1541106352796,"userAgent":"\"Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/35.0.1916.153 Safari\/537.36\"","userId":"8"}
{"artist":"Tamba Trio","auth":"Logged In","firstName":"Kaylee","gender":"F","itemInSession":4,"lastName":"Summers","length":177.18812,"level":"free","location":"Phoenix-Mesa-Scottsdale, AZ","method":"PUT","page":"NextSong","registration":1540344794796.0,"sessionId":139,"song":"Quem Quiser Encontrar O Amor","status":200,"ts":1541106496796,"userAgent":"\"Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/35.0.1916.153 Safari\/537.36\"","userId":"8"}

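The number of files can grow to as many as 1000 JSON files per day, and up to 1000 files per week. To process the data in these JSON files further, I have to bulk-insert the data from each file into PostgreSQL, as you can see in the snippet below. The current process is too manual and inefficient, because I insert each file one by one.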

import json
import psycopg2

connection = psycopg2.connect("host=localhost dbname=devicedb user=#### password=####")
cursor = connection.cursor()
connection.set_session(autocommit=True)
cursor.execute("create table if not exists events_table(artist text, auth text, firstName text, gender varchar, itemInSession int, lastName text, length text, level text, location text, method varchar, page text, registration text, sessionId int, song text, status int, ts bigint, userAgent text, userId int );")

data = []
with open('Data/d/a.json') as f:
    for line in f:
        data.append(json.loads(line))

columns = [
    'artist',
    'auth',
    'firstName',
    'gender',
    'itemInSession',
    'lastName',
    'length',
    'level',
    'location',
    'method',
    'page',
    'registration',
    'sessionId',
    'song',
    'status',
    'ts',
    'userAgent',
    'userId'
]

for item in data:
    my_data = [item[column] for column in columns]
    for i, v in enumerate(my_data):
        if isinstance(v, dict):
            my_data[i] = json.dumps(v)

    insert_query = "INSERT INTO events_table VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
    cursor.execute(insert_query, tuple(my_data))
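
The per-file steps above could be generalized to every file in the directory. The following is only a minimal sketch under stated assumptions: the helper names `load_rows` and `insert_rows` are hypothetical, each file is assumed to hold one JSON object per line as in the samples, and `cursor` is assumed to be a psycopg2 cursor like the one created above.

```python
import glob
import json

# Same column order as in the snippet above.
COLUMNS = [
    'artist', 'auth', 'firstName', 'gender', 'itemInSession', 'lastName',
    'length', 'level', 'location', 'method', 'page', 'registration',
    'sessionId', 'song', 'status', 'ts', 'userAgent', 'userId',
]


def load_rows(pattern):
    """Read every file matching `pattern` and return a list of row tuples."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                item = json.loads(line)
                # Nested dicts are serialized to text, as in the original loop.
                rows.append(tuple(
                    json.dumps(item[c]) if isinstance(item[c], dict) else item[c]
                    for c in COLUMNS
                ))
    return rows


def insert_rows(cursor, rows):
    """Insert all rows with a single executemany call."""
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    query = f"INSERT INTO events_table VALUES ({placeholders})"
    cursor.executemany(query, rows)
```

With the connection from the snippet above, this would be called as `insert_rows(cursor, load_rows('Data/d/*.json'))`. If speed matters, `psycopg2.extras.execute_values` may be worth trying in place of `executemany`.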

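To improve the current process, I searched online and found the function below, which merges multiple files into a single file. My understanding was that I could define the output and input filenames by pointing to merged.json as the merged file and to the directory containing the input JSON files, then run the function, but it seems I was wrong. Could someone please tell me what I am doing wrong?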

def cat_json(output_filename, input_filenames):
    with file(output_filename, "w") as outfile:
        first = True
        for infile_name in input_filenames:
            with file(infile_name) as infile:
                if first:
                    outfile.write('[')
                    first = False
                else:
                    outfile.write(',')
                outfile.write(mangle(infile.read()))
        outfile.write(']')

output_filename = 'data/d/merged.json'
input_filenames = 'data/d/*.json'
cat_json(output_filename, input_filenames)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-19-3ff012d91d76> in <module>()
      1 output_filename = 'data/d/merged.json'
      2 input_filenames = 'data/d/*.json'
----> 3 cat_json(output_filename, input_filenames)

<ipython-input-18-760b670f79b1> in cat_json(output_filename, input_filenames)
      1 def cat_json(output_filename, input_filenames):
----> 2     with file(output_filename, "w") as outfile:
      3         first = True
      4         for infile_name in input_filenames:
      5             with file(infile_name) as infile:

TypeError: 'str' object is not callable

The code creates the merged.json file, but it is empty, and the following error is raised:

-------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-16-40d7387f704a> in <module>()
      1 output_filename = 'merged.json'
      2 input_filenames = 'data/d/*.json'
----> 3 cat_json(output_filename, input_filenames)

<ipython-input-15-951cbaba7765> in cat_json(output_filename, input_filenames)
      3         first = True
      4         for infile_name in input_filenames:
----> 5             with open(infile_name) as infile:
      6                 if first:
      7                     outfile.write('[')

FileNotFoundError: [Errno 2] No such file or directory: 'd'


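I don't understand why it raises the error above and says there is no such file or directory. a.json, b.json, c.json... are in the "data/d/" directory. Or do I need to list each filename instead of using *.json?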

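I don't really understand what you mean by merging JSON, but I know why you are getting that error.

Instead of

with file(output_filename, "w") as outfile:

do this:

with open(output_filename, "w") as outfile:

file is not a function; open is what you use to open files. Python 3 has no built-in named file, and the "'str' object is not callable" message suggests that the name file refers to a string in your session.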
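
The second error in the question (FileNotFoundError: ... 'd') appears to have a different cause: input_filenames is a single string, so the for loop walks over its characters and the first thing passed to open is 'd'. A small sketch of the behavior, and of expanding the wildcard with the standard glob module, using the pattern from the question:

```python
import glob

pattern = 'data/d/*.json'

# A str is iterable, so a for loop over it yields single characters.
first_items = list(pattern)[:3]
print(first_items)  # ['d', 'a', 't']

# glob.glob expands the wildcard into a list of matching paths
# (an empty list if nothing matches).
input_filenames = sorted(glob.glob(pattern))
print(input_filenames)
```

Passing the list returned by glob.glob gives the loop real file paths to open, so there is no need to type each filename by hand.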

Hope this helps.

By merging JSON files, I mean combining the contents of a.json, b.json, c.json, and so on into a single JSON file, merged.json.
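
A merge along those lines could look like the sketch below. It is not the cat_json function from the question: mangle is not defined there, and because each sample file holds one JSON object per line, joining raw file contents with commas would not produce valid JSON. This version parses every line and writes one JSON array instead. The function name merge_json_lines is hypothetical, and the one-object-per-line layout is assumed from the samples.

```python
import glob
import json
import os


def merge_json_lines(output_filename, pattern):
    """Merge line-delimited JSON files matching `pattern` into one JSON array.

    Returns the number of records written.
    """
    output_path = os.path.abspath(output_filename)
    records = []
    for infile_name in sorted(glob.glob(pattern)):
        # Skip the output file itself in case it also matches the pattern,
        # e.g. 'data/d/merged.json' matches 'data/d/*.json'.
        if os.path.abspath(infile_name) == output_path:
            continue
        with open(infile_name) as infile:
            for line in infile:
                line = line.strip()
                if line:  # ignore blank lines
                    records.append(json.loads(line))
    with open(output_filename, "w") as outfile:
        json.dump(records, outfile)
    return len(records)
```

With the paths from the question, the call would be merge_json_lines('data/d/merged.json', 'data/d/*.json'). All records are held in memory before writing, which should be fine for files of the size shown but may need rethinking for much larger inputs.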
