Python: Importing multiple CSV files into PostgreSQL

python postgresql

I am currently learning how to code, and I have run into this challenge, which I have been trying to solve for the past few days.

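I have over 2,000 CSV files that I would like to import into one particular PostgreSQL table in a single go, rather than using the Import Data feature in pgAdmin 4, which only allows importing one CSV file at a time. How can I do this? I am using Windows.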

A simple way is to use Cygwin or an Ubuntu shell on Windows to run this script:

all_files=("file_1.csv" "file_2.csv") # Or change this to * to pick up every file in the directory

dir_name=<path_to_files>

export PGUSER=<username_here>
export PGPASSWORD=<password_here>
export PGHOST=localhost
export PGPORT=5432
db_name=<dbname_here>

echo "write db"
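# Note: as written, this runs a SQL script named <file>.sql for each listed file.
# The user comes from PGUSER above, so the database name is passed with -d.
for file in "${all_files[@]}"; do
  psql -d "$db_name" -a -f "$dir_name/${file}.sql" >/dev/null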
done

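If you want to do this purely in Python, I have given one approach below. You may not need to chunk the list at all (you might be able to hold all the files in memory at once rather than working in batches). It is also possible that the files differ wildly in size, in which case you would need something more sophisticated than fixed-count batches to avoid creating an in-memory file object that exceeds your RAM. Alternatively, you could do this in 2,000 separate transactions, but I suspect some kind of batching will be faster (untested).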

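What is the total size of all of the files?

@roganjosh The folder containing the files is about 1.13 GB.

I installed Cygwin, but the directory name is E:\EventsMap. For me, it gave a "command not found" error.

Thanks @roganjosh, I made some edits to the code and it now runs, but when I call the function "chunks(12000)" I get this message "", and no data is copied to PostgreSQL. Any guidance?

@TryingtoCode You edited it to create a generator. I don't know why you did that, but I can't debug what I can't see, and my answer wouldn't cause this problem, so I'm not sure what I am supposed to respond to.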

import csv
import io
import os
import psycopg2

CSV_DIR = 'the_csv_folder/' # Relative path here, might need to be an absolute path

def chunks(l, n):
    """ 
    https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
    """
    n = max(1, n)
    return [l[i:i+n] for i in range(0, len(l), n)]


# Get a list of all the CSV files in the directory
# (anything that does not end in .csv is skipped)
all_files = [f for f in os.listdir(CSV_DIR) if f.lower().endswith('.csv')]

# Chunk the list of files. Let's go with 100 files per chunk, can be changed
chunked_file_list = chunks(all_files, 100)

# Iterate the chunks and aggregate the files in each chunk into a single
# in-memory file
for chunk in chunked_file_list:

    # This is the file to aggregate into
    string_buffer = io.StringIO()
    csv_writer = csv.writer(string_buffer)

    for file in chunk:
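        # newline='' is what the csv module recommends when opening files
        with open(os.path.join(CSV_DIR, file), newline='') as infile:
            reader = csv.reader(infile)
            # csv.reader has no readlines(); list() reads every row.
            # If each file starts with a header row, it would need to be
            # skipped here (e.g. next(reader, None)) before reading.
            data = list(reader)

        # Transfer the read data to the aggregated file
        csv_writer.writerows(data)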

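    # Now we have aggregated the chunk. Rewind the buffer so that copy_from
    # reads from the start rather than from the end of what was just written,
    # then copy the file to Postgres
    string_buffer.seek(0)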
    with psycopg2.connect(dbname='the_database_name', 
                          user='the_user_name',
                          password='the_password', 
                          host='the_host') as conn:
        c = conn.cursor()

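        # Headers need to be the table field names, in the order they appear
        # in the csv
        headers = ['first_name', 'last_name', ...]

        # Now upload the data as though it was a file. Note that copy_from
        # uses Postgres's text format, so a quoted field that contains a comma
        # will not be parsed as a single value.
        c.copy_from(string_buffer, 'the_table_name', sep=',', columns=headers)
        conn.commit()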
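
The answer mentions that files of very different sizes may need something more sophisticated than fixed-count batches. One possible sketch is to group files by their combined size on disk instead of by count. The helper name `chunks_by_size` and the `max_bytes` budget are hypothetical and not part of the original answer, and on-disk size is only a rough proxy for how much memory the aggregated buffer will use.

```python
import os


def chunks_by_size(paths, max_bytes):
    """Group file paths into batches whose combined size does not exceed max_bytes.

    A file that is larger than max_bytes on its own still gets a batch
    to itself, so no file is dropped.
    """
    batches, current, current_size = [], [], 0
    for path in paths:
        size = os.path.getsize(path)
        # Close the current batch if adding this file would go over the budget
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    # Keep whatever is left over as the final batch
    if current:
        batches.append(current)
    return batches
```

Under these assumptions, `chunks(all_files, 100)` could be swapped for something like `chunks_by_size([os.path.join(CSV_DIR, f) for f in all_files], 50 * 1024 * 1024)`. Each entry is then already a full path, so the inner loop would open it directly instead of prefixing `CSV_DIR`.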