Splitting a MySQL data text file into multiple text files with Python

I have a data txt file formatted so it can be loaded into a database (MySQL); it looks like this (somewhat exaggerated):

data.txt

name   age profession datestamp
John   23  engineer   2020-03-01
Amy    17  doctor     2020-02-27
Gordon 19  artist     2020-02-27
Kevin  25  chef       2020-03-01
The data.txt above is then inserted into the database by executing the following command through Python:

LOAD DATA LOCAL INFILE '/home/sample_data/data.txt' REPLACE INTO TABLE person_professions 
FIELDS TERMINATED BY 0x01 OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n'
 (name,age,profession,datestamp)
However, data.txt is really too large to insert into the database in one go (an insert limit of around 200 MB is set), and I would like to split the data into several chunks (data_1.txt, data_2.txt, data_3.txt, and so on) and insert them one by one to avoid hitting the insert size limit. I know you can go through the file line by line and look for a condition on which to split the data, for example:

with open('data.txt', 'r') as f:           # read mode ('w' would truncate the file)
    lines = f.read().split('\n')
    for line in lines:
        if some_condition(line):           # placeholder: when to start a new chunk
            with open('data_1.txt', 'w') as f2:
                f2.write(line)             # placeholder: write data to the chunk

But I'm not quite sure how to set up a condition that makes it start writing to a new txt file, unless there is a better way to do this.
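
For context, once the chunks exist they could be loaded one by one from Python along the lines of the sketch below. This is not part of the original question; the connection parameters and the data_*.txt glob pattern are assumptions, and it uses mysql-connector-python with allow_local_infile=True so that LOAD DATA LOCAL INFILE is accepted on the client side.

# Sketch only: load each chunk with its own LOAD DATA statement.
# Connection details and the data_*.txt naming are placeholders.
import glob
import mysql.connector

conn = mysql.connector.connect(
    host='localhost', user='user', password='password',
    database='mydb', allow_local_infile=True)
cursor = conn.cursor()

for chunk in sorted(glob.glob('/home/sample_data/data_*.txt')):
    sql = (
        "LOAD DATA LOCAL INFILE '{}' REPLACE INTO TABLE person_professions "
        "FIELDS TERMINATED BY 0x01 OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' (name, age, profession, datestamp)"
    ).format(chunk)
    cursor.execute(sql)  # each chunk stays well under the ~200 MB insert limit
    conn.commit()

cursor.close()
conn.close()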

I wrote a function that gets the job done based on the size of the file; the explanation is in the code comments.

def split_file(file_name, lines_per_file=100000):
    # Open large file to be read in UTF-8
    with open(file_name, 'r', encoding='utf-8') as rf:
        # Read all lines in file
        lines = rf.readlines()
        print(str(len(lines)) + ' LINES READ.')
        # Set variables to count file number and count of lines written
        file_no = 0
        wlines_count = 0
        # For x from 0 to length of lines read, stepping by the number of lines written per file
        for x in range(0, len(lines), lines_per_file):
            # Open new "split" file for writing in UTF-8
            with open('data' + '-' + str(file_no) + '.txt', 'w', encoding='utf-8') as wf:
                # Write lines
                wf.writelines(lines[x:x + lines_per_file])
                # Update the written lines count
                wlines_count += len(lines[x:x + lines_per_file])
                # Update new "split" file count, mainly for naming
                file_no += 1
        print(str(wlines_count) + " LINES WRITTEN IN " + str(file_no) + " FILES.")

# Split data.txt into files containing 100000 lines
split_file('data.txt', 100000)
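
Note that the function above reads the whole file into memory with readlines(). As one of the comments below points out, the file can also be read line by line; a memory-friendlier variant along those lines (a sketch, not part of the original answer) could look like this:

def split_file_streaming(file_name, lines_per_file=100000):
    # Stream the large file instead of loading it all into memory at once
    file_no = 0
    wlines_count = 0
    wf = None
    with open(file_name, 'r', encoding='utf-8') as rf:
        for i, line in enumerate(rf):
            # Start a new "split" file every lines_per_file lines
            if i % lines_per_file == 0:
                if wf:
                    wf.close()
                wf = open('data' + '-' + str(file_no) + '.txt', 'w', encoding='utf-8')
                file_no += 1
            wf.write(line)
            wlines_count += 1
    if wf:
        wf.close()
    print(str(wlines_count) + " LINES WRITTEN IN " + str(file_no) + " FILES.")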

Not a Python answer, but since you are on a unix-ish system, the split command may be available.
@AnthonyKong Will it split the file automatically based on size and make sure no data is lost?
@soohoo How big is the original file, and how much memory does your system have? Could you test the function I posted in my answer and tell us whether it works? You could also read the file line by line, which would be a lot slower but would not crash.
@ThaerA Tested the function and it works. Thank you very much.
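
For reference, the unix split mentioned in the comments can do a line-based split directly, for example:

split -l 100000 data.txt data_

This writes chunks named data_aa, data_ab, and so on, each containing at most 100000 complete lines.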