Python 3.x: Converting multiple files in a directory to .txt, but the file names turn into binary

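So I'm building plagiarism-detection software, and for that I need to convert files such as .pdf and .docx to .txt format. I managed to find a way to convert all the files in one directory and write them to another directory. The problem is that this approach changes the file names into binary values, and I need the original file names for the next stage.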

**Code:**
import os
import uuid
import textract
source_directory = "C:/Users/syedm/Desktop/Study/FOUNDplag/Plagiarism-checker-Python/mainfolder"

# Give every file a random UUID name, keeping its extension
for filename in os.listdir(source_directory):
    file, extension = os.path.splitext(filename)
    unique_filename = str(uuid.uuid4()) + extension
    os.rename(os.path.join(source_directory,  filename), os.path.join(source_directory, unique_filename))

training_directory = "C:/Users/syedm/Desktop/Study/FOUNDplag/Plagiarism-checker-Python/trainingdata"

# Convert each (already renamed) file to a .txt file in the training directory
for process_file in os.listdir(source_directory):
    file, extension = os.path.splitext(process_file)

    # Build the new text file name by appending the .txt extension to the file's UUID
    dest_file_path = file + '.txt'

    # extract text from the file
    content = textract.process(os.path.join(source_directory, process_file))

    # Open the new file in "wb" (write binary) mode, because textract.process() returns bytes;
    # the with-block closes the file automatically
    with open(os.path.join(training_directory, dest_file_path), "wb") as write_text_file:
        write_text_file.write(content)


Remove this line, which renames the files:

os.rename(os.path.join(source_directory,  filename), os.path.join(source_directory, unique_filename))
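With the renaming loop gone, the conversion loop sees the original file names, so each .txt file can simply reuse its source file's base name. Below is a minimal sketch of that loop, assuming textract is installed as in the question. The function name `convert_directory` and its `extract` parameter are hypothetical, added only so another extractor (or a test stub) can be passed in instead of `textract.process`.

```python
import os


def convert_directory(source_directory, training_directory, extract=None):
    """Write a .txt file into training_directory for every file in
    source_directory, keeping each file's original base name.

    `extract` (hypothetical parameter) is any callable that takes a file path
    and returns the extracted text as bytes; it defaults to textract.process.
    """
    if extract is None:
        import textract  # third-party; only needed for the default extractor
        extract = textract.process

    os.makedirs(training_directory, exist_ok=True)
    written = []
    for filename in os.listdir(source_directory):
        source_path = os.path.join(source_directory, filename)
        if not os.path.isfile(source_path):
            continue  # skip sub-directories
        # No renaming: keep the original base name and only swap the extension
        base_name, _extension = os.path.splitext(filename)
        dest_path = os.path.join(training_directory, base_name + ".txt")
        content = extract(source_path)  # bytes, hence "wb" below
        with open(dest_path, "wb") as text_file:
            text_file.write(content)
        written.append(dest_path)
    return written
```

Call it with the `source_directory` and `training_directory` paths from the question. One caveat: `report.pdf` and `report.docx` would both map to `report.txt`, so the second would overwrite the first; if that can happen in your corpus, keep the old extension in the name (e.g. `report.pdf.txt`).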
Also, those renamed names aren't binary: they're UUIDs generated by `uuid.uuid4()`.
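To see what the first loop actually produces, here is a small sketch that reproduces its renaming for a single name. `uuid_name` is a hypothetical helper, not part of the question's code. The result is an ordinary string of 32 hex digits in five dash-separated groups (8-4-4-4-12), followed by the original extension.

```python
import os
import uuid


def uuid_name(filename):
    """Mimic the question's first loop: replace the base name with a random
    version-4 UUID and keep the original extension."""
    _base, extension = os.path.splitext(filename)
    return str(uuid.uuid4()) + extension


new_name = uuid_name("essay.pdf")
# Prints 8-4-4-4-12 hex groups followed by ".pdf"; the value is random on every run
print(new_name)
```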


Cheers
