Create a new .txt file for each .pdf file in a directory with Python

python loops file-io directory

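My code should take every PDF in a directory, run OCR on it, and produce one .txt file per OCR'd PDF. The .pdf and .txt files should share the same name, with only the extension changed from .pdf to .txt. I'm stuck on splitting the input PDF's name so that the OCR output gets the same name with a .txt extension. A sample file in the directory looks like this: "000dbf9d-d53f-465f-a7ce-72272136FB7465.pdf". I need the output to be "000dbf9d-d53f-465f-a7ce-722722136fb7465.txt". Also, my code doesn't create new .txt files; it overwrites a single file on every iteration. I need a new .txt file for each OCR'd .pdf file. Code so far: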

import io
import glob
from PIL import Image
import pytesseract
from wand.image import Image as wi


files = glob.glob(r"D:\files\**")
for file in files:
    #print(file)
    pdf = wi(filename = file, resolution = 300)

    pdfImg = pdf.convert('jpeg')

    imgBlobs = []

    for img in pdfImg.sequence:
        page = wi(image = img)
        imgBlobs.append(page.make_blob('jpeg'))

    extracted_texts = []

    for imgBlob in imgBlobs:
        im = Image.open(io.BytesIO(imgBlob))
        text = pytesseract.image_to_string(im, lang = 'eng')
        extracted_texts.append(text)
    with open("D:\\extracted_text\\"+ "\\file1.txt", 'w') as f:
        f.write(str(extracted_texts))

You just need to keep track of the file name and reuse it in the last two lines:

# ...
import os


# Match only PDFs; without recursive=True, "**" behaves like "*" and matches every entry
files = glob.glob(r"D:\files\*.pdf")
for file in files:
    #print(file)

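    # Get the file name without its directory or final extension;
    # splitext strips only the last suffix, so dots earlier in the name are kept
    name = os.path.splitext(os.path.basename(file))[0]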

    # ...

    # Use `name` from above to name your text file         
    with open("D:\\extracted_text\\" + name + ".txt", 'w') as f:
        f.write(str(extracted_texts))
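
The same idea can be sketched with `pathlib`, which handles the name splitting and path joining in one place. This is a minimal sketch, not the asker's or answerer's code: `txt_path_for`, `write_texts`, and the `extract_text` parameter are hypothetical names introduced here, and `extract_text` stands in for whatever OCR routine turns one PDF into a string (for example, the wand and pytesseract steps from the question).

```python
from pathlib import Path


def txt_path_for(pdf_path, out_dir):
    """Return out_dir/<pdf name without its final suffix>.txt."""
    # Path.stem drops only the last suffix, so "a.b.pdf" becomes "a.b".
    return Path(out_dir) / (Path(pdf_path).stem + ".txt")


def write_texts(pdf_dir, out_dir, extract_text):
    """Write one .txt per .pdf found directly inside pdf_dir.

    extract_text is a hypothetical callable: it receives a Path to a PDF
    and returns the text to store (the OCR step is assumed to live there).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    written = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        target = txt_path_for(pdf_path, out_dir)
        # A distinct target per PDF means no file is overwritten by the next iteration.
        target.write_text(extract_text(pdf_path), encoding="utf-8")
        written.append(target)
    return written
```

If the OCR step collects one string per page in a list, as `extracted_texts` does above, joining the pages with `"\n".join(extracted_texts)` before writing stores the plain text, whereas `str(extracted_texts)` stores the list's Python representation, brackets and quotes included.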
