I converted a PDF file to CSV with Anaconda Python 3, but the resulting CSV file is unreadable. How can I make it readable?
Tags: python, pandas, csv, pypdf2


I have converted a PDF file to CSV using Anaconda Python 3, but the converted CSV file is not in a readable form. How can I get that CSV into a readable format?

I tested your approach but could not find a way to correct the CSV output. I usually do it like this:

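# importing required modules
import pandas as pd
import PyPDF2

# placeholder: replace with the path to your PDF
path = '<your path to the file>.pdf'

# creating a pdf file object
with open(path, 'rb') as pdfFileObj:
    # creating a pdf reader object
    # (recent PyPDF2 releases; older ones use PdfFileReader, numPages,
    # getPage and extractText instead)
    pdfReader = PyPDF2.PdfReader(pdfFileObj)

    # printing number of pages in pdf file
    print(len(pdfReader.pages))

    # extracting text from the first page
    print(pdfReader.pages[0].extract_text())

    # extracting text from every page while the file is still open
    pages = [page.extract_text() for page in pdfReader.pages]

# build the DataFrame from the extracted text, not from the file object,
# which only holds the raw PDF bytes
df = pd.DataFrame({'page': range(1, len(pages) + 1), 'text': pages})
print(df)
df.to_csv('output.csv', index=False)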
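"But the converted CSV file is not in a readable form." What exactly does that mean? Please provide a reproducible example, along with the current and expected output.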
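import csv

# Assumption: miner_text_generator is a local helper module (not shown here)
# whose extract_text_by_page yields the text of each page of the PDF.
from miner_text_generator import extract_text_by_page


def export_as_csv(pdf_path, csv_path):
    # newline='' lets the csv module control line endings itself
    with open(csv_path, 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file)
        for page in extract_text_by_page(pdf_path):
            # keep only the first 100 characters of each page
            text = page[0:100]
            # one row per page, one column per whitespace-separated word
            words = text.split()
            writer.writerow(words)


if __name__ == '__main__':
    pdf_path = '<your path to the file>.pdf'
    csv_path = '<path to the output>.csv'
    export_as_csv(pdf_path, csv_path)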
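
Splitting each page on whitespace, as above, gives rows with a different number of columns, which may be part of why the CSV looks unreadable. Here is a minimal sketch of an alternative layout, assuming the page text has already been extracted into a list of strings: one CSV row per non-empty line, tagged with its page and line number. The names `pages_to_rows` and `write_rows` and the sample text are hypothetical and only for illustration.

```python
import csv
import io


def pages_to_rows(pages):
    """Yield (page_number, line_number, text) for each non-empty line."""
    for page_number, page_text in enumerate(pages, start=1):
        line_number = 0
        for line in page_text.splitlines():
            # Collapse runs of whitespace left over from the PDF layout.
            cleaned = " ".join(line.split())
            if not cleaned:
                continue
            line_number += 1
            yield page_number, line_number, cleaned


def write_rows(pages, csv_file):
    """Write a header plus one row per extracted line to an open text file."""
    writer = csv.writer(csv_file)
    writer.writerow(["page", "line", "text"])
    writer.writerows(pages_to_rows(pages))


# Hypothetical sample standing in for text extracted from a two-page PDF.
sample_pages = ["Invoice  2021\n\nTotal:   42.00", "Thank you"]
buffer = io.StringIO(newline="")
write_rows(sample_pages, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, open the target with `open(csv_path, 'w', newline='', encoding='utf-8')` and pass that file object to `write_rows`. A fixed three-column layout like this should load cleanly in a spreadsheet or with `pandas.read_csv`.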