Python: row and column output problems when using csv.writer with a series of strings

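I have a set of PDFs that I am trying to extract data from for analysis. As part of this process, I want to modify the data and export it to a .csv file. So far I have been able to extract the data from my PDFs successfully.

This part of the data is a set of strings that look like: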

 Deer W Pre 4-3F
 Deer W Post 2-1F
 DG Post 7F
 S Pre 2-12F
 Staff Post 3-1F
 Staff Pre 2-10F
 Staff Post 2-11F
 Tut Post 2-1F

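I am trying to use csv.writer to write this series of strings to a .csv file, with all strings in the same column but each string on its own row. I have done a lot of digging here but have not found a solution to my problem. The code I am using is: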

        with open("output.csv", mode="a+") as fp:
            wr = csv.writer(fp, dialect="excel")
            for item in site_tree_info:  # site_tree_info is the variable that stores the strings
                wr.writerow([str(item)])

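This gives me rather odd output: every character ends up on its own row.

Does anyone have suggestions for how to get my expected output (each string on its own row, all in one column)?

I really don't understand why [str(string)] is not working for me here, since it has worked for many other people with similar problems.

This is the code I use to create the strings listed above: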

import os
import re

import pdfplumber

# Get list of output pdf files in our directory

meta_sample = re.compile(r'^[A-Z].*')  # this is to pull text from page 1

for root, dirs, files in os.walk('/Users/myname/tree'):
    for filename in files:
        p = os.path.join(root, filename)
        # print(p)
        with pdfplumber.open(p) as pdf:
            # Pull text from the first page of each pdf, which includes information
            # about the samples and the conditions they were analyzed under
            sample_info = pdf.pages[0]
            sample_info_text = sample_info.extract_text()
            sample_info_text_split = sample_info_text.split('\n')

        for lines in sample_info_text_split:
            if meta_sample.match(lines):
                column_name, *column_info = lines.split(':')
                column_info = ' '.join(column_info)
                # print(column_info)  # both left and right sides of the page 1 table are captured

        # The sample ID and site/tree info are on the line at index [2] of the
        # sample_info_text_split list. Strip that line, split it once at the first ":",
        # and take the part after the colon [1], which is the site and tree info.
        site_tree_info = sample_info_text_split[2].strip().split(":", 1)[1]
        print(site_tree_info)  # this prints as above
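
For reference, the final split can be tried on a single line in isolation. This is only a sketch: the text `Sample ID: Deer W Pre 4-3F` is an assumed example of what `sample_info_text_split[2]` might contain, not actual output from the PDFs.

```python
# Hypothetical line standing in for sample_info_text_split[2].
line = "Sample ID: Deer W Pre 4-3F"

# strip() trims whitespace at both ends of the line; split(":", 1) splits at
# the first colon only, so index [1] is everything after that colon.
site_tree_info = line.strip().split(":", 1)[1]

print(repr(site_tree_info))
print(type(site_tree_info).__name__)
```

Under that assumption the value keeps the space that followed the colon, and it is one `str` per PDF rather than a list of strings.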

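The simple explanation is that your site_tree_info variable is a str, so when you loop over it, a new row is created for every character. I suggest making site_tree_info a list of strings instead, like this (I am assuming the data looks like this):

site_tree_info = ['Deer W Pre 4-3F', 'Deer W Post 2-1F']

I turned site_tree_info into a list as you suggested by creating a new variable, site_tree_info_list = [site_tree_info], and that worked.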
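
The difference between looping over a str and looping over a list can be reproduced without any PDFs. The sketch below is a minimal illustration, not the asker's actual script: it writes to an in-memory `io.StringIO` buffer instead of `output.csv`, the helper name `rows_written` is made up for this example, and the sample value is one of the strings from the question.

```python
import csv
import io


def rows_written(values):
    """Write each element of `values` as a one-column row and return the rows."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, dialect="excel")
    for item in values:
        writer.writerow([str(item)])
    # The excel dialect ends every row with "\r\n"; drop the trailing empty piece.
    return buffer.getvalue().split("\r\n")[:-1]


site_tree_info = "Deer W Pre 4-3F"  # a single str, as in the question

# Looping over a str yields one character at a time, so every character
# becomes its own row.
per_character = rows_written(site_tree_info)

# Wrapping the str in a list yields the whole string once, so it is written
# as a single row.
per_string = rows_written([site_tree_info])

print(len(per_character), per_character[:4])
print(len(per_string), per_string)
```

When writing to a real file, the `csv` module documentation recommends opening it with `newline=""` so that no extra blank lines appear between rows on some platforms.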