Python: how to fix a problem writing to a CSV file

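I want my program to write each article's date, title, and body text to a CSV file. When I print the body text to the console, it prints everything, but the CSV file contains only the last line of the article.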

I also tried passing the date, title, and body text to the row as a list, each on a separate line of code, and got the same result.

from bs4 import BeautifulSoup
from urllib.request import urlopen
import csv

csvfile = "C:/Users/katew/Dropbox/granularitygrowth/Politico/pol.csv"
with open(csvfile, mode='w', newline='') as pol:
    csvwriter = csv.writer(pol, delimiter='|', quoting=csv.QUOTE_MINIMAL)
    csvwriter.writerow(["Date", "Title", "Article"])

    #for each page on Politico archive
    for p in range(0,1):
        url = urlopen("https://www.politico.com/newsletters/playbook/archive/%d" % p)
        content = url.read()

        #Parse article links from page
        soup = BeautifulSoup(content,"lxml")
        articleLinks = soup.findAll('article', attrs={'class':'story-frag format-l'})

        #Each article link on page
        for article in articleLinks:
            link = article.find('a', attrs={'target':'_top'}).get('href')

            #Open and read each article link
            articleURL = urlopen(link)
            articleContent = articleURL.read()

            #Parse body text from article page
            soupArticle = BeautifulSoup(articleContent, "lxml")

            #Limits to div class = story-text tag (where article text is)
            articleText = soupArticle.findAll('div', attrs={'class':'story-text'})
            for div in articleText:

                #Find date
                footer = div.find('footer', attrs={'class':'meta'})
                date = footer.find('time').get('datetime')
                print(date)

                #Find title
                headerSection = div.find('header')
                title = headerSection.find('h1').text
                print(title)

                bodyText = div.findAll('p')
                for p in bodyText:
                    p_string = str(p.text)
                    textContent = "" + p_string
                    print(textContent)

                #Adds data to csv file
                csvwriter.writerow([date, title, textContent])

I expect the CSV file to contain the date, title, and full text.

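The problem is in your `for p in bodyText:` loop. Each iteration overwrites the `textContent` variable, so when the loop ends it holds only the text of the last p. Accumulate the text instead: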

textContent = ""
bodyText = div.findAll('p')
for p in bodyText:
    p_string = str(p.text)
    textContent += p_string + ' '

print(textContent)
csvwriter.writerow([date, title, textContent])
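
The same accumulation can also be written with `str.join`, which avoids the trailing space. This is only a minimal sketch, assuming the paragraph texts have already been collected into a list. The names `paragraphs` and `join_paragraphs`, the sample date and title, and the in-memory buffer are hypothetical stand-ins for the values scraped in the question, not part of the original code:

```python
import csv
import io


def join_paragraphs(paragraphs):
    """Combine every paragraph into one string instead of keeping only the last."""
    # Strip each paragraph, skip the empty ones, and join with single spaces.
    return " ".join(text.strip() for text in paragraphs if text.strip())


# Hypothetical stand-in for [p.text for p in div.findAll('p')].
paragraphs = ["First paragraph.", "Second paragraph.", "Last paragraph."]

# Write to an in-memory buffer so the sketch runs without touching the disk.
buffer = io.StringIO()
csvwriter = csv.writer(buffer, delimiter="|", quoting=csv.QUOTE_MINIMAL)
csvwriter.writerow(["Date", "Title", "Article"])
# Sample date and title are made up for illustration.
csvwriter.writerow(["2019-01-01", "Sample title", join_paragraphs(paragraphs)])

csv_output = buffer.getvalue()
print(csv_output)
```

In the question's loop, the list passed to `join_paragraphs` would be the text of each p element found inside the story-text div, and the row would be written once per article, after the loop over the paragraphs has finished.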

Thank you so much! This is exactly the answer I needed.