
Writing news to a CSV file (Python 3, BeautifulSoup)


I'd like Python 3.6 to write the output of the code below to a CSV file. Ideally it would look like this: one row per article (each is an `a` element), and four columns: "Title", "URL", "Category" [#Politik etc.], "PublishedAt".

For writing the CSV, I already have this:

with open('%s_schlagzeilen.csv' % datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S.%f'), 'w', newline='',
              encoding='utf-8') as file:
        w = csv.writer(file, delimiter="|")
        w.writerow([...])

…and I need to know what to do next. Thanks in advance!!

You can collect all the fields you need to extract into a list of dictionaries and use `csv.DictWriter` to write them to a CSV file:

import csv
import datetime

from bs4 import BeautifulSoup
import requests


website = 'http://spiegel.de/schlagzeilen'
r = requests.get(website)
soup = BeautifulSoup(r.content, "lxml")

articles = []
for a in soup.select(".schlagzeilen-content.schlagzeilen-overview a[title]"):
    category, published_at = a.find_next_sibling(class_="headline-date").get_text().split(",")

    articles.append({
        "Title": a.get_text(),
        "URL": a.get('href'),
        "Category": category.strip(" ()"),
        "PublishedAt": published_at.strip(" ()")
    })

filename = '%s_schlagzeilen.csv' % datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S.%f')
with open(filename, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=["Title", "URL", "Category", "PublishedAt"])

    writer.writeheader()
    writer.writerows(articles)
Note how we locate the category and the "published at" value - we go to the next sibling element, split its text on the comma, and strip the surrounding parentheses.
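The split-and-strip step can be sketched in isolation; the sample sibling text below is hypothetical, assuming it follows the "(Category, Time)" pattern the selector targets:

```python
# A minimal sketch of the category/date parsing, assuming the sibling
# element's text looks like "(Politik, 14:07)" (hypothetical sample).
text = "(Politik, 14:07)"
category, published_at = text.split(",")
print(category.strip(" ()"))      # Politik
print(published_at.strip(" ()"))  # 14:07
```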
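To check the result, the file can be read back with `csv.DictReader`; the sketch below uses an in-memory buffer and an illustrative row rather than real scraped data:

```python
import csv
import io

# Hedged sketch: write one illustrative row the way the script does,
# then read it back with csv.DictReader to verify the structure.
sample = io.StringIO()
writer = csv.DictWriter(sample, fieldnames=["Title", "URL", "Category", "PublishedAt"])
writer.writeheader()
writer.writerow({"Title": "Example headline", "URL": "/example.html",
                 "Category": "Politik", "PublishedAt": "14:07"})

sample.seek(0)
rows = list(csv.DictReader(sample))
print(rows[0]["Category"])  # Politik
```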