Python: How to write web scrape output from Beautiful Soup to columns instead of rows


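I am trying to write the results of a web scrape to a CSV file. I have managed to get the output into the CSV, but it is being entered in rows instead of columns. Here is the script: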

import bs4
import requests
import csv

#get webpage for Apple inc. September income statement
page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")

#put into beautiful soup
soup = bs4.BeautifulSoup(page.content)

#select table that holds data of interest
table = soup.find("table", class_="yfnc_tabledata1")

#creates generator that holds four values that are yearly revenues for company
revenue = table.tr.td.table.tr.next_sibling.td.next_siblings

#iterates through generator from above and writes output to CSV file
for value in revenue:
    value = value.get_text(strip=True)
    with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
            s = csv.writer(csvfile)
            s.writerow([data.encode("utf-8") for data in [value]])

I know Python has a zip() function that might be useful, but I haven't figured out how to apply it to this case.

Any help is appreciated.

You have the right idea, and zip can help you here:

#creates generator that holds four values that are yearly revenues for company
revenue = table.tr.td.table.tr.next_sibling.td.next_siblings

revenue = zip(*revenue) # <------ yes, it is that easy

#iterates through generator from above and writes output to CSV file
for value in revenue:
    value = value.get_text(strip=True)
    ...
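
For reference, zip(*rows) transposes a sequence of rows, so it is most useful once the cell text has already been extracted into plain lists. The following is a minimal Python 3 sketch under that assumption. The `rows` data is hard-coded from the figures quoted in this thread rather than scraped, and the names `rows` and `columns` are illustrative only.

```python
import csv
import io

# Illustrative data: two rows as they might come out of the scrape
# (hard-coded here; values copied from the output quoted in this thread).
rows = [
    ["27/09/2014", "28/06/2014", "29/03/2014", "28/12/2013"],
    ["42,123,000", "37,432,000", "45,646,000", "57,594,000"],
]

# zip(*rows) pairs up the i-th item of every row, turning rows into columns.
columns = list(zip(*rows))

buffer = io.StringIO()  # stands in for a real file opened with newline=""
writer = csv.writer(buffer)
writer.writerows(columns)  # each (date, value) pair becomes one CSV line

print(buffer.getvalue())
```

Each line of the result holds one date and its matching value, which is the transposed layout of the two input rows.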

You only need to open the file once and then call writerow() just once:
with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in revenue])

This produces:
"37,432,000","45,646,000","57,594,000","37,472,000"
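
The code in this answer appears to target Python 2. Under Python 3 the csv module expects text rather than bytes, so the `.encode("utf-8")` step would be dropped and the file opened with `newline=""`. A minimal sketch, assuming the cell text has already been extracted into a list. The `values` list and the temporary output path are placeholders, not part of the original answer.

```python
import csv
import os
import tempfile

# Placeholder for the text extracted with get_text(strip=True).
values = ["37,432,000", "45,646,000", "57,594,000", "37,472,000"]

# Hypothetical output location; a temporary directory keeps the sketch tidy.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "Apple.csv")

# Open the file once; newline="" lets the csv module control line endings.
with open(path, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(values)  # one call writes one row, one value per column

# Read the file back to inspect what was written.
with open(path, newline="", encoding="utf-8") as csvfile:
    content = csvfile.read()

print(content)
```

Calling writerow() once with the whole list is what places the four values side by side; calling it once per value puts each value on its own line.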


As a bonus to improve the answer, you can also parse the table headers and write them as the CSV header row:
headers = table.find('tr', class_="yfnc_modtitle1").find_all('th')
revenue = table.tr.td.table.tr.next_sibling.td.next_siblings

with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in headers])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in revenue])

This produces:
27/09/2014,28/06/2014,29/03/2014,28/12/2013
"42,123,000","37,432,000","45,646,000","57,594,000"
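
To check that the header and the values line up column by column, the output can be read back with csv.reader, which also strips the quotes the writer added around values containing commas. A sketch in Python 3 using an in-memory buffer instead of a real file. The `headers` and `revenue` lists are hard-coded stand-ins for the scraped cells.

```python
import csv
import io

# Stand-ins for the text of the scraped header cells and revenue cells.
headers = ["27/09/2014", "28/06/2014", "29/03/2014", "28/12/2013"]
revenue = ["42,123,000", "37,432,000", "45,646,000", "57,594,000"]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headers)  # first line: header row
writer.writerow(revenue)  # second line: data row

# Parse the CSV text back into lists; quoted fields come back unquoted.
buffer.seek(0)
parsed = list(csv.reader(buffer))

# Pair each period with the value in the same column.
for period, amount in zip(*parsed):
    print(period, amount)
```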

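This is awesome, thanks. Is there any way to get rid of the quotation marks as well?

@Kane thanks. That is because there are commas in the values. Which option would you prefer: replacing the commas with dots, or using a different delimiter?

The commas are fine, I meant the quotation marks.

@Kane yes, I understand. What I am saying is that the CSV delimiter is a comma by default, and the values contain commas. To distinguish the commas inside the values from the delimiter commas, the writer puts the values in quotes.

Oh I see. I think | would be fine. Would the | symbols be visible in the CSV output? Like this: | 42123000 |?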
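
The quoting behaviour discussed in these comments can be checked directly. The sketch below (Python 3, in-memory buffers, hard-coded values, and a hypothetical helper named `to_csv_line`) writes the same row three ways: with the default comma delimiter, with | as the delimiter, and with the commas removed from the values.

```python
import csv
import io

values = ["42,123,000", "37,432,000", "45,646,000", "57,594,000"]


def to_csv_line(row, **writer_options):
    """Write one row to an in-memory buffer and return it without the line ending."""
    buffer = io.StringIO()
    csv.writer(buffer, **writer_options).writerow(row)
    return buffer.getvalue().rstrip("\r\n")


# Default comma delimiter: values containing commas are wrapped in quotes.
default_line = to_csv_line(values)

# Pipe delimiter: commas are no longer special, so no quotes are needed.
pipe_line = to_csv_line(values, delimiter="|")

# Commas stripped from the values: the default delimiter needs no quotes.
plain_line = to_csv_line([v.replace(",", "") for v in values])

print(default_line)
print(pipe_line)
print(plain_line)
```

With `delimiter="|"` the symbol appears only between values, not before the first or after the last one, so a row reads 42,123,000|37,432,000|... rather than | 42123000 |.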