Python: How to write the output of a web scrape in Beautiful Soup to columns instead of rows


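I'm trying to write the results of a web scrape to a CSV file. I have managed to write the output to the CSV, but it is entered as rows rather than columns. Here is the script: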

import bs4
import requests
import csv

#get webpage for Apple inc. September income statement
page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")

#put into beautiful soup
soup = bs4.BeautifulSoup(page.content)

#select table that holds data of interest
table = soup.find("table", class_="yfnc_tabledata1")

#creates generator that holds four values that are yearly revenues for company
revenue = table.tr.td.table.tr.next_sibling.td.next_siblings

#iterates through generator from above and writes output to CSV file
for value in revenue:
    value = value.get_text(strip=True)
    with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
            s = csv.writer(csvfile)
            s.writerow([data.encode("utf-8") for data in [value]])
I know Python has a zip() function that might be useful, but I haven't figured out how to apply it to this situation.


Thanks for any help.

You have the right idea; zip can easily help you here:

#creates generator that holds four values that are yearly revenues for company
revenue = table.tr.td.table.tr.next_sibling.td.next_siblings

revenue = zip(*revenue) # <------ yes, it is that easy

#iterates through generator from above and writes output to CSV file
for value in revenue:
    value = value.get_text(strip=True)
         ...
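
To make the zip() idea concrete, here is a minimal sketch of how zip(*rows) turns rows into columns. It assumes the cell text has already been extracted into plain lists of strings; the `rows` data below is a hypothetical stand-in for scraped values, not output from the Yahoo page.

```python
import csv
import io

# Hypothetical stand-in for scraped cell text: one inner list per table row.
rows = [
    ["27/09/2014", "28/06/2014", "29/03/2014"],
    ["42,123,000", "37,432,000", "45,646,000"],
]

# zip(*rows) pairs up the i-th item of every row, so rows become columns.
columns = list(zip(*rows))

# Write the transposed data to an in-memory buffer instead of a real file.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(columns)
transposed_csv = buffer.getvalue()
print(transposed_csv)
```

Each output line then holds one date next to its revenue figure. Note that this works on lists of strings; the elements yielded by next_siblings would need get_text() applied first.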

You only need to open the file once and then call writerow() once:

with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in revenue])
This produces:

"37,432,000","45,646,000","57,594,000","37,472,000"
27/09/2014,28/06/2014,29/03/2014,28/12/2013
"42,123,000","37,432,000","45,646,000","57,594,000"
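
The snippets in this thread call .encode("utf-8") on each value, which suits Python 2. On Python 3 the csv module expects text, and encoded bytes would be written as b'...' literals. A minimal Python 3 sketch of the same "open once, write one row" idea might look like the following; the `values` list is a hypothetical stand-in for the scraped cell text, and the file goes to a temporary directory rather than the path used above.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the text extracted from the four revenue cells.
values = ["42,123,000", "37,432,000", "45,646,000", "57,594,000"]

path = os.path.join(tempfile.mkdtemp(), "Apple.csv")

# Open the file once; newline="" lets the csv module control line endings.
with open(path, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(values)  # one call, one row, four columns

# Read the file back to inspect what was written.
with open(path, newline="", encoding="utf-8") as csvfile:
    written = csvfile.read()
print(written)
```

Passing the whole list to a single writerow() call is what places the values side by side; calling writerow() once per value puts each one on its own line.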

As a bonus to improve the answer: you can also parse the table headers and write them as the CSV header row:

headers = table.find('tr', class_="yfnc_modtitle1").find_all('th')
revenue = table.tr.td.table.tr.next_sibling.td.next_siblings

with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in headers])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in revenue])
This produces:

"37,432,000","45,646,000","57,594,000","37,472,000"
27/09/2014,28/06/2014,29/03/2014,28/12/2013
"42,123,000","37,432,000","45,646,000","57,594,000"

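This is great, thanks. Is there any way to get rid of the quotes?

@Kane Thanks. That's because the values contain commas. Which option would you prefer: replacing the commas with dots, or using a different delimiter?

The commas are fine, I meant the quotes.

@Kane Yes, I understand. My point is that the default CSV delimiter is a comma, and the values contain commas. To distinguish the commas inside the values from the delimiter commas, the writer puts the values in quotes.

Oh, I see. I think | would be fine. Will the | symbol be visible in the CSV output? Like this: | 42123000 |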
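
The quoting behaviour discussed in these comments can be checked with a short sketch. It writes the same row three ways: with the default comma delimiter, with "|" as the delimiter, and with the thousands separators stripped. The `values` list and the `to_csv_line` helper are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical stand-in for the scraped revenue figures.
values = ["42,123,000", "37,432,000", "45,646,000", "57,594,000"]


def to_csv_line(row, **writer_options):
    """Write one row to an in-memory buffer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n", **writer_options).writerow(row)
    return buffer.getvalue()


# Default delimiter: values containing commas are quoted.
default_line = to_csv_line(values)

# Pipe delimiter: the commas no longer clash, so no quotes are needed.
pipe_line = to_csv_line(values, delimiter="|")

# Alternative: strip the thousands separators and keep the comma delimiter.
plain_line = to_csv_line([v.replace(",", "") for v in values])

print(default_line, pipe_line, plain_line, sep="")
```

With delimiter="|" the pipe is visible in the raw file, but only between fields, not at the start or end of the line. A program reading the file also needs to be told that the delimiter is "|".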