Python: How to write web scrape output into columns instead of rows with Beautiful Soup
I am trying to write the results of a web scrape to a CSV file. I have managed to write the output to the CSV, but it goes in as rows instead of columns. Here is the script:
import bs4
import requests
import csv
#get webpage for Apple inc. September income statement
page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")
#put into beautiful soup
soup = bs4.BeautifulSoup(page.content)
#select table that holds data of interest
table = soup.find("table", class_="yfnc_tabledata1")
#creates generator that holds four values that are yearly revenues for company
revenue = table.tr.td.table.tr.next_sibling.td.next_siblings
#iterates through generator from above and writes output to CSV file
for value in revenue:
    value = value.get_text(strip=True)
    with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
        s = csv.writer(csvfile)
        s.writerow([data.encode("utf-8") for data in [value]])
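I know Python has a zip() function that might be useful, but I have not figured out how to apply it to this situation.
Any help is appreciated.
You have the right idea; zip can help you easily: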
#creates generator that holds four values that are yearly revenues for company
revenue = table.tr.td.table.tr.next_sibling.td.next_siblings
revenue = zip(*revenue) # <------ yes, it is that easy
#iterates through generator from above and writes output to CSV file
for value in revenue:
    value = value.get_text(strip=True)
    ...
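For reference, here is what zip(*...) does on plain lists of strings, kept separate from the scraped objects above. This is only a minimal Python 3 sketch: the `rows` data is sample data made up for illustration, and an in-memory buffer stands in for the real CSV file.

```python
import csv
import io

# Sample data for illustration: each inner list is one (date, revenue) record.
rows = [
    ["27/09/2014", "42,123,000"],
    ["28/06/2014", "37,432,000"],
    ["29/03/2014", "45,646,000"],
]

# zip(*rows) transposes: the i-th items of every record are grouped together.
columns = list(zip(*rows))

buffer = io.StringIO()      # stands in for a real file
writer = csv.writer(buffer)
writer.writerows(columns)   # one CSV line per transposed tuple
transposed_csv = buffer.getvalue()
print(transposed_csv)
```

After the transpose, all dates end up on one CSV line and all revenue figures on the next, which is the row-to-column swap the question is after.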
You only need to open the file once, and call writerow() only once:
with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in revenue])
Produces:
"37,432,000","45,646,000","57,594,000","37,472,000"
27/09/2014,28/06/2014,29/03/2014,28/12/2013
"42,123,000","37,432,000","45,646,000","57,594,000"
As a bonus to improve the answer: you can also parse the table headers and write them as the CSV header:
headers = table.find('tr', class_="yfnc_modtitle1").find_all('th')
revenue = table.tr.td.table.tr.next_sibling.td.next_siblings
with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in headers])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in revenue])
Produces:
"37,432,000","45,646,000","57,594,000","37,472,000"
27/09/2014,28/06/2014,29/03/2014,28/12/2013
"42,123,000","37,432,000","45,646,000","57,594,000"
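This is awesome, thanks. Is there any way to get rid of the quotes?
@Kane thanks. The quotes are there because the values contain commas. Which option would you prefer: replacing the commas with dots, or using a different delimiter?
The commas are fine, I meant the quote marks.
@Kane yes, I understand. What I am saying is that the default CSV delimiter is a comma, and the values themselves contain commas. To distinguish the commas inside the values from the delimiter commas, it puts the values in quotes.
Oh I see. I think | would be fine. Would the | symbol be visible in the CSV output? Like this: | 42123000 |?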
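To make the two options from the comments concrete, here is a minimal Python 3 sketch. The `revenue` list is sample data, and in-memory buffers stand in for the real file. In Python 3 the csv module takes str values directly, so the .encode("utf-8") calls used earlier in the thread are dropped here.

```python
import csv
import io

revenue = ["42,123,000", "37,432,000", "45,646,000", "57,594,000"]  # sample figures

# Option 1: use "|" as the delimiter, so commas inside values need no quoting.
piped = io.StringIO()
csv.writer(piped, delimiter="|").writerow(revenue)
piped_line = piped.getvalue().strip()

# Option 2: keep the comma delimiter but strip the thousands separators.
plain = io.StringIO()
csv.writer(plain).writerow([value.replace(",", "") for value in revenue])
plain_line = plain.getvalue().strip()

print(piped_line)
print(plain_line)
```

With the first option the | appears only between fields in the raw file, not around each value. Whether it is visible afterwards depends on the program that opens the file and on which separator it is told to use.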