Python: Save a scraped result set to a CSV file


I wrote a small script that fetches an eBay result set and stores each field in a different variable: link, price, bids.

How do I take those variables and save each result for every auction item to a CSV file, where each row represents a different auction item?

For example: link, price, bids

Here is my code so far:

import requests, bs4
import csv
import pandas as pd

res = requests.get('http://www.ebay.com/sch/i.html?LH_Complete=1&LH_Sold=1&_from=R40&_sacat=0&_nkw=gerald%20ford%20autograph&rt=nc&LH_Auction=1&_trksid=p2045573.m1684')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# grabs the link, selling price, and # of bids from historical auctions
links = soup.find_all(class_="vip")
prices = soup.find_all("span", "bold bidsold")
bids = soup.find_all("li", "lvformat")

This should do it:

import csv
import requests
import bs4

res = requests.get('http://www.ebay.com/sch/i.html?LH_Complete=1&LH_Sold=1&_from=R40&_sacat=0&_nkw=gerald%20ford%20autograph&rt=nc&LH_Auction=1&_trksid=p2045573.m1684')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# grab all the result links and store their href destinations in a list
links = [e['href'] for e in soup.find_all(class_="vip")]

# grab all the bid entries and split their contents to keep only the number
bids = [e.span.contents[0].split(' ')[0] for e in soup.find_all("li", "lvformat")]

# grab all the prices and store them in a list
prices = [e.contents[0] for e in soup.find_all("span", "bold bidsold")]

# zip the lists together so the entries belonging to the same auction item
# end up in one row
rows = list(zip(links, prices, bids))

# write each row to the CSV output file
# (newline='' keeps the csv module from inserting blank lines on Windows)
with open('ebay.csv', 'w', newline='') as csvfile:
    w = csv.writer(csvfile)
    w.writerows(rows)

This gives you a CSV file that uses a comma (,) as the delimiter.
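
If you ever need something other than a comma, csv.writer also accepts a delimiter argument. Here is a minimal, self-contained sketch; the rows below are made-up placeholders in the same link/price/bids shape, and the filename and semicolon are arbitrary choices for illustration:

import csv

# placeholder rows in the same (link, price, bids) shape as above
rows = [
    ('http://www.ebay.com/itm/example-1', '$50.00', '7'),
    ('http://www.ebay.com/itm/example-2', '$12.50', '3'),
]

# newline='' keeps the csv module from inserting blank lines on Windows;
# delimiter switches the field separator to a semicolon
with open('ebay_semicolon.csv', 'w', newline='') as csvfile:
    w = csv.writer(csvfile, delimiter=';')
    w.writerows(rows)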

Comments:

First, you should think about how you extract the data you want, because bids, for example, contains more than just the number of bids.
@albert Do you mean the HTML text around the bids?
Yes, and all the other HTML elements around the data stored in links and prices. What do you need pandas for, and why assume the lxml parser is installed?
I just prefer working with numpy and pandas; in this example pandas lets me simply write a DataFrame to CSV.
You are absolutely right. I like those too, but they add overhead that is not needed in this case, and I also enjoyed the challenge, since I am trying to learn BeautifulSoup as well.
Thanks for the example, Albert.
If I solved your problem, please accept my answer to mark your question as solved.
Do you have any tips on how to parse the listing titles out of the existing data?
Do you mean as column headers?
Yes, as a separate column; I would like to include the listing title.