Python: Save a scraped result set to a CSV file


I wrote a small script that fetches an eBay result set and stores each field in a different variable: link, price, bids.

How do I take those variables and save each result for every auction item to a CSV file, where each row represents a different auction item?

For example: link, price, bids

Here is my code so far:

import requests, bs4
import csv
import pandas as pd

res = requests.get('http://www.ebay.com/sch/i.html?LH_Complete=1&LH_Sold=1&_from=R40&_sacat=0&_nkw=gerald%20ford%20autograph&rt=nc&LH_Auction=1&_trksid=p2045573.m1684')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# grabs the link, selling price, and # of bids from historical auctions
links = soup.find_all(class_="vip")
prices = soup.find_all("span", "bold bidsold")
bids = soup.find_all("li", "lvformat")

This should do it:

import csv
import requests
import bs4

res = requests.get('http://www.ebay.com/sch/i.html?LH_Complete=1&LH_Sold=1&_from=R40&_sacat=0&_nkw=gerald%20ford%20autograph&rt=nc&LH_Auction=1&_trksid=p2045573.m1684')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# grab all the result links and store their href destinations in a list
links = [e['href'] for e in soup.find_all(class_="vip")]

# grab all the bid entries and split their contents to keep only the number
bids = [e.span.contents[0].split(' ')[0] for e in soup.find_all("li", "lvformat")]

# grab all the prices and store them in a list
prices = [e.contents[0] for e in soup.find_all("span", "bold bidsold")]

# zip the lists together so the entries belonging to the same auction item
# end up in one row
rows = list(zip(links, prices, bids))

# write each row to the CSV output file
# (newline='' keeps the csv module from inserting blank lines on Windows)
with open('ebay.csv', 'w', newline='') as csvfile:
    w = csv.writer(csvfile)
    w.writerows(rows)

This gives you a CSV file that uses a comma (,) as the delimiter.
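
If you ever need something other than a comma, csv.writer also accepts a delimiter argument. Here is a minimal, self-contained sketch; the rows below are made-up placeholders in the same link/price/bids shape, and the filename and semicolon are arbitrary choices for illustration:

import csv

# placeholder rows in the same (link, price, bids) shape as above
rows = [
    ('http://www.ebay.com/itm/example-1', '$50.00', '7'),
    ('http://www.ebay.com/itm/example-2', '$12.50', '3'),
]

# newline='' keeps the csv module from inserting blank lines on Windows;
# delimiter switches the field separator to a semicolon
with open('ebay_semicolon.csv', 'w', newline='') as csvfile:
    w = csv.writer(csvfile, delimiter=';')
    w.writerows(rows)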

Comments:

First, you should think about how you extract the data you want, because bids, for example, contains more than just the number of bids.
@albert Do you mean the HTML text around the bids?
Yes, and all the other HTML elements around the data stored in links and prices. What do you need pandas for, and why assume the lxml parser is installed?
I just prefer working with numpy and pandas; in this example pandas lets me simply write a DataFrame to CSV.
You are absolutely right. I like those too, but they add overhead that is not needed in this case, and I also enjoyed the challenge, since I am trying to learn BeautifulSoup as well.
Thanks for the example, Albert.
If I solved your problem, please accept my answer to mark your question as solved.
Do you have any tips on how to parse the listing titles out of the existing data?
Do you mean as column headers?
Yes, as a separate column; I would like to include the listing title.