Python web scraping Newegg with Beautiful Soup

Tags: python, web-scraping, beautifulsoup

I'm new to Python and thought I'd try to learn by writing a web scraper. I'm trying to scrape graphics cards from the Newegg site, but I seem to have run into an error. All I want to do is grab the data and export it to a CSV file I can look through. However, I keep hitting the error below and I can't figure it out. Any help is appreciated.

File "webScrape.py", line 32, in &lt;module&gt;
    price = price_container[0].text.strip("|")
IndexError: list index out of range
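That `IndexError` means `findAll` returned an empty list for one of the containers (e.g. an ad tile with no price element), so indexing `[0]` fails. A minimal sketch of a defensive pattern, using made-up HTML to simulate such a container:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one normal product tile and one tile with no price,
# which is the kind of container that triggers the IndexError.
html = """
<div class="item-container"><li class="price-current">$199.99</li></div>
<div class="item-container"></div>
"""
page_soup = BeautifulSoup(html, "html.parser")

prices = []
for container in page_soup.find_all("div", {"class": "item-container"}):
    price_container = container.find_all("li", {"class": "price-current"})
    if not price_container:  # empty list -> indexing [0] would raise
        continue
    prices.append(price_container[0].text.strip("|"))

print(prices)  # → ['$199.99']
```

Skipping (or logging) containers without a price keeps the loop running instead of crashing mid-page.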

# import beautiful soup 4 and use urllib to import urlopen
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# url where we will grab the product data
my_url = 'https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Order=BESTMATCH&Description=graphics+card&ignorear=0&N=-1&isNodeId=1'

# open connection and grab the URL page information, read it, then close it
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# parse html from the page
page_soup = soup(page_html, "html.parser")

# find each product within the item-container class
containers = page_soup.findAll("div",{"class":"item-container"})

# write a file named products.csv with the data returned
filename = "products.csv"
f = open(filename, "w")

# create headers for products
headers = "price, product_name, shipping\n"

f.write(headers)

# define containers based on location on webpage and their DOM elements
for container in containers:
    price_container = container.findAll("li", {"class":"price-current"})
    price = price_container[0].text.strip("|")

    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text

    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()

    f.write(price + "," + product_name.replace(",", "|") + "," + shipping + "\n")

f.close()
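As a side note, the manual `.replace(",", "|")` workaround isn't needed if you use the standard-library `csv` module, which quotes fields containing commas automatically. A small sketch with a made-up row:

```python
import csv
import io

# Hypothetical row: the product name contains a comma on purpose.
rows = [("$199.99", "GPU, 8GB model", "Free Shipping")]

buf = io.StringIO()  # stand-in for open("products.csv", "w", newline="")
writer = csv.writer(buf)
writer.writerow(["price", "product_name", "shipping"])
writer.writerows(rows)

print(buf.getvalue())  # the comma-containing name is quoted, not mangled
```

With a real file, pass `newline=""` to `open()` as the `csv` docs recommend.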

You can write to a dataframe, which is easy to export to CSV. I added an extra class selector of `.list-wrap` to the titles to ensure all the lists are the same length.

from bs4 import BeautifulSoup
import requests
import re
import pandas as pd

def main():

    url = 'https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Order=BESTMATCH&Description=+graphics+cards&N=-1&isNodeId=1'
    res = requests.get(url)
    soup = BeautifulSoup(res.content, "lxml")
    prices = soup.select('.price-current')
    titles = soup.select('.list-wrap .item-title')
    shipping = soup.select('.price-ship')   
    items = list(zip(titles,prices, shipping))   
    results = [[title.text.strip(), re.search(r'\$\d+\.\d+', price.text.strip()).group(0), ship.text.strip()] for title, price, ship in items]

    df = pd.DataFrame(results,columns=['title', 'current price', 'shipping cost'])
    df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8',index = False )

if __name__ == "__main__":
    main()

Thank you so much, works great. Is there any way to loop over multiple pages, by any chance?

Sorry, only just saw this. To loop over pages, rewrite the above slightly into a function that you call per page. I'll take a look later today.
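A rough sketch of what that refactor could look like. Note the `page` query parameter is an assumption about Newegg's URL scheme, not something verified here, and the fetch/parse step is left as a placeholder:

```python
def build_page_url(base_url, page_number):
    # Assumed pagination parameter; check the site's real URL scheme.
    return f"{base_url}&page={page_number}"

def scrape_pages(base_url, last_page):
    all_results = []
    for page_number in range(1, last_page + 1):
        url = build_page_url(base_url, page_number)
        # ... fetch `url` with requests and parse it as in main() above,
        # then extend all_results with the parsed rows ...
        all_results.append(url)  # placeholder so the sketch runs standalone
    return all_results

urls = scrape_pages("https://www.newegg.com/Product/ProductList.aspx?N=-1", 3)
print(len(urls))  # → 3
```

Each iteration would call the same parsing logic as `main()` and accumulate rows before building one `DataFrame` at the end, rather than writing a CSV per page.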