Error in a Python web scraper: the script does not run correctly

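After I run this program, it gives me the following error:

Traceback (most recent call last):
  File "my_first_websraper.py", line 18, in <module>
    brand = container[0].img["title"].title()
  File "C:\Users\MyUserName\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\element.py", line 1368, in __getitem__
    return self.attrs[key]
KeyError: 0

When he runs it in the tutorial, it not only lists everything correctly, it also lists every item on the site in the same way. Any ideas on how to fix this?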


For an idea of what it should look like, see this video at 28:55:

If you scroll down to the top comment on the YouTube video, the author explains this problem.

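It looks like `container.div` does not give you the div with the class `item-info`, but the div with the class `item-badges`. This is because the latter occurs before the former. When you access a tag with the dot (`.`) operator, it returns only the first instance of that tag, which is what is happening here.

To fix this, use the `find()` method to locate the exact div that contains the information you need.

Example:
divWithInfo = containers[0].find("div", "item-info")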
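To make the difference concrete, here is a minimal sketch on a made-up fragment. The HTML is a simplified assumption, not Newegg's actual markup; only the class names are taken from this thread.

```python
from bs4 import BeautifulSoup

# Made-up, simplified markup (assumption): the badges div comes before the info div.
html = """
<div class="item-container">
  <div class="item-badges"></div>
  <div class="item-info">
    <a class="item-title">Example video card</a>
  </div>
</div>
"""

container = BeautifulSoup(html, "html.parser").find("div", "item-container")

# Dot access returns only the first <div> found inside the container.
first_div = container.div

# find() with a class name returns the first <div> that has that class.
info_div = container.find("div", "item-info")

print("container.div class:", first_div["class"])
print("find() class:", info_div["class"])
```

Passing a plain string as the second argument to `find()` makes Beautiful Soup match on the CSS class, which is why `"item-info"` works here without a dictionary.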

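I know this does not use the same package as you, or even code close to yours, but I was able to get every item and its price using selenium! I have had problems with other libraries because they only fetch the HTML content and (usually) cannot drive a headless browser. That causes trouble with rendered pages, because they fetch the page before all the products have been rendered.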

I got the prices on the page with the following script:

Edit: added sorting

Edit: added Excel output and number formatting

Script from the question (`my_first_websraper.py`), which raises the error above:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20cards'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

#grabs each product
containers =  page_soup.findAll("div", {"class":"item-container"})

for container in containers:
    brand = container[0].img["title"].title()

    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].txt


    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()


    print("Brand: "+ brand)
    print("product name: "+ product_name)
    print("shipping: "+ shipping)
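The `KeyError: 0` in the traceback is consistent with `container` being a single tag inside the loop: indexing a tag looks up an HTML attribute, as the `return self.attrs[key]` line shows. Below is a minimal sketch of the loop without the index, which also uses `.text` rather than `.txt`. The markup is made up and simplified (an assumption, not the real page), so the exact path to each element may differ on the live site.

```python
from bs4 import BeautifulSoup

# Made-up, simplified markup (assumption); the real page is more deeply nested.
SAMPLE_HTML = """
<div class="item-container">
  <div class="item-badges"></div>
  <div class="item-info">
    <a class="item-brand"><img title="EVGA"></a>
    <a class="item-title">Example card A</a>
    <ul><li class="price-ship">Free Shipping</li></ul>
  </div>
</div>
<div class="item-container">
  <div class="item-info">
    <a class="item-brand"><img title="MSI"></a>
    <a class="item-title">Example card B</a>
    <ul><li class="price-ship">$5.99 Shipping</li></ul>
  </div>
</div>
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
containers = page_soup.find_all("div", {"class": "item-container"})

# Indexing a single tag looks up an attribute named 0, so this raises KeyError.
try:
    containers[0][0]
    key_error_raised = False
except KeyError:
    key_error_raised = True

rows = []
for container in containers:
    # container is already one product tag, so there is no [0] here
    info = container.find("div", {"class": "item-info"})
    brand = info.img["title"].title()
    product_name = info.find("a", {"class": "item-title"}).text.strip()
    shipping = info.find("li", {"class": "price-ship"}).text.strip()
    rows.append((brand, product_name, shipping))

for brand, product_name, shipping in rows:
    print("Brand: " + brand)
    print("product name: " + product_name)
    print("shipping: " + shipping)
```

Because the loop handles one container at a time, every product on the page is printed, not just the first one.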
Selenium script from the answer above:

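import json
import time

from openpyxl import Workbook
from selenium import webdriver
from selenium.webdriver.common.by import By

# Assumption: Chrome is installed; any other Selenium WebDriver can be used instead.
driver = webdriver.Chrome()

url = "https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20cards"

driver.get(url)

# let the page load
time.sleep(5)


def get_price(element):
    """Leading amount of the element's text as a float; 'Free', empty or unparsable text becomes 0."""
    amount = element.text.split(' ')[0].replace('$', '').replace(',', '')
    try:
        return float(amount)
    except ValueError:
        return 0.0


# get all the prices of the products on the page
# (find_element(By.CLASS_NAME, ...) replaces find_element_by_class_name, which was removed in Selenium 4.3)
prices = [{'product': item.find_element(By.CLASS_NAME, 'item-title').text,
           'price': get_price(item.find_element(By.CLASS_NAME, 'price-current')),
           'shipping': get_price(item.find_element(By.CLASS_NAME, 'price-ship'))}
          for item in driver.find_elements(By.CLASS_NAME, 'item-info')]

driver.quit()

# sort by price, cheapest first (prices are floats, so the order is numeric)
prices_sorted = sorted(prices, key=lambda x: x['price'])

# prettify the output with json
print(json.dumps(prices_sorted, indent=4))


# -------------- export to excel --------------
# create the workbook
wb = Workbook()

# select the first sheet
ws = wb.active
# write the header row
ws.append(list(prices_sorted[0].keys()))
for row in prices_sorted:
    # write each row
    ws.append(list(row.values()))

path = './prices.xlsx'
# save the file
wb.save(filename=path)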
Excel output:


Here is a link to a Colab notebook where you can run it yourself:
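Sorting on the `price` field only gives a cheapest-first order when the prices are numbers rather than strings. The following sketch shows the difference on made-up records; `parse_price` is a hypothetical helper, and the text formats it handles (`$1,299.99`, `Free Shipping`) are assumptions about what the page returns.

```python
def parse_price(text):
    """Turn text such as '$1,299.99' or 'Free Shipping' into a float."""
    token = text.strip().split(" ")[0].replace("$", "").replace(",", "")
    return 0.0 if token in ("", "Free") else float(token)


# Made-up records for illustration only.
records = [
    {"product": "card A", "price": "$1,299.99", "shipping": "Free Shipping"},
    {"product": "card B", "price": "$119.99", "shipping": "$5.99 Shipping"},
    {"product": "card C", "price": "$249.99", "shipping": "Free Shipping"},
]

# String comparison is character by character, so the most expensive card sorts first.
by_string = sorted(records, key=lambda r: r["price"])

# Numeric comparison gives the cheapest-first order.
by_number = sorted(records, key=lambda r: parse_price(r["price"]))

# "Best deal" could also mean price plus shipping.
by_total = sorted(records, key=lambda r: parse_price(r["price"]) + parse_price(r["shipping"]))

print([r["product"] for r in by_string])
print([r["product"] for r in by_number])
print([r["product"] for r in by_total])
```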

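How do I adapt it to get the other information? Also, how do I make it print every item on the page, not just the first one?

As for the implementation, replacing `brand = container[0].img["title"].title()` with an edited version of that example should work. For each item, just follow the same syntax as in the answer, substituting the variables, attributes, and so on. So it should look like this:
brand = container[0].img["title"].title()
title_container = container.find("a", {"class": "item-title"})
product_name = title_container[0].txt
shipping_container = container.find("li", {"class": "price-ship"})
shipping = shipping_container[0].text.strip()

Let's take a look.

Is there a way to organize the JSON so that it is ordered by the best deal?

Sure! See the added line:
prices_sorted = sorted(prices, key=lambda x: x['price'])
If this is what you wanted, please mark it as the solution.

Two last things: is there a way to produce a spreadsheet instead of JSON? And is there a way to include the customer reviews from the site?

I added an export to Excel, but CSV would work as well.

Output of the Selenium script: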
[
    {
        "product": "GIGABYTE Radeon RX 570 DirectX 12 GV-RX570GAMING-4GD REV2.0 Video Card",
        "price": "$119.99",
        "shipping": "Free"
    },
    {
        "product": "ASRock Phantom Gaming D Radeon RX 570 DirectX 12 RX570 4G Video Card",
        "price": "$119.99",
        "shipping": "Free"
    },
    {
        "product": "MSI Radeon RX 570 DirectX 12 RX 570 8GT OC Video Card",
        "price": "$135.99",
        "shipping": "Free"
    },
    {
        "product": "XFX Radeon RX 580 DirectX 12 RX-580P8RFD6 Video Card",
        "price": "$189.99",
        "shipping": "$5.99"
    },
    {
        "product": "MSI GeForce GTX 1660 SUPER DirectX 12 GTX 1660 SUPER VENTUS XS OC Video Card",
        "price": "$249.99",
        "shipping": "Free"
    },
    {
        "product": "SAPPHIRE PULSE Radeon RX 5600 XT DirectX 12 100419P6GL Video Card",
        "price": "$289.99",
        "shipping": "$3.99"
    },
    {
        "product": "EVGA GeForce GTX 1660 Ti SC ULTRA GAMING, 06G-P4-1667-KR, 6GB GDDR6, Dual Fan, Metal Backplate",
        "price": "$299.99",
        "shipping": "Free"
    },
    {
        "product": "EVGA GeForce RTX 2060 KO ULTRA GAMING Video Card, 06G-P4-2068-KR, 6GB GDDR6, Dual Fans, Metal Backplate",
        "price": "$319.99",
        "shipping": "Free"
    },
    {
        "product": "MSI GeForce RTX 2060 DirectX 12 RTX 2060 VENTUS XS 6G OC Video Card",
        "price": "$339.99",
        "shipping": "Free"
    },
    {
        "product": "ASUS GeForce RTX 2060 Overclocked 6G GDDR6 Dual-Fan EVO Edition Graphics Card (DUAL-RTX2060-O6G-EVO)",
        "price": "$349.99",
        "shipping": "Free"
    },
    {
        "product": "ASUS ROG Strix Radeon RX 5700 XT ROG-STRIX-RX5700XT-O8G-GAMING Video Card",
        "price": "$459.99",
        "shipping": "Free"
    },
    {
        "product": "GIGABYTE GeForce RTX 2070 Super WINDFORCE OC 3X 8G Graphics Card, GV-N207SWF3OC-8GD",
        "price": "$499.99",
        "shipping": "Free"
    }
]
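The comments mention CSV as an alternative to the Excel export. A minimal sketch with the standard library's `csv` module follows, using two of the records from the output above. The `to_csv` helper is hypothetical, and it writes to an in-memory buffer so the result can be inspected; for a file on disk, an object from `open("prices.csv", "w", newline="")` could be passed to `csv.DictWriter` instead.

```python
import csv
import io

# Two records copied from the output above.
records = [
    {"product": "MSI Radeon RX 570 DirectX 12 RX 570 8GT OC Video Card",
     "price": "$135.99", "shipping": "Free"},
    {"product": "XFX Radeon RX 580 DirectX 12 RX-580P8RFD6 Video Card",
     "price": "$189.99", "shipping": "$5.99"},
]


def to_csv(rows):
    """Serialize a list of same-keyed dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```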