Python: web scraping eBay using Selenium

Tags: python, html, selenium


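Hi, I am very new to coding, and I am working on a project that uses Selenium to scrape data from eBay.

I have run into two problems.

The XPath of the listing name is

//*[@id="srp-river-results-listing1"]/div/div[2]/a/h3

How do I tell Python that I want every listing, i.e. listing >= 1?

The XPath of the listing price,

//*[@id="srp-river-results-listing1"]/div/div[2]/div[3]/div[1]/span

is copied directly from the page, but when I put it into Python it reports invalid syntax and points at div[2]. The same happens for the listing's condition. However, it works for the name, shipping cost, and country of origin. How can I fix this?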

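Thanks a lot. I am not allowed to use the eBay API.

1) To search by a partial name you can use contains(). It is useful for classes because XPath in Selenium treats all classes as one string, so the XPath @class="s-item__title" will not find an element whose class attribute holds more than that one class.

You can also use it to search by text displayed on the page:

'//h3[contains(text(), "Raspberry")]'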
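The difference between an exact class match and contains() can be shown without a browser. This is a minimal sketch in plain Python that only mimics the two string comparisons. The class value `s-item__title--has-tags` and the helper name `xpath_with_class` are assumptions made for illustration; they do not come from the page.

```python
# Hypothetical class attribute value; XPath sees it as one plain string.
class_attr = "s-item__title s-item__title--has-tags"

# @class="s-item__title" behaves like an exact string comparison.
exact_match = (class_attr == "s-item__title")

# contains(@class, "s-item__title") behaves like a substring test.
substring_match = ("s-item__title" in class_attr)


def xpath_with_class(tag, class_name):
    """Build an XPath for `tag` elements whose class attribute contains `class_name`."""
    return f'//{tag}[contains(@class, "{class_name}")]'


print(exact_match)      # False
print(substring_match)  # True
print(xpath_with_class("h3", "s-item__title"))
```

Because contains() is a substring test, it can also match longer class names that merely start with the same text, so it is worth checking the results on the real page.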

2) I don't know what the problem is, because you didn't show the full error message.
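One frequent cause of "invalid syntax" with a copied XPath is quoting. It is offered here only as a guess, because the traceback is missing. The copied XPath contains double quotes, so wrapping it in double quotes ends the Python string early. The sketch below compiles both variants as source text to show the difference. The helper name `is_valid_python` is hypothetical.

```python
# The XPath copied from the browser has double quotes around the id value.
# Wrapped in double quotes, the Python string ends right after @id=
bad_source = 'xpath = "//*[@id="srp-river-results-listing1"]/div/div[2]/div[3]/div[1]/span"'

# Wrapped in single quotes, the inner double quotes are ordinary characters.
good_source = "xpath = '//*[@id=\"srp-river-results-listing1\"]/div/div[2]/div[3]/div[1]/span'"


def is_valid_python(source):
    """Return True if `source` compiles as Python code, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(is_valid_python(bad_source))   # False
print(is_valid_python(good_source))  # True
```

If the real error has a different cause, the full traceback would show it.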

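By the way: instead of a long XPath, you can find the same items with shorter ones,

'//h3[contains(@class, "s-item__title")]'

and

'//span[@class="s-item__price"]'

Actually, you can try using // to skip some elements in the path

'//li[contains(@id, "srp-river-results-listing")]//span'

and then it may not need div[2].

Sometimes it is better to find the main tag first and then use a relative XPath starting with . to search only inside that tag.

Grouping the information about one item can be useful. Sometimes an item may not have some value, and then grouping the information with zip(all_titles, all_prices) may give wrong results.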
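The zip() problem is easy to reproduce with plain lists. The sketch below uses made-up titles and prices (assumptions, not scraped data) in which the second item has no price.

```python
# Hypothetical page with three items; the second one has no price element.
all_titles = ["Item A", "Item B", "Item C"]
all_prices = ["$10.00", "$30.00"]  # prices of Item A and Item C only

# Pairing two independently collected lists by position.
pairs = list(zip(all_titles, all_prices))
print(pairs)
# Item B is paired with Item C's price, and Item C is dropped silently.

# Collecting values per item keeps each value attached to its own item.
items = [
    {"title": "Item A", "price": "$10.00"},
    {"title": "Item B", "price": ""},  # default used for the missing price
    {"title": "Item C", "price": "$30.00"},
]
grouped = [[item["title"], item["price"]] for item in items]
print(grouped)
```

zip() stops at the shorter list and raises no error, so the misalignment is easy to miss.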

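from selenium.webdriver.common.by import By

data = []

# each search result is one <li>; this is an exact match on the class attribute, trailing spaces included
all_items = driver.find_elements(By.XPATH, '//li[@class="s-item   "]')

for item in all_items:
    # relative XPaths (leading dot): search only inside this item
    title = item.find_element(By.XPATH, './/h3[contains(@class, "s-item__title")]').text.strip()
    price = item.find_element(By.XPATH, './/span[@class="s-item__price"]').text.strip()

    data.append([title, price])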

Example code

import selenium.webdriver
from selenium.webdriver.common.by import By

driver = selenium.webdriver.Firefox()
driver.get('https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2499334.m570.l1311.R1.TR12.TRC2.A0.H0.Xras.TRS0&_nkw=raspberry+pi+4&_sacat=0')

# --- separated elements ---

all_titles = []
all_items = driver.find_elements(By.XPATH, '//h3[@class="s-item__title"]')
for item in all_items:
    title = item.text.strip()
    print('title:', title)
    all_titles.append(title)

all_prices = []
all_items = driver.find_elements(By.XPATH, '//span[@class="s-item__price"]')
for item in all_items:
    price = item.text.strip()
    print('price:', price)
    all_prices.append(price)

# pairs values by position only; wrong if some item has no title or no price
data = list(zip(all_titles, all_prices))

all_others = []
all_items = driver.find_elements(By.XPATH, '//non_existing_xpath')
for item in all_items:
    other = item.text.strip()
    print('other:', other)
    all_others.append(other)

# all_others will be empty

# --- grouped ---

data = []
all_items = driver.find_elements(By.XPATH, '//li[@class="s-item   "]')

for item in all_items:
    title = item.find_element(By.XPATH, './/h3[contains(@class, "s-item__title")]').text.strip()
    price = item.find_element(By.XPATH, './/span[@class="s-item__price"]').text.strip()
    try:
        other = item.find_element(By.XPATH, './/non_existing_xpath').text.strip()
    except Exception as ex:
        print('Exception:', ex)
        other = ""  # default value when the element doesn't exist

    print('title:', title)
    print('price:', price)
    print('other:', other)
    print('---')
    data.append([title, price, other])

# --- other examples ---

all_items = driver.find_elements(By.XPATH, '//li[contains(@id, "srp-river-results-listing")]//span[@class="s-item__price"]')
for item in all_items:
    print(item.text)

all_items = driver.find_elements(By.XPATH, '//li[contains(@id, "listing")]//span[@class="s-item__price"]')
for item in all_items:
    print(item.text)

all_items = driver.find_elements(By.XPATH, '//*[contains(@id, "srp-river-results-listing")]/div/div[2]/a/h3')
for item in all_items:
    print(item.text)

driver.quit()
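The try/except pattern from the grouped part can be wrapped in a small helper so that every optional field gets a default. This is only a sketch. The names `text_or_default` and `FakeElement` are hypothetical, and `FakeElement` is a stand-in for a Selenium element so the example runs without a browser. The locator strategy is passed as the plain string "xpath", which is the value of `By.XPATH`.

```python
def text_or_default(parent, xpath, default=""):
    """Return the stripped text of the first match under `parent`, or `default` if the lookup fails."""
    try:
        return parent.find_element("xpath", xpath).text.strip()
    except Exception:
        # Selenium raises NoSuchElementException here; the broad Exception
        # is caught only so this sketch needs no Selenium import.
        return default


class FakeElement:
    """Stand-in for a Selenium WebElement (assumption for this demo only)."""

    def __init__(self, text="", children=None):
        self.text = text
        self._children = children or {}

    def find_element(self, by, value):
        if value not in self._children:
            raise LookupError(f"no element for {value!r}")
        return self._children[value]


item = FakeElement(children={".//h3": FakeElement("  Raspberry Pi 4  ")})

print(text_or_default(item, ".//h3"))                  # Raspberry Pi 4
print(text_or_default(item, ".//span", default="n/a"))  # n/a
```

With a real driver, `parent` would be one of the `li` elements returned by the grouped search.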


1. Get a list with all values and then use a Python loop to take only some elements from the list. What about //*[contains(@id, "srp-river-results-listing")]? 2. Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not a screenshot). It contains other useful information. You may need to put the elements in () before you use [], i.e. (//*[@id="srp-river-results-listing1"]/div/div)[2]/a/h3

I don't know what URL you scrape, but on ebay.com I can get the title with a simple //h3[@class="s-item__title"] and the price with //span[@class="s-item__price"]. I can use //li[@class="s-item"] instead of @id="srp-river-results-listing1". For the different numbers in @id="srp-river-results-listing1" I can use //*[contains(@id, "srp-river-results-listing")]
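The "get everything, then filter in a Python loop" idea can be sketched on the id values alone. The ids below are made up, and the helper name `listing_number` is hypothetical. With Selenium the ids could be read from each element with `get_attribute("id")`.

```python
import re


def listing_number(element_id, prefix="srp-river-results-listing"):
    """Return the number at the end of an id such as 'srp-river-results-listing7', or None."""
    match = re.fullmatch(re.escape(prefix) + r"(\d+)", element_id)
    return int(match.group(1)) if match else None


# Hypothetical id values collected from the page.
ids = [
    "srp-river-results-listing1",
    "srp-river-results-listing2",
    "srp-river-results",
    "srp-river-results-listing10",
]

# Keep only listings numbered 2 or higher.
wanted = []
for element_id in ids:
    number = listing_number(element_id)
    if number is not None and number >= 2:
        wanted.append(element_id)

print(wanted)
```

The same loop works on the elements themselves if the element is appended instead of its id.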

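Thank you so much, this really helped a lot, especially the part about ignoring sponsored items. But I can only do this for item name, shipping cost, listing format, etc., and it results in an unequal number of entries in the data frame. Is there a way to fix that?

Some products may not have some tags, and you can get an unequal number of items if you get every value separately, i.e. only pictures, only titles. That is why I showed the second part of the code, "grouped". First it gets s-item, and then it searches for price and title inside that s-item, so I can group them. If some element doesn't exist in a product, it will not skip it; it will raise an error.

I changed the example code. In the first part it only searches for all '//non_existing_xpath' and creates an empty list. In the second part it searches for a single './/non_existing_xpath' in every product, so it raises an error, and with try/except I can catch it and put in some default value (i.e. an empty string).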
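Once values are grouped per item, giving every row the same set of columns keeps the table rectangular. The sketch below uses only the standard library and made-up records. The column names in `FIELDS` and the helper `normalize` are assumptions for illustration. The same list of dicts could be handed to a data frame library instead of csv.

```python
import csv
import io

FIELDS = ["title", "price", "shipping"]  # hypothetical column names


def normalize(record, fields=FIELDS, default=""):
    """Return a dict with exactly `fields` as keys, filling missing ones with `default`."""
    return {name: record.get(name, default) for name in fields}


# Hypothetical per-item results; the second item has no shipping value.
scraped = [
    {"title": "Raspberry Pi 4", "price": "$55.00", "shipping": "Free"},
    {"title": "Raspberry Pi 4 kit", "price": "$89.99"},
]

rows = [normalize(record) for record in scraped]

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Every row has the same keys, so each column ends up with the same number of entries.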

