Python Scrapy > IndexError: list index out of range

python, xpath, scrapy, tripadvisor

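I'm trying to scrape some data from TripAdvisor. I'm interested in the "Price range / Cuisines / Meals" details of each restaurant.

So I use the following xpath to extract each of these 3 rows, which all share the same class:

response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"]/text()').extract()[1]

I tested it directly in the scrapy shell, and it works fine:

scrapy shell https://www.tripadvisor.com/Restaurant_Review-g187514-d15364769-Reviews-La_Gaditana_Castellana-Madrid.html

But when I integrate it into my script, I get the following error: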

Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/root/Scrapy_TripAdvisor_Restaurant-master/tripadvisor_las_vegas/tripadvisor_las_vegas/spiders/res_las_vegas.py", line 64, in parse_listing
    (response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()')[1])
  File "/usr/lib/python3.6/site-packages/parsel/selector.py", line 61, in __getitem__
    o = super(SelectorList, self).__getitem__(pos)
IndexError: list index out of range
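
TripAdvisor restaurant pages come in two different types, with two different formats. The first uses the "overviewcard" classes, the second uses the "card" classes.

So I want to check whether the first one (overviewcard) exists; if not, try the second one (card); and if neither exists, set the value to None.

:D But it looks like Python executes both at the same time... Since the second one does not exist on the page, the script stops.

Could it be an indentation error?

Thanks for your help.

The second selector (row_cuisine_card) fails because that element does not exist on the page. Then, when you try to access [1] in the result, an error is thrown because the result list is empty.

Assuming you really do need item 1, try the following: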

# Get all the values as lists of strings; either list may be empty.
row_cuisine_overviewcard = \
    response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"]/text()').getall()
row_cuisine_card = \
    response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()').getall()

# Check that a result has more than 1 item before reading index 1.
if len(row_cuisine_overviewcard) > 1 and row_cuisine_overviewcard[1] == "CUISINES":
    cuisine = \
        response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__tagText--1XLfi"]/text()')[1]
elif len(row_cuisine_card) > 1 and row_cuisine_card[1] == "CUISINES":
    cuisine = \
        response.xpath('//div[@class="restaurants-details-card-TagCategories__tagText--2170b"]/text()')[1]
else:
    cuisine = None

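Whenever you take a specific index from a selector result, apply the same kind of safety check. In other words, make sure the value is there before you access it.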
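
If that check is needed in many places, it could be wrapped in a small helper. The following is only a sketch: `item_at` is a hypothetical name, not part of Scrapy or parsel, and the plain lists stand in for what `response.xpath(...).getall()` would return.

```python
def item_at(items, index, default=None):
    """Return items[index], or default when the index is out of range."""
    if -len(items) <= index < len(items):
        return items[index]
    return default


# Hypothetical sample data standing in for .getall() results.
titles_present = ["PRICE RANGE", "CUISINES", "MEALS"]
titles_missing = []  # what you get when the class is not on the page

print(item_at(titles_present, 1))  # CUISINES
print(item_at(titles_missing, 1))  # None
```

With a helper like this, a check such as `item_at(titles, 1) == "CUISINES"` is simply false on a page that lacks the element, instead of raising IndexError.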

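
Your problem already lies in the check on this line:

row_cuisine_card = \
    (response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()')[1])

You are trying to extract a value from the website that may not exist. In other words, if

response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()')

returns no elements, or only one element, you cannot access the second element of the returned list (which is what the appended [1] tries to do).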
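
Since the failure comes from relying on a fixed position, one possible alternative is to look the row up by its title rather than by index. This is only a sketch under a strong assumption that is not confirmed by the question: that the title list and the tag-text list on the page line up one to one. `value_for_title` is a hypothetical helper, and the lists are invented sample data standing in for two `.getall()` results.

```python
def value_for_title(titles, values, wanted):
    """Return the value paired with the title `wanted`, or None if absent."""
    for title, value in zip(titles, values):
        if title.strip().upper() == wanted.upper():
            return value
    return None


# Invented sample data; assumes titles and values are parallel lists.
titles = ["PRICE RANGE", "CUISINES", "MEALS"]
values = ["$$", "Spanish", "Dinner"]

print(value_for_title(titles, values, "CUISINES"))  # Spanish
print(value_for_title([], [], "CUISINES"))          # None
```

A lookup like this returns None on a page where the row is missing or sits at a different position, so no index check is needed at the call site.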

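I suggest first storing the values extracted from the website in local variables, and then checking whether the values you want were actually found. My guess is that the page being opened does not contain the information you are looking for.

This could look roughly like the following code: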

# extract restaurant cuisine
cuisine = None
# .getall() returns a (possibly empty) list of strings for each layout
cuisine_overviewcard_sections = response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"]/text()').getall()
cuisine_card_sections = response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()').getall()
# only read index 1 after confirming the list has at least two entries
if len(cuisine_overviewcard_sections) >= 2 and cuisine_overviewcard_sections[1] == "CUISINES":
    cuisine = \
        response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__tagText--1XLfi"]/text()')[1]
elif len(cuisine_card_sections) >= 2 and cuisine_card_sections[1] == "CUISINES":
    cuisine = \
        response.xpath('//div[@class="restaurants-details-card-TagCategories__tagText--2170b"]/text()')[1]

Since you only need one of the two results, and the second lookup is unnecessary when the first XPath check already gives the right answer, the code can be tidied up a little:

# extract restaurant cuisine
cuisine = None
cuisine_overviewcard_sections = response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"]/text()').getall()
if len(cuisine_overviewcard_sections) >= 2 and cuisine_overviewcard_sections[1] == "CUISINES":
    cuisine = \
        response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__tagText--1XLfi"]/text()')[1]
else:
    # fall back to the second page layout only when the first did not match
    cuisine_card_sections = response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()').getall()
    if len(cuisine_card_sections) >= 2 and cuisine_card_sections[1] == "CUISINES":
        cuisine = \
            response.xpath('//div[@class="restaurants-details-card-TagCategories__tagText--2170b"]/text()')[1]

This way, you only run the XPath searches (which can be quite expensive) when they are actually needed.
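
The same "only search when needed" idea can be written as a generic fallback chain. The sketch below is an illustration, not Scrapy API: `first_match`, `overview_lookup` and `card_lookup` are hypothetical names, and the two lookups return canned values where a real spider would run its XPath queries.

```python
def first_match(lookups, default=None):
    """Call each zero-argument lookup in order; return the first non-None result."""
    for lookup in lookups:
        result = lookup()
        if result is not None:
            return result
    return default


calls = []  # records which lookups actually ran


def overview_lookup():
    # Stand-in for the overview-card XPath check; pretend it matched.
    calls.append("overview")
    return "Spanish"


def card_lookup():
    # Stand-in for the card XPath check; never reached in this run.
    calls.append("card")
    return None


cuisine = first_match([overview_lookup, card_lookup])
print(cuisine)  # Spanish
print(calls)    # ['overview']
```

Because each lookup is a function that is called only when the previous one returned None, the second search is skipped as soon as the first page layout matches.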

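Are you sure you don't need item number 0 instead of 1?

Thanks!! In the end I found another solution, but yours is better xD

I'm glad to hear my solution could help you :-) Thanks for the upvote as well.

Thanks for your answer ;)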