Python Scrapy Selenium pagination

python selenium pagination web-scraping

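I am trying to scrape a site. I have tried two approaches: the first was a CrawlSpider with rules. I wasn't very satisfied with the results, so now I am trying to use Selenium to go through each link. The only problem is pagination. I want the Selenium browser to open the web page, visit each link on the start URL, and then click the "next page" button at the bottom. So far, the code I've written only extracts the content I need: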

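    import re
    import time

    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    # Inside the spider callback, e.g. parse(self, response):
    self.driver.get(response.url)
    div_val = self.driver.find_elements(By.XPATH, '//div[@class="tab_contents"]')
    # Read the hrefs first: navigating away makes the elements in div_val stale.
    region_urls = [div.find_element(By.TAG_NAME, 'a').get_attribute('href') for div in div_val]
    for url in region_urls:
        if re.match(r'http://www\.tripadvisor\.com/Hotels-g\d*-Dominican_Republic-Hotels\.html', url):
            self.driver.get(url)
            time.sleep(5)  # crude wait for the page to load
            try:
                hotel_links = self.driver.find_elements(By.XPATH, '//div[@class="listing_title"]')
                for hotel_link in hotel_links:
                    lnk = hotel_link.find_element(By.CLASS_NAME, 'property_title').get_attribute('href')
            except NoSuchElementException:
                print('element not found')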

The only thing I'm stuck on now is pagination with Selenium.
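The question's pattern hardcodes `http://`, so if the site serves its hotel-list hrefs over https (not confirmed here), nothing would match and no region page would be visited. A small sketch of the same filter as a helper, with the scheme made optional and the unnecessary backslash escapes dropped; `REGION_URL`, `is_region_url`, and the sample hrefs (including the geo ids) are hypothetical:

```python
import re

# Same idea as the question's pattern, but "https?" accepts both http and https.
REGION_URL = re.compile(
    r"https?://www\.tripadvisor\.com/Hotels-g\d+-Dominican_Republic-Hotels\.html"
)


def is_region_url(href: str) -> bool:
    """Return True if href looks like the region's hotel-list page."""
    # re.match anchors only at the start, so trailing text such as a query string still matches.
    return REGION_URL.match(href) is not None


# Illustrative hrefs; the geo ids are made up.
hrefs = [
    "http://www.tripadvisor.com/Hotels-g12345-Dominican_Republic-Hotels.html",
    "https://www.tripadvisor.com/Hotels-g12345-Dominican_Republic-Hotels.html",
    "http://www.tripadvisor.com/Hotels-g12345-Jamaica-Hotels.html",
]
print([h for h in hrefs if is_region_url(h)])
```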
I think a combination of CrawlSpider and Selenium will work for you:

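    import time
    from urllib.parse import urljoin

    import scrapy
    from scrapy.selector import Selector
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    # Inside the spider's parse(self, response) method:
    self.driver.get(response.url)
    for _ in range(15):  # number of result pages to walk; adjust as needed
        # Parse the page Selenium is showing now (the Scrapy `response` never changes)
        page = Selector(text=self.driver.page_source)
        # Enter each individual listing URL on this page
        for href in page.xpath('//a[contains(@class, "property_title")]/@href').getall():
            url = urljoin(self.driver.current_url, href)
            yield scrapy.Request(url, meta={'item': {'url': url}}, callback=self.anchor_page)
        # Click the "next" button for pagination, stopping on the last page
        try:
            button = self.driver.find_element(By.XPATH, "/html/body/div[3]/div[7]/div[2]/div[7]/div[2]/div[1]/div[3]/div[2]/div/div/div[41]/div[2]/div/a")
        except NoSuchElementException:
            break
        button.click()
        time.sleep(10)

    # Separate spider method that handles each listing page:
    def anchor_page(self, response):
        old_item = response.request.meta['item']
        # ... extract the data you want to scrape into old_item ...
        yield old_item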
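Resolving listing hrefs with `urllib.parse.urljoin` works the same whether an href is relative, scheme-relative, or already absolute, so no separate `'http://'` check is required before joining. A quick illustration; `page_url` and the hrefs are made-up values:

```python
from urllib.parse import urljoin

# Hypothetical page URL and hrefs, for illustration only.
page_url = "http://www.tripadvisor.com/Hotels-g12345-Dominican_Republic-Hotels.html"
hrefs = [
    "/Hotel_Review-d1-Example.html",                            # root-relative path
    "https://www.tripadvisor.com/Hotel_Review-d2-Example.html",  # already absolute
    "//www.tripadvisor.com/Hotel_Review-d3-Example.html",        # scheme-relative
]

# Relative forms are resolved against page_url; absolute URLs come back unchanged.
listing_urls = [urljoin(page_url, h) for h in hrefs]
print(listing_urls)
```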

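You can click the "Next" button automatically and pause between requests; I think that will work well for you. If I understand correctly, you want to open each link in the listing and extract its data, then click "Next" and repeat until you have gone through all the pages?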
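The control flow described above (scrape every link on the current page, click "Next", repeat until there is no "Next") can be kept separate from Selenium and checked without a browser. A sketch under that assumption: `walk_pages`, `get_links`, and `go_next` are hypothetical names; in a spider, `get_links` might parse `self.driver.page_source`, and `go_next` might click the next button and return `False` when `find_element` raises `NoSuchElementException`.

```python
from typing import Callable, List


def walk_pages(
    get_links: Callable[[], List[str]],
    go_next: Callable[[], bool],
    max_pages: int = 15,
) -> List[str]:
    """Collect links from the current page, then advance; stop when go_next() returns False."""
    collected: List[str] = []
    for _ in range(max_pages):
        collected.extend(get_links())  # scrape the page that is showing now
        if not go_next():              # no "next" button: last page reached
            break
    return collected


# Fake three-page listing standing in for the browser (hypothetical data).
pages = [["a1", "a2"], ["b1"], ["c1", "c2"]]
state = {"page": 0}


def fake_get_links() -> List[str]:
    return pages[state["page"]]


def fake_go_next() -> bool:
    if state["page"] + 1 >= len(pages):
        return False
    state["page"] += 1
    return True


print(walk_pages(fake_get_links, fake_go_next))
```

Scraping before clicking means the first page is not skipped, and the `max_pages` cap guards against a "Next" button that never disappears.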