Python: How to pass a Selenium HTML page to HtmlXPathSelector

python selenium scrapy

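I need to scrape a page that uses JavaScript, which is why I am using Selenium. The problem is that Selenium cannot get at the data I need.

I would like to try HtmlXPathSelector to extract the data.

How can I pass the HTML generated by Selenium to HtmlXPathSelector?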

Try creating the Response manually:

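from scrapy.http import TextResponse
from scrapy.selector import HtmlXPathSelector

# HTML to parse; with Selenium this would be the rendered page source
body = '''<html></html>'''

# wrap the HTML string in a response object that the selector can consume
response = TextResponse(url='', body=body, encoding='utf-8')

hxs = HtmlXPathSelector(response)
hxs.select("/html")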

Here is my solution: just create the HtmlXPathSelector from the Selenium page_source:

hxs = HtmlXPathSelector(text=sel.page_source)

Building the response manually with Selenium:

from scrapy.spider import BaseSpider
from scrapy.http import TextResponse
from scrapy.selector import HtmlXPathSelector
import time
from selenium import selenium

class DemoSpider(BaseSpider):
    name="Demo"
    allowed_domains = ['www.example.com']  # domain names only, no URL scheme
    start_urls = ["http://www.example.com/demo"]

    def __init__(self):
        BaseSpider.__init__(self)
        self.selenium = selenium("127.0.0.1", 4444, "*chrome", self.start_urls[0])
        self.selenium.start()

    def __del__(self):
        self.selenium.stop()

    def parse(self, response):
        sel = self.selenium
        sel.open(response.url)
        time.sleep(2.0) # wait for javascript execution

        #build the response object from Selenium
        body = sel.get_html_source()
        sel_response = TextResponse(url=response.url, body=body, encoding = 'utf-8')
        hxs = HtmlXPathSelector(sel_response)
        hxs.select("//table").extract()
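
The fixed time.sleep(2.0) in parse is either longer than needed or too short on a slow page. A minimal sketch of a polling alternative follows. Note that wait_for_source and its parameters are hypothetical names, not part of Selenium or Scrapy, and get_source stands for any zero-argument callable that returns the current HTML (for example sel.get_html_source).

```python
import time


def wait_for_source(get_source, is_ready, timeout=10.0, interval=0.25):
    """Poll get_source() until is_ready(html) is true, then return the HTML.

    get_source: zero-argument callable returning the current page HTML
                (assumption: e.g. sel.get_html_source in the spider above).
    is_ready:   callable taking the HTML string and returning True once the
                JavaScript-rendered content is present.
    Raises TimeoutError if the page is not ready within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_source()
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("page not ready after %.1f s" % timeout)
        time.sleep(interval)
```

In parse, something like body = wait_for_source(sel.get_html_source, lambda html: "<table" in html) could then stand in for the sleep and the get_html_source() call, assuming the table is the content rendered by JavaScript.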

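How does Selenium come into play here? I did selenium.get(url). How do I continue?

I have not used Selenium, but I think you can get the page body from it and create a Response, on which you can then use HtmlXPathSelector.

How can I use sel before the line body = sel.get_html_source()? I need to run an XPath query and then, based on the elements it returns, click() them one by one and download get_html_source() after each click. Any idea how to do that? sel does not seem to have a method for running XPath queries on the content.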
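
One possible approach to the last question, as a sketch only. It assumes a WebDriver-style object (one exposing find_elements(by, value) and the page_source property, as in sel.page_source above) rather than the Selenium RC client used in the spider. The function name sources_after_clicks and the settle parameter are made up for illustration.

```python
import time


def sources_after_clicks(driver, xpath, settle=1.0):
    """Click each element matching `xpath` in turn; return the HTML after each click.

    driver: assumed to be a Selenium WebDriver-style object exposing
            find_elements(by, value) and the page_source property.
    settle: seconds to wait after each click for JavaScript to run
            (a crude fixed delay, as in the spider above).
    """
    sources = []
    # Count the matches once, then look the elements up again before every
    # click: references found earlier can go stale when the page changes.
    count = len(driver.find_elements("xpath", xpath))
    for index in range(count):
        elements = driver.find_elements("xpath", xpath)
        if index >= len(elements):
            break  # the page changed and now has fewer matches
        elements[index].click()
        time.sleep(settle)
        sources.append(driver.page_source)
    return sources
```

Each returned string can then be handed to HtmlXPathSelector(text=...) as in the solution above. If a click navigates to a different page, the loop would also have to return to the original page before the next iteration, which this sketch does not attempt.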