
Python: Can't get Scrapy to return the text in a div


I'm having a hard time returning the text from this div. When it does return data, it returns far more than I expected.

Target HTML:

<div class="DivTimeSpan" title="Full Time">12:00 PM - 09:00 PM </div>
My first attempt, using "//text()", returns:

"\r\n\r\n", "\r\n\r\n\r\n var allowedUrls=[];\r\n allowedUrls.push(\"Login.net\");\r\n allowedUrls.push(\"Login\");\r\n allowedUrls.push(\"AccountLogin\");\r\n allowedUrls.push(\"AccountLogin\");\r\n allowedUrls.push(\"CreateAccount\");\r\n allowedUrls.push(\"CreateAccount.net\");\r\n allowedUrls.push(\"UpdateAccount\");\r\n allowedUrls.push(\"UpdateAccount.net\");\r\n allowedUrls.push(\"CreateRederAccount.net\");\r\n allowedUrls.push(\"CreateQestSaasAccount\");\r\n allowedUrls.push\");\r\n
"11:00 AM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM"

The whole output can be thousands of lines long and contains text from outside the div I specified.

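I understood that //text() returns the text of the element and its children. The HTML element I'm targeting has no children, so I assumed it would only return the data inside the div.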
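For context: in XPath, an expression that starts with `/` or `//` is absolute. Even when it is evaluated on a sub-selector such as `sel`, it starts from the document root, so `//text()` returns every text node on the page, including the contents of `<script>` elements like the `allowedUrls` code above. A leading `.` makes the path relative to the current node. The sketch below uses `lxml.html` directly, assuming it is installed (Scrapy's selectors are built on lxml, so the XPath semantics are the same); the sample page is made up for illustration.

```python
import lxml.html

# Hypothetical page: a script block plus two schedule divs.
PAGE = """
<html><body>
<script>var allowedUrls = [];</script>
<div class="DivTimeSpan" title="Full Time">11:00 AM - 09:00 PM </div>
<div class="DivTimeSpan" title="Full Time">12:00 PM - 09:00 PM </div>
</body></html>
"""

doc = lxml.html.fromstring(PAGE)
first_div = doc.xpath("//div[@class='DivTimeSpan']")[0]

# Absolute: '//' starts at the document root, even though it is called on first_div,
# so this picks up the script text and both divs.
absolute = first_div.xpath("//text()")

# Relative: './/' only looks at first_div and its descendants.
relative = first_div.xpath(".//text()")
```

Here `relative` holds only the first div's text, while `absolute` also contains the script source.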

Next, I tried "/text()". That was the only change.

Attempt 2:

    for sel in response.xpath("//div[@class='DivTimeSpan']"):
        s_item['schedule'] = sel.select('/text()').extract()
    return s_item
Returns:

[{"schedule": []}]
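The empty list has a related cause: "/text()" is also an absolute path. It asks for text nodes that are direct children of the document root, and the root has none (its only child is the `<html>` element). Writing `./text()` (or just `text()`) scopes the query to the current div instead. A minimal sketch with `lxml.html`, assuming it is installed, on a made-up one-div page:

```python
import lxml.html

# Hypothetical one-div page.
PAGE = '<html><body><div class="DivTimeSpan" title="Full Time">12:00 PM - 09:00 PM </div></body></html>'

doc = lxml.html.fromstring(PAGE)
div = doc.xpath("//div[@class='DivTimeSpan']")[0]

# '/text()' looks for text nodes directly under the document root: there are none.
from_root = div.xpath("/text()")

# './text()' looks for text nodes directly under this div.
from_node = div.xpath("./text()")
```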

Expected result:

[{"schedule": ["11:00 AM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM", "12:00 PM - 09:00 PM"]}]
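Separately, the loop in attempt 2 assigns to `s_item['schedule']` on every iteration, so even with a working XPath only the last div's value would survive. One way to get a list like the expected result is to select the text of every matching div in a single query and strip the trailing whitespace. The sketch below assumes `lxml` and a hypothetical page; `extract_schedule` is an illustrative helper, not part of Scrapy. Inside a Scrapy callback, the same query would go through `response.xpath("//div[@class='DivTimeSpan']/text()").extract()`.

```python
import lxml.html

# Hypothetical page with three schedule divs.
PAGE = """
<html><body>
<div class="DivTimeSpan" title="Full Time">11:00 AM - 09:00 PM </div>
<div class="DivTimeSpan" title="Full Time">12:00 PM - 09:00 PM </div>
<div class="DivTimeSpan" title="Full Time">12:00 PM - 09:00 PM </div>
</body></html>
"""


def extract_schedule(html):
    """Return the stripped text of every div with class DivTimeSpan, in document order."""
    doc = lxml.html.fromstring(html)
    # One query for all matching divs; the '/text()' step follows the div step,
    # so it selects only each div's own text nodes.
    texts = doc.xpath("//div[@class='DivTimeSpan']/text()")
    return [t.strip() for t in texts]


# Build the item once, after collecting every value.
s_item = {"schedule": extract_schedule(PAGE)}
```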

The URL I'm scraping is behind a company login, so I can't share the actual URL.

Elisha's post pointed me in the right direction, thanks!!! :)

Answer:


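Your second attempt is closer to extracting the value. However, you need to extract the text from the node rather than from the document root: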

s_item['schedule'] = sel.select('./text()').extract()[0]
If the document contains other tags besides the div, you can try:

s_item['schedule'] = sel.select('//div/text()').extract()[0]
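A caveat worth checking for the "//div/text()" variant: inside the loop, `sel` still belongs to the whole HTML document, and a path starting with `//` is evaluated from the document root. Taking `[0]` therefore returns the text of the first div on the page, whichever `sel` the loop is currently on. The sketch below contrasts that with the relative `./text()`, assuming `lxml` (the library Scrapy's selectors are built on) and a made-up two-div page.

```python
import lxml.html

# Hypothetical page with two schedule divs.
PAGE = """
<html><body>
<div class="DivTimeSpan" title="Full Time">11:00 AM - 09:00 PM </div>
<div class="DivTimeSpan" title="Full Time">12:00 PM - 09:00 PM </div>
</body></html>
"""

doc = lxml.html.fromstring(PAGE)

absolute_first = []  # what '//div/text()' + [0] yields for each div
relative_own = []    # what './text()' + [0] yields for each div
for sel in doc.xpath("//div[@class='DivTimeSpan']"):
    # Absolute path: searches the whole page, so [0] is always the first div's text.
    absolute_first.append(sel.xpath("//div/text()")[0])
    # Relative path: only this div's own text.
    relative_own.append(sel.xpath("./text()")[0])
```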