
Python: scraping a specific table whose cells wrap their text in div, with Scrapy

Tags: python, html, web-scraping, scrapy

I am using Scrapy to scrape content from a table on a website.

Sample HTML:

            <tr>
                <td><div>2018/2058</div></td>
                <td class="address"><div>Land North of 37 and 39 Hare Lane Claygate Esher Surrey KT10 9BT</div></td>
                <td class="proposal"><div>Confirmation of Compliance with Conditions: 5 (Tree Protection and Pre-Commencement Inspection) and 6 (Tree Protection) of planning permission 2017/0451.</div></td>
                <td><div style="min-width:90px">Claygate Ward</div></td>
            </tr>

This is the website:

Thanks in advance.

first_td_text = response.xpath('//tr[1]/td[1]/div/text()').extract_first()

Update:

'address': response.xpath('//td[@class="address"]/div/text()').extract_first(),

Using the XPath from gangabass:

from scrapy.http import TextResponse

# Sample row from the question, wrapped in <table> so the parser sees a complete table.
txt = '''<table>
<tr>
    <td><div>2018/2058</div></td>
    <td class="address"><div>Land North of 37 and 39 Hare Lane Claygate Esher Surrey KT10 9BT</div></td>
    <td class="proposal"><div>Confirmation of Compliance with Conditions: 5 (Tree Protection and Pre-Commencement Inspection) and 6 (Tree Protection) of planning permission 2017/0451.</div></td>
    <td><div style="min-width:90px">Claygate Ward</div></td>
</tr>
</table>'''

# Build a response from the string so the XPath can be tried without a network request.
resp = TextResponse(url='abc', body=txt, encoding='utf-8')

# No index on td, so the text of every cell in the first row is returned.
print(resp.xpath('//tr[1]/td/div/text()').extract())

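The only change is dropping the [1] from td, which returns the text of every cell in the row instead of just the first.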


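You can do this easily with pandas:

import pandas as pd

tables = pd.read_html(url)

Here url is the address of the page. read_html returns a list of DataFrames, one per table found on the page, so tables[0] is the DataFrame holding the first table.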
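
A minimal sketch of the pandas approach, run on the sample row from the question rather than on the live page. It assumes pandas and an HTML parser such as lxml are installed. The column names are hypothetical labels, since the sample has no header row.

```python
from io import StringIO

import pandas as pd

# Sample row copied from the question, wrapped in <table> for parsing.
SAMPLE_HTML = """
<table>
  <tr>
    <td><div>2018/2058</div></td>
    <td class="address"><div>Land North of 37 and 39 Hare Lane Claygate Esher Surrey KT10 9BT</div></td>
    <td class="proposal"><div>Confirmation of Compliance with Conditions: 5 (Tree Protection and Pre-Commencement Inspection) and 6 (Tree Protection) of planning permission 2017/0451.</div></td>
    <td><div style="min-width:90px">Claygate Ward</div></td>
  </tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(SAMPLE_HTML))
df = tables[0]

# The sample has no header cells, so pandas numbers the columns;
# these names are hypothetical labels chosen for readability.
df.columns = ["reference", "address", "proposal", "ward"]

print(df.loc[0, "address"])
```

pandas discards the div wrappers and cell attributes and keeps only the text, which suits a plain data table. The XPath approach remains the better fit when the class attributes are needed to tell cells apart.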


Comments:

How did you try to get the text?

I have updated the question to show how I tried to get the text.