Python: Crawling a local XML file using Scrapy - start URL as a local file address


python xml xpath scrapy


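I want to use Scrapy to crawl a local XML file located in my Downloads folder and extract the relevant information with XPath.

I am following the Scrapy introduction.

I just want to confirm that the file is in that location: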

sayth@sayth-HP-EliteBook-2560p : ~/Downloads
[0] % ls -a
.                                                              Building a Responsive Website with Bootstrap [Video].zip
..                                                             codemirror.zip
1.1 Situation Of Long Term Gain.xls                            Complete-Python-Bootcamp-master.zip
2008 Racedata.xls                                              Cox Plate 2005.xls
20160123RAND0.xml

Do not specify allowed_domains at all, and use three slashes after the protocol:

start_urls = ["file:///home/sayth/Downloads/20160123RAND0.xml"]
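As a rough illustration of why the third slash matters, the standard-library urllib.parse module can be used to see how each form of the URL is split. This is only a sketch of generic URL parsing; it does not show how Scrapy itself handles the URL internally.

```python
from urllib.parse import urlparse

# Three slashes: the host (netloc) part is empty and the path is absolute.
good = urlparse("file:///home/sayth/Downloads/20160123RAND0.xml")

# Two slashes: "home" is parsed as the host, not as the first directory.
bad = urlparse("file://home/sayth/Downloads/20160123RAND0.xml")

print(repr(good.netloc), good.path)
print(repr(bad.netloc), bad.path)
```

With only two slashes the first path component is consumed as a host name, so the remaining path no longer points at the file.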

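You must use the file's absolute path when specifying a local file with the file:// protocol.
Personally, I recommend using pathlib for this rather than writing the absolute path as a string.

Here is a usage example: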

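import os
from pathlib import Path

start_urls = [
    # Resolve the file name against the current working directory, then build a file:// URI
    Path(os.path.abspath('20160123RAND0.xml')).as_uri()
]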

The as_uri() method converts the path to a file:// URI.
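The absolute-path requirement can be checked with pathlib alone. The sketch below assumes a POSIX system such as the Linux machine in the question; it reuses the path from the question and does not need the file to exist, since as_uri() only formats the path.

```python
from pathlib import Path

absolute = Path("/home/sayth/Downloads/20160123RAND0.xml")
relative = Path("20160123RAND0.xml")

# An absolute path converts cleanly into a file:// URI.
uri = absolute.as_uri()
print(uri)

# A relative path cannot be expressed as a file URI and raises ValueError.
try:
    relative.as_uri()
    relative_ok = True
except ValueError:
    relative_ok = False
print(relative_ok)
```

This is why the example wraps the file name in os.path.abspath() before calling as_uri(): the relative name has to be made absolute first.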
