Reading data from XML in Python
Tags: python, xml, python-2.7, xpath, scrapy

I am using Scrapy with Python. I am trying to read XPath expressions from an XML file, like this:
def getMasterContainers(self):
    containers = []
    containersFromXML = self.doc.findall('MasterPage/Containers/xpath')
    for oneXpath in containersFromXML:
        containers.append(oneXpath.text)
    return containers
The XML file is:
<Containers>
<xpath>''.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]''</xpath>
</Containers>
My problem

When I call sel.xpath(self.containers[0]) I get no results, but when I write the XPath directly in the code, like sel.xpath("hand-written xpath"), I get the correct data.
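Please help.

Update: Are you sure this XPath is the problem? Have you confirmed that nothing fails before or after the XPath is applied? I am not sure how to run a scrape with Scrapy, so I ran the XML parsing manually and applied the result to both the real document and a test document.

first.xml contains only the xpath element and its parent structure: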
<websiteInformation>
<MasterPage>
<Containers>
<xpath>.//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']</xpath>
</Containers>
</MasterPage>
</websiteInformation>
Output:
.//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']
Xpath: .//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']
Container: 2
<Selector xpath=".//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']" data=u'<div class="listing-item">Found A</div>'>
<Selector xpath=".//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']" data=u'<div class="listing-item">Found B</div>'>
That looks fine.

test.html is:
<html>
<body>
<div id="results-list">
<div class="item paid-featured-item">
<div class="listing-item">Found A</div>
</div>
<div class="item paid-featured-item">
<div class="listing-item">Found B</div>
</div>
</div>
</body>
</html>
Your XPath string appears to be wrapped in extra single quotes ('') that should not be there. In the XML it looks like this:
<xpath>''.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]''</xpath>
You do not need the surrounding ''s. It should look like this:
.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]
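
One quick way to confirm this kind of mismatch is to compare the string loaded from the XML with the hand-written one using repr(), which makes stray quotes and whitespace visible. The snippet below is only a sketch: both strings are hypothetical stand-ins (a shortened version of the XPath above), not values taken from the real spider.

```python
# Hypothetical values: `loaded` mimics what the XML parser returns for the
# <xpath> element in the question; `hand_written` is the working literal.
loaded = "''.//div[@id=\"results-list\"]/div[@class=\"listing-item\"]''"
hand_written = './/div[@id="results-list"]/div[@class="listing-item"]'

# repr() shows the exact characters, including quotes and whitespace.
print(repr(loaded))
print(repr(hand_written))

# Compare the raw value, then the value with the stray quotes removed.
same_as_is = (loaded == hand_written)
same_after_strip = (loaded.strip().strip("'") == hand_written)
print(same_as_is, same_after_strip)
```

If the two repr() lines still differ after stripping, the remaining difference (for example a newline or a different quote character) is the next thing to check.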
If you can edit the XML file that contains the XPaths, remove the leading '' and the trailing '' from each xpath element. So:
<Containers>
<xpath>''.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]''</xpath>
</Containers>
should become:
<Containers>
<xpath>.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]</xpath>
</Containers>
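Alternatively, if you cannot edit the XML file, strip the quotes in code by changing:
containers.append(oneXpath.text)
to:
containers.append(oneXpath.text.strip("'"))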
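
A slightly fuller sketch of the same idea, assuming the standard library's xml.etree.ElementTree is used for parsing. The function name load_container_xpaths and the inline SAMPLE_XML (with a shortened XPath) are made up for illustration; the original spider's self.doc may be built differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the XML from the question,
# including the stray '' around the expression.
SAMPLE_XML = """<websiteInformation>
  <MasterPage>
    <Containers>
      <xpath>''.//div[@id="results-list"]/div[@class="listing-item"]''</xpath>
    </Containers>
  </MasterPage>
</websiteInformation>"""


def load_container_xpaths(root):
    """Return cleaned XPath strings from MasterPage/Containers/xpath."""
    xpaths = []
    for node in root.findall('MasterPage/Containers/xpath'):
        text = (node.text or '').strip()  # drop surrounding whitespace
        text = text.strip("'")            # drop stray single quotes at both ends
        if text:                          # skip empty <xpath/> elements
            xpaths.append(text)
    return xpaths


root = ET.fromstring(SAMPLE_XML)
containers = load_container_xpaths(root)
print(repr(containers[0]))
```

Each cleaned string can then be passed to sel.xpath(...) as in the question. Note that strip("'") only removes quotes at the very start and end of the text, so quotes inside the predicates are left alone.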
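Comments:

I have honestly tried this a million times. I removed both quotes, but I still get the same error. Can I send you the whole code? It is only a small script.

@MarcoDinatsoli Edit your question and put the full code there, and I will take a look.

I have posted the full code together with the XML file. Please note that I will have to take it down after a while. Thanks for your help.

I have posted the code. If you have time, please check it and let me know so that I can delete it. Thanks for your understanding.

@MarcoDinatsoli Feel free to remove whatever you need to; I have copied it down.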