Reading data from XML in Python

I am using Scrapy with Python.

I am trying to read XPath expressions from an XML file, like this:

def getMasterContainers(self):
    containers=[]
    containersFromXML = self.doc.findall('MasterPage/Containers/xpath')
    for oneXpath in containersFromXML:
        containers.append(oneXpath.text)
    return containers
The XML file is:

<Containers>
  <xpath>'&apos;.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]&apos;'</xpath>
</Containers>
My problem: when I try
sel.xpath(self.containers[0])
I get no results, but when I write the XPath directly in the code, like this
sel.xpath("hand-written xpath")
I get the correct data.

Please help.

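Update: Are you sure the problem is with this XPath? Have you confirmed that nothing fails before or after the XPath is applied? I am not sure how to run your scrape with Scrapy, so I ran the XML parsing by hand and tested the following against both the real document and a test document.

first.xml contains only the xpath element and its parent structure: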

<websiteInformation>
  <MasterPage>
    <Containers>
      <xpath>.//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']</xpath>
    </Containers>
  </MasterPage>
</websiteInformation>
Output:

.//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']
Xpath: .//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']
Container: 2
<Selector xpath=".//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']" data=u'<div class="listing-item">Found A</div>'>
<Selector xpath=".//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']" data=u'<div class="listing-item">Found B</div>'>
That looks fine.

test.html is:

<html>
  <body>
    <div id="results-list">
      <div class="item paid-featured-item">
        <div class="listing-item">Found A</div>
      </div>
      <div class="item paid-featured-item">
        <div class="listing-item">Found B</div>
      </div>
    </div>
  </body>
</html>
Output:

.//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']
Xpath: .//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']
Container: 2
<Selector xpath=".//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']" data=u'<div class="listing-item">Found A</div>'>
<Selector xpath=".//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']" data=u'<div class="listing-item">Found B</div>'>
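
The manual check described above can be approximated with the standard library alone. The sketch below is not the script that produced the output shown (that script is not included here). It assumes ElementTree as a stand-in for Scrapy's Selector, which works only because test.html is well-formed and the expression uses the small XPath subset ElementTree supports. The names FIRST_XML and TEST_HTML are illustrative.

```python
import xml.etree.ElementTree as ET

# Contents of first.xml, inlined so the sketch is self-contained.
FIRST_XML = """<websiteInformation>
  <MasterPage>
    <Containers>
      <xpath>.//div[@id='results-list']/div[@class='item paid-featured-item']/div[@class='listing-item']</xpath>
    </Containers>
  </MasterPage>
</websiteInformation>"""

# Contents of test.html, inlined for the same reason.
TEST_HTML = """<html>
  <body>
    <div id="results-list">
      <div class="item paid-featured-item">
        <div class="listing-item">Found A</div>
      </div>
      <div class="item paid-featured-item">
        <div class="listing-item">Found B</div>
      </div>
    </div>
  </body>
</html>"""

# Read the expression the same way getMasterContainers() does.
config = ET.fromstring(FIRST_XML)
xpaths = [node.text for node in config.findall('MasterPage/Containers/xpath')]
print('Xpath: %s' % xpaths[0])

# Assumption: ElementTree replaces sel.xpath(...) for this offline check.
page = ET.fromstring(TEST_HTML)
matches = page.findall(xpaths[0])
print('Container: %d' % len(matches))
for match in matches:
    print(match.text)
```

If the expression read from the XML matches here but not inside the spider, the difference most likely lies in the string that reaches sel.xpath(), so printing repr() of that string is a cheap next step.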

Your XPath string appears to have extra single quotes (') that should not be there. In the XML it looks like this:

<xpath>'&apos;.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]&apos;'</xpath>
You do not need the surrounding 's. It should look like this:

.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]
If you can edit the XML file that contains the XPaths, remove the leading '&apos; and the trailing &apos;' from each <xpath> element. So this:

<Containers>
  <xpath>'&apos;.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]&apos;'</xpath>
</Containers>
should become:

<Containers>
  <xpath>.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]</xpath>
</Containers>
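If you cannot edit the XML file, strip the quotes in code instead by changing containers.append(oneXpath.text) to:

containers.append(oneXpath.text.strip("'"))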
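
To see why the quotes matter, it can help to look at what the parser actually returns for the XML posted in the question. The following is a minimal sketch using only xml.etree.ElementTree. The websiteInformation/MasterPage wrapper is taken from first.xml, while the names QUOTED_XML and get_master_containers are illustrative and not part of the original code.

```python
import xml.etree.ElementTree as ET

# The <xpath> element as posted in the question: the expression is wrapped
# in a literal ' plus an &apos; entity on each side.
QUOTED_XML = """<websiteInformation>
  <MasterPage>
    <Containers>
      <xpath>'&apos;.//div[@id="results-list"]/div[@class="item paid-featured-item"]/div[@class="listing-item"]&apos;'</xpath>
    </Containers>
  </MasterPage>
</websiteInformation>"""


def get_master_containers(root):
    """Return the XPath strings with surrounding single quotes removed."""
    containers = []
    for node in root.findall('MasterPage/Containers/xpath'):
        # strip() drops stray whitespace; strip("'") drops every leading
        # and trailing apostrophe, however many there are.
        containers.append((node.text or '').strip().strip("'"))
    return containers


root = ET.fromstring(QUOTED_XML)
raw = root.find('MasterPage/Containers/xpath').text
cleaned = get_master_containers(root)

print(repr(raw))    # the parser decodes &apos; into a real apostrophe
print(cleaned[0])   # the bare expression, ready to pass to sel.xpath()
```

Because the entity is decoded during parsing, the raw text begins and ends with two apostrophes, so the string handed to sel.xpath() is not the same expression as the hand-written one.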

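Comments:

I really have tried this a million times and removed both quotes, but I still get the same error. Can I send you the whole code? It is just a small script.

@MarcoDinatsoli Edit your question and put the full code there, and I will take a look.

I have posted the full code together with the XML file. I will have to take it down after a while. Thanks for your help.

I posted the code. If you are not free right now, please check it when you can and tell me so that I can delete it. Thanks for your understanding.

@MarcoDinatsoli Feel free to delete whatever you need to. I have made a copy.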