Passing nested values to class methods in Scrapy


scrapy


I'm new to web scraping, so please forgive any vague terminology.

Here is a snippet of the HTML page I'm trying to write a spider for:

<h3>2019 General Meetings</h3>
<p><strong>Group 20:</strong> <br />Wednesday, June 5, 9 a.m. <br /> Bank &amp; Trust, 10000 E. Western Ave.</p>
<p>Wednesday, July 11, 9 a.m. <br />Bank &amp; Trust, 10000 E. Western Ave.</p>
<p><strong>Group 20:</strong> <br />Monday, July 8, 9 a.m.<br />Hubbard, 1740 W. 199th St.</p>
<p>&nbsp;</p></div>

I think I've managed that part (but it needs more testing to be sure).

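I need to pass each <p> to the corresponding parser in the class, and each parser should return one required piece of information about the meeting. For example, _parser_date would return the date, _parser_address would return the address, and so on.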

I haven't found the right Scrapy/XPath syntax for this, so I can't get it to work properly.

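What I'm particularly interested in is having each parser "pick up" the pattern it is meant to parse from the <p>: if it's a date pattern, format it and return it; if it's a location pattern... and so on.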

I'm trying to avoid using the re module unless you think it's the right approach here. Any insights are very welcome, thanks.
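One way to do the pattern picking described above without re is to let each parser try its own pattern on every text fragment and move on when it fails. This is a hypothetical sketch based only on the sample HTML: the MeetingParser class, taking the year from the "2019 General Meetings" heading, the "%A, %B %d, %I %p %Y" date format, and the "an address contains a digit" rule are all assumptions. Each method receives the direct text fragments of one <p> (roughly what p_node.xpath("./text()").getall() returns in Scrapy) and returns None when nothing matches.

```python
from datetime import datetime


class MeetingParser:
    """Hypothetical helper: each _parser_* method scans the text fragments of
    one <p> and returns the piece matching its own pattern, or None."""

    # Assumed date shape, e.g. "Wednesday, June 5, 9 a.m."; the year is appended
    DATE_FORMAT = "%A, %B %d, %I %p %Y"

    def __init__(self, year):
        self.year = year  # assumption: taken from the "2019 General Meetings" heading

    def _parser_date(self, fragments):
        for text in fragments:
            # Normalise "a.m."/"p.m." to AM/PM so strptime's %p can read it
            cleaned = text.strip().replace("a.m.", "AM").replace("p.m.", "PM")
            try:
                return datetime.strptime(f"{cleaned} {self.year}", self.DATE_FORMAT)
            except ValueError:
                continue  # not a date pattern, try the next fragment
        return None

    def _parser_address(self, fragments):
        for text in fragments:
            cleaned = text.strip()
            # Heuristic (assumption): an address is a non-blank fragment that
            # is not a date and contains at least one digit (a street number)
            if (cleaned and self._parser_date([cleaned]) is None
                    and any(c.isdigit() for c in cleaned)):
                return cleaned
        return None


# Usage with the text fragments of the first <p> in the sample HTML
parser = MeetingParser(year=2019)
fragments = [" ", "Wednesday, June 5, 9 a.m. ", " Bank & Trust, 10000 E. Western Ave."]
print(parser._parser_date(fragments))     # 2019-06-05 09:00:00
print(parser._parser_address(fragments))  # Bank & Trust, 10000 E. Western Ave.
```

Because each method looks for its own pattern instead of a fixed position, the order of the fragments inside the <p> does not matter; the trade-off is that the heuristics have to be adjusted if the real page uses other date or address formats.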

This should work:

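# Select every <p> after the "General Meetings" heading, skipping the last (empty) one
for p_node in response.xpath("//h3[contains(., 'General Meetings')]/following-sibling::p[position() < last()]"):
    # The address is the last direct text node; the date is the one before it
    address = p_node.xpath("./text()[last()]").get()
    date = p_node.xpath("./text()[last() - 1]").get()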


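I used position() < last() to skip the last, empty <p>, and I parse the data from the end of each <p>. Some <p> elements start with an extra "Group 20:" label followed by whitespace, so counting text nodes from the end keeps the date and the address at the same positions in every <p>.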
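To see why counting from the end works, here is a standard-library stand-in for the selectors above. It uses html.parser instead of Scrapy, so it only approximates ./text() and position() < last() and is not Scrapy's API; the DirectTextCollector class is hypothetical. The first and third <p> have one more direct text node than the second, so indexing from the front would shift, while [-1] and [-2] always land on the address and the date.

```python
from html.parser import HTMLParser

SNIPPET = """<h3>2019 General Meetings</h3>
<p><strong>Group 20:</strong> <br />Wednesday, June 5, 9 a.m. <br /> Bank &amp; Trust, 10000 E. Western Ave.</p>
<p>Wednesday, July 11, 9 a.m. <br />Bank &amp; Trust, 10000 E. Western Ave.</p>
<p><strong>Group 20:</strong> <br />Monday, July 8, 9 a.m.<br />Hubbard, 1740 W. 199th St.</p>
<p>&nbsp;</p>"""


class DirectTextCollector(HTMLParser):
    """Collect, for each <p>, only the text nodes that are its direct children
    (a rough stand-in for ./text(); text inside <strong> is skipped)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into &, &nbsp; into \xa0
        self.paragraphs = []  # one list of text fragments per <p>
        self.depth = 0        # nesting depth of elements inside the current <p>
        self.in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.depth = 0
            self.paragraphs.append([])
        elif self.in_p and tag != "br":  # <br> does not open a nested element
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False
        elif self.in_p and tag != "br":
            self.depth -= 1

    def handle_data(self, data):
        if self.in_p and self.depth == 0:
            self.paragraphs[-1].append(data)


collector = DirectTextCollector()
collector.feed(SNIPPET)
collector.close()

meetings = []
# Mirror position() < last(): drop the final, empty <p>
for fragments in collector.paragraphs[:-1]:
    # Mirror text()[last()] and text()[last() - 1]
    address = fragments[-1].strip()
    date = fragments[-2].strip()
    meetings.append((date, address))

for date, address in meetings:
    print(date, "|", address)
```

In the real spider the same effect comes from the XPath predicates themselves; this version only makes the text-node counts visible.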
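The <h3> is not the top level (i.e. the parent) of those nodes; it is a sibling. Ideally, if possible, you should select the node that wraps the <h3> and the <p> elements. For the snippet you posted, you could use //h3[contains(text(), 'General Meetings')]/following::p[1] and adjust the indices accordingly. However, I don't think that will work well on the real page. I suggest posting a larger HTML snippet.

I understand. I guess I have more reading to do on XPath/XML/HTML :)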