Python Scrapy: how to iterate over <tr> and <td> in a table with XPath to build a dict output
I have this HTML structure:
<table>
<tbody>
<tr>....</tr>
<tr>....</tr>
<tr>....</tr>
<td align= "right" bgcolor="#ffffff">...</td>
<td bgcolor="efefef">...</td>
<td align= "right" bgcolor="#ffffff">...</td>
<td bgcolor="efefef">...</td>
<tr>....</tr>
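I tried, but it didn't work. I'm new to XPath and Scrapy, and I don't know how to do this kind of thing. I extracted the keys and the values into separate arrays, but that doesn't solve my problem: I need to extract each key together with its own value, as key-value pairs.

Here is an example. You may need to adjust it. Suppose this is your data: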
<table>
<tr>
<td align= "right" bgcolor="#ffffff">a</td>
<td bgcolor="efefef">1</td>
<td align= "right" bgcolor="#ffffff">b</td>
<td bgcolor="efefef">2</td>
</tr>
<tr>
<td align= "right" bgcolor="#ffffff">c</td>
<td bgcolor="efefef">3</td>
<td align= "right" bgcolor="#ffffff">d</td>
<td bgcolor="efefef">4</td>
</tr>
<tr>
<td align= "right" bgcolor="#ffffff">e</td>
<td bgcolor="efefef">5</td>
<td align= "right" bgcolor="#ffffff">f</td>
<td bgcolor="efefef">6</td>
</tr>
<tr>
<td align= "right" bgcolor="#ffffff">g</td>
<td bgcolor="efefef">7</td>
<td align= "right" bgcolor="#ffffff">h</td>
<td bgcolor="efefef">8</td>
</tr>
</table>
Output:
{'a': '1', 'b': '2', 'c': '3', 'd': '4', 'e': '5', 'f': '6', 'g': '7', 'h': '8'}
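table[3] does not match your sample input.
How can I combine td[position()=1 or position()=3] with td[@align='right']? With that, I would extract exactly the elements I want.
Use items.xpath("./td[@align='right']").getall() and items.xpath("./td[@bgcolor='efefef']").getall(). That is enough.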
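The attribute-based selection suggested in the comment above can be tried outside a spider. The following is only a minimal sketch: it swaps in Python's standard-library xml.etree.ElementTree for Scrapy's selectors so that it runs on its own, and SAMPLE and table_to_dict are hypothetical names introduced for illustration. In a Scrapy callback the same predicates would presumably be passed to row.xpath(...) with /text() appended and .getall() called on the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mirrors the two-pairs-per-row layout of the data above.
SAMPLE = """
<table>
  <tr>
    <td align="right" bgcolor="#ffffff">a</td>
    <td bgcolor="efefef">1</td>
    <td align="right" bgcolor="#ffffff">b</td>
    <td bgcolor="efefef">2</td>
  </tr>
  <tr>
    <td align="right" bgcolor="#ffffff">c</td>
    <td bgcolor="efefef">3</td>
    <td align="right" bgcolor="#ffffff">d</td>
    <td bgcolor="efefef">4</td>
  </tr>
</table>
"""


def table_to_dict(markup):
    """Build a dict from rows whose key cells have align='right'
    and whose value cells have bgcolor='efefef'."""
    root = ET.fromstring(markup)
    result = {}
    for row in root.findall(".//tr"):
        # Select keys and values from the same row so the pairs stay aligned.
        keys = [td.text.strip() for td in row.findall("./td[@align='right']")]
        values = [td.text.strip() for td in row.findall("./td[@bgcolor='efefef']")]
        result.update(zip(keys, values))
    return result


print(table_to_dict(SAMPLE))
```

Selecting by attribute rather than by position keeps working if a row has more or fewer pairs, as long as the key and value cells keep their distinguishing attributes.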
# Declare the lists, then loop over the rows. Keys and values are selected
# from the same <tr> at the same time so that they stay aligned.
keys = []
values = []
for row in response.xpath("//table/tr"):
    # Cells 1 and 3 hold the keys, cells 2 and 4 hold the values.
    # /text() makes getall() return the cell text instead of the <td> markup,
    # and extend() keeps the lists flat, so no separate flattening step is needed.
    keys.extend(row.xpath("./td[position()=1 or position()=3]/text()").getall())
    values.extend(row.xpath("./td[position()=2 or position()=4]/text()").getall())
# Create the dictionary by pairing each key with its value:
dictionary = dict(zip(keys, values))
print(dictionary)
{'a': '1', 'b': '2', 'c': '3', 'd': '4', 'e': '5', 'f': '6', 'g': '7', 'h': '8'}
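If the cells strictly alternate key, value, key, value within each row, the pairing step can also be done with list slices instead of position predicates. This is a sketch under that assumption, not part of the original answer; pair_cells and the rows list are hypothetical, and in a spider each inner list would presumably come from something like row.xpath("./td/text()").getall().

```python
def pair_cells(cells):
    """Pair alternating key/value cell texts from one row into a dict."""
    # Even indexes (0, 2, ...) are keys, odd indexes (1, 3, ...) are values.
    return dict(zip(cells[0::2], cells[1::2]))


# Hypothetical per-row cell texts standing in for the extracted <td> contents.
rows = [["a", "1", "b", "2"], ["c", "3", "d", "4"]]

result = {}
for cells in rows:
    result.update(pair_cells(cells))

print(result)
```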