Python Scrapy: How to iterate over <tr><td> in a table with XPath to build a dict output


I have this HTML structure:

<table>
  <tbody>
    <tr>....</tr>
    <tr>....</tr>
    <tr>....</tr>
      <td align= "right" bgcolor="#ffffff">...</td>
      <td bgcolor="efefef">...</td>
      <td align= "right" bgcolor="#ffffff">...</td>
      <td bgcolor="efefef">...</td>
    <tr>....</tr>

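I tried this, but it didn't work. I'm new to XPath and Scrapy, and I don't know how to do this kind of thing. I extracted the keys and the values into separate arrays, but that approach doesn't solve my problem: I need to extract each key together with its own value.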

Here is an example. You may need to adjust it. Suppose this is your data:

<table>
   <tr>
      <td align= "right" bgcolor="#ffffff">a</td>
      <td bgcolor="efefef">1</td>
      <td align= "right" bgcolor="#ffffff">b</td>
      <td bgcolor="efefef">2</td>
   </tr>
   <tr>
      <td align= "right" bgcolor="#ffffff">c</td>
      <td bgcolor="efefef">3</td>
      <td align= "right" bgcolor="#ffffff">d</td>
      <td bgcolor="efefef">4</td>
   </tr>
   <tr>
      <td align= "right" bgcolor="#ffffff">e</td>
      <td bgcolor="efefef">5</td>
      <td align= "right" bgcolor="#ffffff">f</td>
      <td bgcolor="efefef">6</td>
   </tr>
   <tr>
      <td align= "right" bgcolor="#ffffff">g</td>
      <td bgcolor="efefef">7</td>
      <td align= "right" bgcolor="#ffffff">h</td>
      <td bgcolor="efefef">8</td>
   </tr>
</table>
Output:

{'a': '1', 'b': '2', 'c': '3', 'd': '4', 'e': '5', 'f': '6', 'g': '7', 'h': '8'}

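table[3] does not match your sample input.
How can I combine td[position()=1 or position()=3] with td[@align='right']? With that, I would extract exactly the elements I want.
Use items.xpath("./td[@align='right']").getall() and items.xpath("./td[@bgcolor='efefef']").getall(). That is enough.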
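The attribute-based selectors suggested in the comment above can be tried offline. The following is a minimal sketch, not the answerer's code: it takes the first two rows of the sample table and parses them with the standard library's xml.etree.ElementTree instead of Scrapy, purely so that it runs without a crawl. The names SAMPLE and table_to_dict are hypothetical.

```python
import xml.etree.ElementTree as ET

# First two rows of the sample table, parsed with the standard library
# (a stand-in for a Scrapy response, used only so the sketch runs offline).
SAMPLE = """
<table>
   <tr>
      <td align= "right" bgcolor="#ffffff">a</td>
      <td bgcolor="efefef">1</td>
      <td align= "right" bgcolor="#ffffff">b</td>
      <td bgcolor="efefef">2</td>
   </tr>
   <tr>
      <td align= "right" bgcolor="#ffffff">c</td>
      <td bgcolor="efefef">3</td>
      <td align= "right" bgcolor="#ffffff">d</td>
      <td bgcolor="efefef">4</td>
   </tr>
</table>
"""


def table_to_dict(markup):
    """Pair key cells with value cells row by row, using attribute predicates."""
    root = ET.fromstring(markup)
    result = {}
    for row in root.iter("tr"):
        # In a Scrapy callback the corresponding call would be
        # row.xpath("./td[@align='right']/text()").getall()
        keys = [td.text for td in row.findall("./td[@align='right']")]
        values = [td.text for td in row.findall("./td[@bgcolor='efefef']")]
        # zip() pairs the n-th key of this row with the n-th value of this row.
        result.update(zip(keys, values))
    return result


print(table_to_dict(SAMPLE))
```

Pairing inside each row, rather than across the whole table, keeps a row with a missing cell from shifting every later pair.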
# Collect the keys and the values from every row in a single pass.

keys = []
values = []
# "//table//tr" also matches rows nested inside a <tbody>.
for row in response.xpath("//table//tr"):
    # text() returns the cell contents as strings, so nothing needs unwrapping;
    # extend() keeps both lists flat.
    keys.extend(row.xpath("./td[position()=1 or position()=3]/text()").getall())
    values.extend(row.xpath("./td[position()=2 or position()=4]/text()").getall())

# Create the dictionary by pairing each key with its value:

dictionary = dict(zip(keys, values))
print(dictionary)
{'a': '1', 'b': '2', 'c': '3', 'd': '4', 'e': '5', 'f': '6', 'g': '7', 'h': '8'}
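Since the cells alternate key, value, key, value, another option is to take every cell's text in document order and pair the items by stride. This is a minimal sketch under that assumption, not part of the original answer: cells_to_dict is a hypothetical helper, and the hard-coded list stands in for the kind of list that response.xpath("//table//tr/td/text()").getall() might return.

```python
def cells_to_dict(cells):
    """Turn a flat [key, value, key, value, ...] list into a dict."""
    # Strip surrounding whitespace that often comes along with scraped text.
    cleaned = [cell.strip() for cell in cells]
    if len(cleaned) % 2:
        # An odd count means at least one key has no matching value.
        raise ValueError("expected an even number of cells")
    # Even indexes are keys, odd indexes are values.
    return dict(zip(cleaned[0::2], cleaned[1::2]))


# Assumed input: cell texts in document order, as a flat list of strings.
cells = ["a", "1", "b", "2", "c", "3", "d", "4"]
print(cells_to_dict(cells))
```

An empty cell produces no text node, so on real pages the pairs could shift; in that case it may be safer to select the td elements themselves and read each one's text individually.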