Python XPath: loop through paragraphs and scrape

I have a series of paragraphs that I want to parse with XPath. The HTML is formatted like this:

<div id="content_third">
 <h3>Title1</h3>
 <p>
  <strong>District</strong>
  John Q Public <br>
  Susie B Private 
 <p>
 <p>
  <strong>District</strong>
  Anna C Public <br>
  Bob J Private 
 <p>
 <h3>Title1</h3>
 <p>
  <strong>District</strong>
  John Q Public <br>
  Susie B Private 
 <p>
 <p>
  <strong>District</strong>
  Anna C Public <br>
  Bob J Private 
 <p>
</div>

I start with an outer loop over the titles:

titles = tree.xpath('//*[@id="content_third"]/h3')
for num in range(len(titles)):

Then an inner loop:

district_races = tree.xpath('//*[@id="content_third"]/p[count(preceding-sibling::h3)={0}]'.format(num))
for index in range(len(district_races)):
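On each pass, I only want to select the <strong> within that district. I tried the following, but it spits out empty arrays, apart from one array filled with all of the districts: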

zone = tree.xpath('//*[@id="content_third"]/p[count(preceding-sibling::h3)={0}]/strong[{1}]/text()'.format(num, index))
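
Comment: What module are you using? Please include any import lines. Also, only the word District is contained in a tag. Please show the desired output.

I love those unformatted state election web pages.

I assume each District is a placeholder for some actual name, so getting each district is much simpler than what you are trying to do: just extract the text from each strong inside each p:
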
h = """<div id="content_third">
 <h3>Title1</h3>
 <p>
  <strong>District</strong>
  John Q Public <br>
  Susie B Private
 <p>
 <p>
  <strong>District</strong>
  Anna C Public <br>
  Bob J Private
 <p>
 <h3>Title1</h3>
 <p>
  <strong>District</strong>
  John Q Public <br>
  Susie B Private
 <p>
 <p>
  <strong>District</strong>
  Anna C Public <br>
  Bob J Private
 <p>
</div>"""

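from lxml import html

# Parse the fragment; lxml's HTML parser tolerates the unclosed <p> tags.
tree = html.fromstring(h)

# Text of every <strong> inside a <p> directly under the container div.
print(tree.xpath('//*[@id="content_third"]/p/strong/text()'))
# ['District', 'District', 'District', 'District']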
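
If the districts also need to stay grouped under their titles, as the nested loops in the question were attempting, something along these lines may work. It is only a sketch: the function name `group_by_title` and the variable `page` are made up for illustration, and it assumes the real page has the same flat h3/p layout as the sample. Two details differ from the question's attempt: XPath counts from 1, so the title position starts at 1 rather than 0, and the `[strong]` predicate skips the empty paragraphs the parser creates from the stray `<p>` tags.

```python
from lxml import html

# Same markup as the sample in the question (stray <p> tags included).
page = """<div id="content_third">
 <h3>Title1</h3>
 <p>
  <strong>District</strong>
  John Q Public <br>
  Susie B Private
 <p>
 <p>
  <strong>District</strong>
  Anna C Public <br>
  Bob J Private
 <p>
 <h3>Title1</h3>
 <p>
  <strong>District</strong>
  John Q Public <br>
  Susie B Private
 <p>
 <p>
  <strong>District</strong>
  Anna C Public <br>
  Bob J Private
 <p>
</div>"""


def group_by_title(markup):
    """Return [(title, [(district, [names, ...]), ...]), ...]."""
    tree = html.fromstring(markup)
    container = tree.xpath('//*[@id="content_third"]')[0]
    grouped = []
    # Start at 1: paragraphs under the first <h3> have exactly one
    # preceding <h3> sibling, those under the second have two, and so on.
    for position, title in enumerate(container.xpath('./h3'), start=1):
        races = []
        paragraphs = container.xpath(
            './p[strong][count(preceding-sibling::h3)=$n]', n=position)
        for p in paragraphs:
            district = p.xpath('string(./strong)').strip()
            # Candidate names are the bare text nodes directly inside the <p>.
            names = [t.strip() for t in p.xpath('./text()') if t.strip()]
            races.append((district, names))
        grouped.append((title.text_content().strip(), races))
    return grouped


for title, races in group_by_title(page):
    print(title, races)
```

Passing the position as an XPath variable (`$n`) instead of formatting it into the string keeps the expression fixed and avoids bracket typos when building it by hand.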