How can I print all the text from an HTML file parsed with lxml in Python?


I have an HTML file like this:

<html>
  <head></head>
    <body>
      <p>
       <dfn>Definition</dfn>sometext / ''
       (<i>othertext</i>)someothertext / ''
       (<i>...</i>)
       (<i>...</i>)
      </p>
       <p>
         <dfn>Definition2</dfn>sometext / ''
         (<i>othertext</i>)someothertext / ''
         <i>blabla</i>
         <i>bubu</i>
       </p>
     </body>
</html>
The code I have so far gives me output that is only half correct. For example, it simply skips entries of this form:

<p><dfn>Cityname</dfn>, text 2349 </p> 
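Or I get the text of the <i> tags and only part of their tail text. I think the problem lies in how I iterate, but I really cannot find the error.

Is there an efficient way to achieve my goal?


By the way, I also tried something with tree.xpath('//p/text()'), but it is too general. In my case I need to extract the sibling text of a dfn in relation to the dfn itself: if the dfn is a good one (I have more code that decides whether a dfn is good), then I print the dfn together with all the text that accompanies it in the p tag.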
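
To make the "too general" remark concrete, here is a small sketch of what `//p/text()` returns and where lxml keeps the text that follows a closing tag. The `SAMPLE` string is a shortened, hypothetical stand-in for the file in the question, not the asker's real data.

```python
from lxml import html

# Hypothetical, shortened stand-in for the file described in the question.
SAMPLE = """
<html><body>
<p><dfn>Definition</dfn>sometext / '' (<i>othertext</i>)someothertext</p>
<p><dfn>Cityname</dfn>, text 2349 </p>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# text() yields only the text nodes that are direct children of <p>,
# flattened over all paragraphs, with no link back to their <dfn>.
direct_text = tree.xpath("//p/text()")

# lxml stores the text that follows a closing tag as the .tail of that element.
first_dfn = tree.xpath("//p/dfn")[0]
dfn_text = first_dfn.text  # text inside <dfn>
dfn_tail = first_dfn.tail  # text between </dfn> and the next tag
```

The text inside `<dfn>` and `<i>` is absent from `direct_text`, which is why the result cannot be tied back to a particular definition.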

I would try the following:

for p in tree.xpath("//p"):  # This gets all the p elements
    dfns = p.xpath("./dfn")
    if not dfns:  # skip paragraphs that have no <dfn>
        continue
    dfn = dfns[0]
    # text nodes and elements that follow <dfn> inside this <p>
    after_dfn = p.xpath("./dfn/following-sibling::node()")
    for x in after_dfn:
        pass  # do whatever you need to do with the stuff after dfn
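
As a rough illustration of what "the stuff after dfn" could look like once collected, the sketch below joins the following siblings of the first `<dfn>` into one string per paragraph. The helper name `text_after_dfn` and the `SAMPLE` markup are made up for this example; the "is this dfn good" check from the question is left out.

```python
from lxml import html

# Hypothetical sample; the third paragraph has no <dfn> on purpose.
SAMPLE = """
<html><body>
<p><dfn>Definition</dfn>sometext / '' (<i>othertext</i>)someothertext</p>
<p><dfn>Cityname</dfn>, text 2349 </p>
<p>no definition here</p>
</body></html>
"""


def text_after_dfn(p):
    """Return (dfn text, text after the first <dfn> in p), or None if p has no <dfn>."""
    dfns = p.xpath("./dfn")
    if not dfns:
        return None
    parts = []
    for node in dfns[0].xpath("./following-sibling::node()"):
        if isinstance(node, str):
            parts.append(node)  # bare text node
        else:
            parts.append(node.text_content())  # element such as <i>
    return dfns[0].text_content(), "".join(parts)


tree = html.fromstring(SAMPLE)
results = [r for r in map(text_after_dfn, tree.xpath("//p")) if r is not None]
```

Because `following-sibling::` only looks at siblings of `<dfn>`, nothing outside the current `<p>` is picked up, and paragraphs without a `<dfn>` are skipped rather than raising an `IndexError`.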

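Thanks for the hint. I now have the following, which gives me what I need. The only problem is that it ends up in an infinite loop. How can I get rid of that?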

for p in tree.xpath("//p"):
    dfn = p.xpath('./dfn/text()')
    after_dfn = p.xpath("./dfn/following::text()")
    if dfn is not None:
        print(dfn)
    if after_dfn is not None:
        for x in after_dfn:
            print(x)
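
One possible explanation, offered as an assumption because the real input file is not shown: the loop may not be infinite at all. The `following::` axis selects every text node that comes after `<dfn>` anywhere in the document, so each paragraph reprints the rest of the file, and on a large file that can look like a loop that never ends. Note also that `xpath()` returns a list for these expressions, so the `None` checks always pass. The sketch below, using a made-up `SAMPLE`, compares that axis with one that stays inside the current `<p>`.

```python
from lxml import html

# Hypothetical two-paragraph sample.
SAMPLE = """
<html><body>
<p><dfn>A</dfn>one (<i>x</i>)</p>
<p><dfn>B</dfn>two</p>
</body></html>
"""

tree = html.fromstring(SAMPLE)
first_p = tree.xpath("//p")[0]

# following:: continues past </p> into the rest of the document.
whole_document = first_p.xpath("./dfn/following::text()")

# Siblings of <dfn> plus their descendant text nodes stay inside this <p>.
same_paragraph = first_p.xpath(
    "./dfn/following-sibling::node()/descendant-or-self::text()"
)
```

Replacing `following::text()` with the second expression in the loop should print each paragraph's text only once.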