Python: How can I print all the text from an HTML file parsed with lxml?


I have an HTML file like this:

<html>
  <head></head>
  <body>
    <p>
      <dfn>Definition</dfn>sometext / ''
      (<i>othertext</i>)someothertext / ''
      (<i>...</i>)
      (<i>...</i>)
    </p>
    <p>
      <dfn>Definition2</dfn>sometext / ''
      (<i>othertext</i>)someothertext / ''
      <i>blabla</i>
      <i>bubu</i>
    </p>
  </body>
</html>

My current attempt gives output that is only half right. For example, it simply skips entries of this form:

<p><dfn>Cityname</dfn>, text 2349 </p>

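Or I get the text from the <i> tags and only part of the text that trails them. I think the problem lies in the iteration, but I really cannot find the mistake.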
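One possible cause, stated as an assumption because the failing code is not shown: lxml stores text that follows a child element in that child's `.tail`, not in the parent's `.text`. A minimal sketch using the skipped entry from above:

```python
from lxml import html

# The entry that gets skipped, parsed on its own.
p = html.fromstring("<p><dfn>Cityname</dfn>, text 2349 </p>")
dfn = p.find("dfn")

print(repr(p.text))        # None: <p> has no text before its first child
print(repr(dfn.text))      # 'Cityname'
print(repr(dfn.tail))      # ', text 2349 ' lives on the <dfn>, as its tail
print(list(p.itertext()))  # ['Cityname', ', text 2349 ']
```

Code that reads only `p.text` or only `element.text` would therefore miss ", text 2349 ", while `itertext()` walks both text and tails in document order.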

Is there an efficient way to achieve my goal?

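By the way, I also tried something with tree.xpath('//p/text()'), but that is too general. In my case I need to extract the text that is a sibling of a dfn in relation to the dfn itself: if the dfn is a good one (I have more code that decides whether a dfn is good), I want to print the dfn together with all the text that comes with it in the p tag.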
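To illustrate why `//p/text()` is too general here, the sketch below runs the query once over the whole document and once per paragraph. The `SAMPLE` markup is a hypothetical, shortened version of the file above.

```python
from lxml import html

# Hypothetical sample mirroring the structure in the question.
SAMPLE = (
    "<html><body>"
    "<p><dfn>Definition</dfn>sometext (<i>othertext</i>)someothertext</p>"
    "<p><dfn>Cityname</dfn>, text 2349 </p>"
    "</body></html>"
)
tree = html.fromstring(SAMPLE)

# One flat list of the direct text children of every <p>:
# the link to each <dfn> is lost, and text inside <i> is missing.
flat = tree.xpath("//p/text()")

# The same query evaluated per paragraph keeps each result tied to its <dfn>.
per_paragraph = [
    (p.findtext("dfn"), p.xpath("./text()"))
    for p in tree.xpath("//p")
]

print(flat)
print(per_paragraph)
```

Evaluating the expression relative to each `p` is what makes it possible to decide, per dfn, whether to print the accompanying text.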

I would try the following:

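# Assumes `tree` is an already parsed lxml document.
for p in tree.xpath("//p"):  # This gets all the p elements
    dfns = p.xpath("./dfn")
    if not dfns:  # skip paragraphs that have no <dfn>
        continue
    dfn = dfns[0]
    # Text nodes and elements that follow the <dfn> inside this <p>
    after_dfn = p.xpath("./dfn/following-sibling::node()")
    for x in after_dfn:
        pass  # do whatever you need to do with the stuff after dfn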
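As a sketch of what "the stuff after dfn" could look like in practice: `following-sibling::node()` returns a mix of text nodes (string objects) and elements, so the loop has to treat the two differently. The helper name `text_after_dfn` is hypothetical, and the code assumes the document was parsed with `lxml.html`, whose elements provide `text_content()`.

```python
from lxml import html

def text_after_dfn(p):
    """Return (dfn_text, text_following_it) for one <p>, or None without a <dfn>."""
    dfns = p.xpath("./dfn")
    if not dfns:
        return None
    dfn = dfns[0]
    parts = []
    for node in dfn.xpath("following-sibling::node()"):
        if isinstance(node, str):
            parts.append(node)                 # bare text between elements
        elif isinstance(node.tag, str):
            parts.append(node.text_content())  # an element such as <i>
        # comments and processing instructions are ignored
    return dfn.text_content(), "".join(parts)

p = html.fromstring(
    "<p><dfn>Definition</dfn>sometext (<i>othertext</i>)someothertext</p>"
)
print(text_after_dfn(p))  # ('Definition', 'sometext (othertext)someothertext')
```

Returning a tuple rather than printing directly leaves room for the "is this dfn good?" check before anything is written out.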

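Thanks for the hint. I now have the code below, which gives me what I need.

The only problem is that it results in an infinite loop. How can I get rid of it?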

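for p in tree.xpath("//p"):
    dfn = p.xpath("./dfn/text()")
    # Note: following:: matches every text node after the <dfn> in the whole
    # document, not just inside this <p>, so later paragraphs are printed again
    # and again. On a large file this can look like an endless loop.
    after_dfn = p.xpath("./dfn/following::text()")
    if dfn:  # xpath() returns a list, never None
        print(dfn)
    for x in after_dfn:
        print(x)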
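A likely explanation for the runaway output, offered as a sketch and not as a confirmed diagnosis: the `following::` axis is not limited to the current `p`, so every paragraph also prints the text of all paragraphs after it. Restricting the query to the siblings of the dfn keeps it inside the paragraph. The `SAMPLE` markup below is hypothetical.

```python
from lxml import html

# Hypothetical three-paragraph document.
SAMPLE = (
    "<html><body>"
    "<p><dfn>A</dfn>one</p>"
    "<p><dfn>B</dfn>two</p>"
    "<p><dfn>C</dfn>three</p>"
    "</body></html>"
)
tree = html.fromstring(SAMPLE)

leaky, scoped = [], []
for p in tree.xpath("//p"):
    # Everything after the <dfn> anywhere in the document.
    leaky.append(p.xpath("./dfn/following::text()"))
    # Only text that follows the <dfn> inside the same <p>,
    # including text nested in sibling elements such as <i>.
    scoped.append(
        p.xpath("./dfn/following-sibling::node()/descendant-or-self::text()")
    )

print(leaky[0])  # ['one', 'B', 'two', 'C', 'three']
print(scoped)    # [['one'], ['two'], ['three']]
```

With `following::`, the amount of output grows roughly with the square of the number of paragraphs, which fits the symptom described; the loop itself still terminates.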