Python BeautifulSoup/LXML.html: remove a tag and its children if a child looks like x

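I'm having a hard time finding the right solution. If <answer> is 99, I want to remove the enclosing <question> and all of its children, so that I end up with a string containing only the remaining questions. I have the following HTML structure: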

<html>
 <body>        
  <questionaire>
   <question>
    <questiontext>
     Do I have a question?
    </questiontext>
    <answer>
     99
    </answer>
   </question>
   <question>
    <questiontext>
     Do I love HTML/XML parsing?
    </questiontext>
    <questalter>
     <choice>
      1 oh god yeah
     </choice>
     <choice>
      2 that makes me feel good
     </choice>
     <choice>
      3 oh hmm noo
     </choice>
     <choice>
      4 totally
     </choice>
     </questalter>
     <answer>
      4
    </answer>
   </question>
   <question>
  </questionaire>
 </body>
</html>      

So far I have tried to do it with XPath... but lxml.html doesn't have iterparse... does it? Thanks
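lxml.html indeed has no iterparse of its own, but lxml.etree.iterparse exists, as does the standard library's xml.etree.ElementTree.iterparse. The sketch below uses the standard-library one so it runs without third-party packages. It assumes well-formed XML (the unclosed <question> near the end of the sample has to go first), and filter_questions is a hypothetical name. Because ElementTree elements don't know their parent, it keeps a stack of open elements from the "start" events. It still builds the whole tree, so it shows the API rather than a memory saving.

```python
import io
import xml.etree.ElementTree as ET


def filter_questions(xml_text, unwanted="99"):
    """Drop every <question> whose <answer> text (stripped) equals `unwanted`.

    Hypothetical helper; assumes xml_text is well-formed XML.
    """
    root = None
    open_elems = []  # elements whose end tag has not been seen yet
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # first start event is the document element
            open_elems.append(elem)
            continue
        open_elems.pop()  # elem is now complete, including its children
        if elem.tag == "question" and open_elems:
            answer = elem.find("answer")
            if answer is not None and (answer.text or "").strip() == unwanted:
                open_elems[-1].remove(elem)  # top of the stack is the parent
    return ET.tostring(root, encoding="unicode")


sample = """<questionaire>
  <question><questiontext>Do I have a question?</questiontext><answer> 99 </answer></question>
  <question><questiontext>Do I love HTML/XML parsing?</questiontext><answer> 4 </answer></question>
</questionaire>"""

print(filter_questions(sample))
```

The check happens on the "end" event because only then are the <question> element and its <answer> child guaranteed to be complete.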

This does exactly what you need:

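from xml.dom import minidom

# text holds the question's HTML, with the stray <question> before </questionaire> removed
doc = minidom.parseString(text)
for question in doc.getElementsByTagName('question'):  # static list, safe to mutate the tree
    for answer in question.getElementsByTagName('answer'):
        first = answer.firstChild
        if first is not None and (first.nodeValue or '').strip() == '99':
            question.parentNode.removeChild(question)
            break  # this question is already removed; skip its other answers

print(doc.documentElement.toxml())  # documentElement omits the <?xml ...?> declaration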
Result:

<html>
 <body>        
  <questionaire>

   <question>
    <questiontext>
     Do I love HTML/XML parsing?
    </questiontext>
    <questalter>
     <choice>
      1 oh god yeah
     </choice>
     <choice>
      2 that makes me feel good
     </choice>
     <choice>
      3 oh hmm noo
     </choice>
     <choice>
      4 totally
     </choice>
     </questalter>
     <answer>
      4
    </answer>
   </question>
  </questionaire>
 </body>
</html>
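The title asks for the general case, "remove a tag if a child looks like x". A minimal generalization of the minidom answer, assuming the test only needs the child's text: remove_if_child_matches and node_text are hypothetical helpers, and the predicate receives the child's whitespace-stripped text. node_text exists because minidom has no textContent, and a child may contain several text nodes or nested elements.

```python
from xml.dom import minidom


def node_text(node):
    """Concatenate all text and CDATA beneath node, skipping comments."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(node_text(child))
    return "".join(parts)


def remove_if_child_matches(doc, parent_tag, child_tag, predicate):
    """Remove every <parent_tag> having a <child_tag> whose stripped text satisfies predicate."""
    for parent in doc.getElementsByTagName(parent_tag):  # static list, safe to mutate
        for child in parent.getElementsByTagName(child_tag):
            if predicate(node_text(child).strip()):
                parent.parentNode.removeChild(parent)
                break  # parent is gone; no need to test its other children
    return doc


doc = minidom.parseString(
    "<questionaire>"
    "<question><questiontext>Do I have a question?</questiontext><answer> 99 </answer></question>"
    "<question><questiontext>Do I love HTML/XML parsing?</questiontext><answer> 4 </answer></question>"
    "</questionaire>"
)
remove_if_child_matches(doc, "question", "answer", lambda text: text == "99")
print(doc.documentElement.toxml())
```

Swapping the lambda, for example lambda t: t.startswith("9"), changes what "looks like x" means without touching the tree-walking code.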

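Hi Matt, thanks for your answer... this looks complicated... I wonder whether there is a solution using BeautifulSoup or lxml...?
I updated my answer so that it works with your HTML. Note that there is an extra <question> at the end, which will cause a parse error.
Thanks a lot... I find minidom pretty scary, but this looks quite nice too! Personally I prefer lxml... I wish I could accept two answers ;-)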
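The parse error mentioned in the comments is easy to reproduce: a strict XML parser such as the expat-based ones behind minidom and xml.etree.ElementTree rejects the unclosed <question>, because the next end tag it sees is </questionaire>. The sketch uses the standard library's ElementTree; the shortened document is an abbreviated stand-in for the question's HTML, and parse_or_report is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the question's document; note the unclosed <question>.
broken = """<html><body><questionaire>
<question><questiontext>Do I have a question?</questiontext><answer>99</answer></question>
<question>
</questionaire></body></html>"""


def parse_or_report(xml_text):
    """Return the parsed root element, or None after printing where parsing failed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # (line, column) reported by the parser
        print(f"parse error at line {line}, column {column}: {err}")
        return None


root = parse_or_report(broken)  # prints the parse error and returns None
fixed = broken.replace("<question>\n</questionaire>", "</questionaire>")
print(parse_or_report(fixed).tag)  # html
```

Once the stray tag is deleted, the document is well-formed and any of the approaches in this thread can parse it.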
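from lxml import etree

# html_string holds the question's HTML, with the stray <question> removed
html = etree.fromstring(html_string)
for question in html.xpath('/html/body/questionaire/question'):
    for element in question:  # iterate children directly; getchildren() is deprecated
        if element.tag == 'answer' and (element.text or '').strip() == '99':
            question.getparent().remove(question)
            break  # question is gone; no need to look at its other children
print(etree.tostring(html, encoding='unicode'))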
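If lxml isn't installed, the same loop can be written with the standard library's xml.etree.ElementTree. The one real difference is that ElementTree elements have no getparent(), so this sketch first builds a child-to-parent lookup table. It assumes well-formed input, and remove_answered is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def remove_answered(root, unwanted="99"):
    """Drop every <question> whose <answer> text (stripped) equals `unwanted`."""
    # ElementTree has no getparent(), so map each child to its parent up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for question in list(root.iter("question")):  # materialize before mutating the tree
        answer = question.find("answer")
        if answer is not None and (answer.text or "").strip() == unwanted:
            parent_of[question].remove(question)
    return root


root = ET.fromstring(
    "<html><body><questionaire>"
    "<question><questiontext>Do I have a question?</questiontext><answer>\n 99\n</answer></question>"
    "<question><questiontext>Do I love HTML/XML parsing?</questiontext><answer> 4 </answer></question>"
    "</questionaire></body></html>"
)
remove_answered(root)
print(ET.tostring(root, encoding="unicode"))
```

Comparing the answer with strip() == '99' rather than '99' in text avoids also removing questions whose answer is, say, 199.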