Removing tags from a field in an XML file

Tags: html, python-3.x, xml, beautifulsoup, html-parsing

I have an XML file that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="8" CreationDate="2012-12-11T20:37:08.823" Score="67" ViewCount="17934" Body="&lt;p&gt;Assuming the world in the One Piece universe is round, then there is not really a beginning or an end of the Grand Line.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;The Straw Hats started out from the first half and are now sailing across the second half.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;Wouldn't it have been quicker to set sail in the opposite direction from where they started?     &lt;/p&gt;&#xA;" OwnerUserId="21" LastEditorUserId="1398" LastEditDate="2015-04-17T19:06:38.957" LastActivityDate="2015-05-26T12:50:40.920" Title="The treasure in One Piece is at the end of the Grand Line. But isn't that the same as the beginning?" Tags="&lt;one-piece&gt;" AnswerCount="5" CommentCount="0" FavoriteCount="2" />
  <row Id="2" PostTypeId="1" AcceptedAnswerId="33" CreationDate="2012-12-11T20:39:40.780" Score="13" ViewCount="279" Body="&lt;p&gt;In the middle of &lt;em&gt;The Dark Tournament&lt;/em&gt;, Yusuke Urameshi gets to fully inherit Genkai's power of the &lt;em&gt;Spirit Wave&lt;/em&gt; by absorbing a ball of energy from her.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;However, this process turns into an excruciating trial for Yusuke, almost killing him, and keeping him doubled over in extreme pain for a long period of time, so much so that his Spirit Animal, Poo, is also in pain and flies to him to try to help.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;My question is, why is it such a painful procedure to learn and absorb this power?&lt;/p&gt;&#xA;" OwnerUserId="26" LastEditorUserId="247" LastEditDate="2013-02-26T17:02:31.570" LastActivityDate="2013-06-20T03:31:39.187" Title="Why does absorbing the Spirit Wave from Genkai involve such a painful process?" Tags="&lt;yu-yu-hakusho&gt;" AnswerCount="1" CommentCount="0" />
  <row Id="3" PostTypeId="1" AcceptedAnswerId="148" CreationDate="2012-12-11T20:42:47.447" Score="9" ViewCount="3022" Body="&lt;p&gt;In Sora no Otoshimono, Ikaros carries around a watermelon like a pet and likes watermelons and pretty much anything else round.  At one point she even has a watermelon garden and attacks all the bugs that get near the melons.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;What's the significance of the watermelon and why does she carry one around?&lt;/p&gt;&#xA;" OwnerUserId="29" LastActivityDate="2014-01-15T21:01:55.043" Title="What's the significance of the watermelon in Sora no Otoshimono?" Tags="&lt;sora-no-otoshimono&gt;" AnswerCount="2" CommentCount="1" />
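I have already parsed the Body field using ElementTree. What I want to do next is parse the words in each row's Body field. To do that, I need to strip any HTML tags from the Body field. For example, after stripping the HTML tags, the Body field for Id=2 should look like this: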

In the middle of The Dark Tournament Yusuke Urameshi gets to fully inherit Genkai's power of the .... (continued)
What I have tried so far:

import bs4

def remove_html_tags(text):
    return bs4.BeautifulSoup(text, "html.parser").text
This results in:

pin the middle of emthe dark tournamentem yusuke urameshi gets to fully inherit genkais power of the emspirit waveem by absorbing a ball of energy from herp
phowever this process turns into an excruciating trial for yusuke almost killing him and keeping him doubled over in extreme pain for a long period of time so much so that his spirit animal poo is also in pain and flies to him to try to helpp
pmy question is why is it such a painful procedure to learn and absorb this powerp
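As you can see, the symbols are gone, but the text that was inside them (the tag names such as p and em) is still there. How can I remove it?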
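One possible explanation for the output above, offered as a guess because the question does not show the rest of the pipeline: the result is lowercased and has no punctuation, which suggests the text was normalized before the HTML was parsed. Once < and > have been stripped, <p> is just the letter p, and no HTML parser can recognize it as a tag any more. The sketch below reproduces that effect using only the standard library. The helper names strip_tags and normalize are hypothetical, and the regex only stands in for whatever tag remover is actually used.

```python
import re
import string

def strip_tags(text):
    # Crude tag remover, used only for this illustration.
    return re.sub(r"<[^>]*>", "", text)

def normalize(text):
    # Lowercase and drop ASCII punctuation, as the output in the question suggests.
    return text.lower().translate(str.maketrans("", "", string.punctuation))

body = "<p>In the middle of <em>The Dark Tournament</em>, Yusuke</p>"

# Wrong order: the angle brackets are gone before the tags can be recognized.
wrong = strip_tags(normalize(body))
# Right order: remove the tags first, then normalize the remaining text.
right = normalize(strip_tags(body))

print(wrong)
print(right)
```

If this is what happened, moving the tag removal ahead of any lowercasing or punctuation stripping should be enough to make remove_html_tags behave as expected.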

Try the following:

import re
from bs4 import BeautifulSoup

xml = """
<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="8" CreationDate="2012-12-11T20:37:08.823" Score="67" ViewCount="17934" Body="&lt;p&gt;Assuming the world in the One Piece universe is round, then there is not really a beginning or an end of the Grand Line.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;The Straw Hats started out from the first half and are now sailing across the second half.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;Wouldn't it have been quicker to set sail in the opposite direction from where they started?     &lt;/p&gt;&#xA;" OwnerUserId="21" LastEditorUserId="1398" LastEditDate="2015-04-17T19:06:38.957" LastActivityDate="2015-05-26T12:50:40.920" Title="The treasure in One Piece is at the end of the Grand Line. But isn't that the same as the beginning?" Tags="&lt;one-piece&gt;" AnswerCount="5" CommentCount="0" FavoriteCount="2" />
  <row Id="2" PostTypeId="1" AcceptedAnswerId="33" CreationDate="2012-12-11T20:39:40.780" Score="13" ViewCount="279" Body="&lt;p&gt;In the middle of &lt;em&gt;The Dark Tournament&lt;/em&gt;, Yusuke Urameshi gets to fully inherit Genkai's power of the &lt;em&gt;Spirit Wave&lt;/em&gt; by absorbing a ball of energy from her.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;However, this process turns into an excruciating trial for Yusuke, almost killing him, and keeping him doubled over in extreme pain for a long period of time, so much so that his Spirit Animal, Poo, is also in pain and flies to him to try to help.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;My question is, why is it such a painful procedure to learn and absorb this power?&lt;/p&gt;&#xA;" OwnerUserId="26" LastEditorUserId="247" LastEditDate="2013-02-26T17:02:31.570" LastActivityDate="2013-06-20T03:31:39.187" Title="Why does absorbing the Spirit Wave from Genkai involve such a painful process?" Tags="&lt;yu-yu-hakusho&gt;" AnswerCount="1" CommentCount="0" />
  <row Id="3" PostTypeId="1" AcceptedAnswerId="148" CreationDate="2012-12-11T20:42:47.447" Score="9" ViewCount="3022" Body="&lt;p&gt;In Sora no Otoshimono, Ikaros carries around a watermelon like a pet and likes watermelons and pretty much anything else round.  At one point she even has a watermelon garden and attacks all the bugs that get near the melons.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;What's the significance of the watermelon and why does she carry one around?&lt;/p&gt;&#xA;" OwnerUserId="29" LastActivityDate="2014-01-15T21:01:55.043" Title="What's the significance of the watermelon in Sora no Otoshimono?" Tags="&lt;sora-no-otoshimono&gt;" AnswerCount="2" CommentCount="1" />
  """
    
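# html.parser lowercases attribute names, so the Body attribute is read as "body".
soup = BeautifulSoup(xml, "html.parser")
for tag in soup.select("posts row"):
    # The attribute value is already unescaped here; drop anything that looks like a tag.
    result = re.sub(r"<.*?>", "", tag["body"])
    print(result.strip())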
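The regex above removes everything between < and >, which is usually enough for this data, but it would also eat literal text such as "a < b and c > d". As a hedged alternative, here is a minimal sketch that uses only the standard library: ElementTree reads the Body attribute (the XML parser unescapes it into real HTML), and a small HTMLParser subclass keeps only the text nodes. TextCollector and html_to_text are hypothetical names, and the sample is a shortened stand-in for the file in the question.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(fragment):
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()
    # Collapse the whitespace left behind by removed tags and newlines.
    return " ".join("".join(collector.parts).split())

# Shortened sample with the same structure as the file in the question.
sample = (
    '<posts>'
    '<row Id="2" Body="&lt;p&gt;In the middle of &lt;em&gt;The Dark Tournament&lt;/em&gt;, '
    'Yusuke inherits Genkai\'s power.&lt;/p&gt;&#xA;" />'
    '</posts>'
)

root = ET.fromstring(sample)
texts = {row.get("Id"): html_to_text(row.get("Body", "")) for row in root.iter("row")}
print(texts["2"])
```

Note that ElementTree keeps the original attribute case, so the attribute is looked up as Body here rather than body.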
Another approach:

from simplified_scrapy import SimplifiedDoc, utils, req
xml = '''<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="8" CreationDate="2012-12-11T20:37:08.823" Score="67" ViewCount="17934" Body="&lt;p&gt;Assuming the world in the One Piece universe is round, then there is not really a beginning or an end of the Grand Line.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;The Straw Hats started out from the first half and are now sailing across the second half.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;Wouldn't it have been quicker to set sail in the opposite direction from where they started?     &lt;/p&gt;&#xA;" OwnerUserId="21" LastEditorUserId="1398" LastEditDate="2015-04-17T19:06:38.957" LastActivityDate="2015-05-26T12:50:40.920" Title="The treasure in One Piece is at the end of the Grand Line. But isn't that the same as the beginning?" Tags="&lt;one-piece&gt;" AnswerCount="5" CommentCount="0" FavoriteCount="2" />
  <row Id="2" PostTypeId="1" AcceptedAnswerId="33" CreationDate="2012-12-11T20:39:40.780" Score="13" ViewCount="279" Body="&lt;p&gt;In the middle of &lt;em&gt;The Dark Tournament&lt;/em&gt;, Yusuke Urameshi gets to fully inherit Genkai's power of the &lt;em&gt;Spirit Wave&lt;/em&gt; by absorbing a ball of energy from her.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;However, this process turns into an excruciating trial for Yusuke, almost killing him, and keeping him doubled over in extreme pain for a long period of time, so much so that his Spirit Animal, Poo, is also in pain and flies to him to try to help.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;My question is, why is it such a painful procedure to learn and absorb this power?&lt;/p&gt;&#xA;" OwnerUserId="26" LastEditorUserId="247" LastEditDate="2013-02-26T17:02:31.570" LastActivityDate="2013-06-20T03:31:39.187" Title="Why does absorbing the Spirit Wave from Genkai involve such a painful process?" Tags="&lt;yu-yu-hakusho&gt;" AnswerCount="1" CommentCount="0" />
  <row Id="3" PostTypeId="1" AcceptedAnswerId="148" CreationDate="2012-12-11T20:42:47.447" Score="9" ViewCount="3022" Body="&lt;p&gt;In Sora no Otoshimono, Ikaros carries around a watermelon like a pet and likes watermelons and pretty much anything else round.  At one point she even has a watermelon garden and attacks all the bugs that get near the melons.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;What's the significance of the watermelon and why does she carry one around?&lt;/p&gt;&#xA;" OwnerUserId="29" LastActivityDate="2014-01-15T21:01:55.043" Title="What's the significance of the watermelon in Sora no Otoshimono?" Tags="&lt;sora-no-otoshimono&gt;" AnswerCount="2" CommentCount="1" />
'''
doc = SimplifiedDoc(xml)
rows = doc.selects('row>Body()')
print([doc.removeHtml(doc.unescape(row)) for row in rows])

Result:

['Assuming the world in the One Piece universe is round, then there is not really a beginning or an end of the Grand Line. The Straw Hats started out from the first half and are now sailing across the second half. Wouldn', 'In the middle of The Dark Tournament, Yusuke Urameshi gets to fully inherit Genkai', 'In Sora no Otoshimono, Ikaros carries around a watermelon like a pet and likes watermelons and pretty much anything else round. At one point she even has a watermelon garden and attacks all the bugs that get near the melons. What']

What if my content is in an XML file, say Anime.xml? Do I pass the file path to SimplifiedDoc?

@Robur_131 You can do it like this: xml = utils.getFileContent('Anime.xml') followed by doc = SimplifiedDoc(xml)
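On the follow-up about reading from a file such as Anime.xml: since the question already uses ElementTree, the same thing can be sketched with the standard library alone. The function below streams row elements from a path with iterparse, which may help if the dump is large. iter_clean_bodies is a hypothetical name, the regex is a simplification that assumes reasonably well-formed tags in Body, and the temporary file only stands in for the real Anime.xml.

```python
import html
import os
import re
import tempfile
import xml.etree.ElementTree as ET

TAG_RE = re.compile(r"<[^>]+>")

def iter_clean_bodies(path):
    """Yield (Id, plain-text Body) for every row element in the file at path."""
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "row":
            body = elem.get("Body", "")
            # Remove tags, then decode any HTML entities left in the text.
            text = html.unescape(TAG_RE.sub("", body))
            yield elem.get("Id"), " ".join(text.split())
            elem.clear()  # release the row's attributes once processed

# Demo on a tiny temporary file; in practice the path would be "Anime.xml".
sample = (
    '<posts><row Id="3" Body="&lt;p&gt;What\'s the significance '
    'of the watermelon?&lt;/p&gt;&#xA;" /></posts>'
)
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False, encoding="utf-8") as fh:
    fh.write(sample)
rows = list(iter_clean_bodies(fh.name))
os.remove(fh.name)
print(rows)
```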