
Python: replace all <img> tags in an XML file with a single word


I have an XML file made up of many tweets that contain HTML tags. Among other tasks, I need to replace every img tag with the word @emoji. I wrote the following code:

for word in re.findall(r"&lt;img[\w\W]+?/&gt;",line):
    print word
    line = line.replace(word,'@emoji')
This works fine on a single line. But when I try to do the same thing in a loop over the whole file, it never enters the loop. Here is the code:

import re
import xml.etree.ElementTree as ET #xml lib
filename = 'da0d0e3527b931bb0bc6f5435003ea2a.xml'
tree = ET.parse(filename)
root = tree.getroot()
twits = []
for child in root:
    for grandchild in child:
        twits.append(grandchild.text)
for line in twits:
    for word in re.findall(r"&lt;img[\w\W]+?&gt;",line):
        line = line.replace(word,'@img')
    print line
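A likely cause: ElementTree decodes character entities while parsing, so grandchild.text holds a literal <img ...> tag rather than the &lt;img ...&gt; text seen in the raw file. A pattern written against the escaped form therefore finds nothing, and the inner loop body never runs. The sketch below (Python 3) matches the decoded form instead; the inline SAMPLE string and the replace_img_tags helper are illustrative stand-ins, not part of the original code, and the pattern assumes no attribute value contains a literal ">".

```python
import re
import xml.etree.ElementTree as ET

# After parsing, entities are already decoded, so the text holds "<img ...>",
# not "&lt;img ...&gt;". [^>]* stops at the first ">", which ends the tag here.
IMG_TAG = re.compile(r"<img\b[^>]*>")

def replace_img_tags(text, word="@emoji"):
    """Replace every <img ...> tag in a parsed text node with `word`."""
    if text is None:  # an element with no text has .text == None
        return None
    return IMG_TAG.sub(word, text)

# Small stand-in for the real file, with the same author/documents/document shape.
SAMPLE = """<author><documents>
<document id="1">Time to start saving for a Skyline</document>
<document id="2">ahh beat me to it &lt;img class="Emoji" alt="&#128521;" title="Winking face"&gt;</document>
</documents></author>"""

root = ET.fromstring(SAMPLE)  # for the real file: ET.parse(filename).getroot()
# iter("document") visits every <document> element, however deeply nested.
twits = [replace_img_tags(doc.text) for doc in root.iter("document")]
for line in twits:
    print(line)
```

Using root.iter("document") also removes the dependence on the exact nesting depth that the child/grandchild loops assume.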
I also tried doing the same with an HTML parser, but I can't convert the tag to a string:

imgs = soup.find_all('img')
for img in imgs:
    print img
    emo = str(img)
    twit.replace(emo,'@emoji')
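Two details may explain why the parser attempt has no effect: str.replace returns a new string, so its result has to be assigned back, and str(img) rebuilds the tag from the parse tree, so it need not match the original text character for character. The sketch below swaps BeautifulSoup for the standard library's html.parser (Python 3; on 2.7 the module is named HTMLParser) and records each img start tag exactly as it was written. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class ImgTagCollector(HTMLParser):
    """Records the exact source text of every <img> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.raw_tags = []

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag() also routes "<img ... />" here.
        if tag == "img":
            # get_starttag_text() returns the tag as it appeared in the input,
            # not a re-serialized version of it.
            self.raw_tags.append(self.get_starttag_text())

def replace_img_tags(text, word="@emoji"):
    collector = ImgTagCollector()
    collector.feed(text)
    collector.close()
    for raw in collector.raw_tags:
        # str.replace() returns a new string; reassign or the change is lost.
        text = text.replace(raw, word)
    return text

twit = 'Been a pure lazy day today <img class="Emoji" alt="x" title="Ok hand sign">'
print(replace_img_tags(twit))
```

Other tags such as <a>, <s> and <b> are left untouched; only img start tags are swapped for the word.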
The XML file is too large to post in full, but it looks like this:

<author>
    <documents>
        <document id="396228853267714048" url="https://twitter.com/ReissSudden/status/396228853267714048">Sooooo many slutty cats knocking around last night</document>
        <document id="396229373554360320" url="https://twitter.com/ReissSudden/status/396229373554360320">&lt;a href="/AndyLee666" class="twitter-atreply pretty-link js-nav" dir="ltr" data-mentioned-user-id="259958055" &gt;&lt;s&gt;@&lt;/s&gt;&lt;b&gt;AndyLee666&lt;/b&gt;&lt;/a&gt; yep, eye hurts but doesn&amp;#39;t look bad ha ha</document>
        <document id="396326071467270144" url="https://twitter.com/ReissSudden/status/396326071467270144">Time to start saving for a Skyline</document>
        <document id="396326916372054016" url="https://twitter.com/ReissSudden/status/396326916372054016">@LaurenWeale where were your halo and wings then?</document>
        <document id="396327202260017152" url="https://twitter.com/ReissSudden/status/396327202260017152">@LaurenWeale I didn&amp;#39;t see them, and besides, it&amp;#39;s not a scary costume</document>
        <document id="396327842252075008" url="https://twitter.com/ReissSudden/status/396327842252075008">@LaurenWeale ahh beat me to it &lt;img class="Emoji Emoji--forText" src="https://abs.twimg.com/emoji/v2/72x72/1f609.png" draggable="false" alt="&#128521;" title="Winking face" aria-label="Emoji: Winking face"&gt;</document>
        <document id="396328213074677763" url="https://twitter.com/ReissSudden/status/396328213074677763">The best chair ever! &lt;a href="/hashtag/halloween?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr" &gt;&lt;s&gt;#&lt;/s&gt;&lt;b&gt;halloween&lt;/b&gt;&lt;/a&gt; &lt;a href="/hashtag/Throne?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr" &gt;&lt;s&gt;#&lt;/s&gt;&lt;b&gt;Throne&lt;/b&gt;&lt;/a&gt; &lt;a href="/hashtag/Devil?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr" &gt;&lt;s&gt;#&lt;/s&gt;&lt;b&gt;Devil&lt;/b&gt;&lt;/a&gt; &lt;a href="/hashtag/anyforty?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr" &gt;&lt;s&gt;#&lt;/s&gt;&lt;b&gt;anyforty&lt;/b&gt;&lt;/a&gt; &lt;a href="/hashtag/wasted?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr" &gt;&lt;s&gt;#&lt;/s&gt;&lt;b&gt;wasted&lt;/b&gt;&lt;/a&gt; &lt;a href="/hashtag/king?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr" &gt;&lt;s&gt;#&lt;/s&gt;&lt;b&gt;king&lt;/b&gt;&lt;/a&gt; &lt;a href="http://somelink" rel="nofollow noopener" dir="ltr" data-expanded-url="http://instagram.com/p/gLi_B9EfOp/" class="twitter-timeline-link" target="_blank" title="http://instagram.com/p/gLi_B9EfOp/" &gt;&lt;span class="tco-ellipsis"&gt;&lt;/span&gt;&lt;span class="invisible"&gt;http://&lt;/span&gt;&lt;span class="js-display-url"&gt;instagram.com/p/gLi_B9EfOp/&lt;/span&gt;&lt;span class="invisible"&gt;&lt;/span&gt;&lt;span class="tco-ellipsis"&gt;&lt;span class="invisible"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;</document>
        <document id="396328831285735424" url="https://twitter.com/ReissSudden/status/396328831285735424">@LaurenWeale sorry, that was mean</document>
        <document id="396337843909713920" url="https://twitter.com/ReissSudden/status/396337843909713920">@LaurenWeale :( don&amp;#39;t be like that</document>
        <document id="396342701568040960" url="https://twitter.com/ReissSudden/status/396342701568040960">@LaurenWeale be like that then &lt;img class="Emoji Emoji--forText" src="https://abs.twimg.com/emoji/v2/72x72/1f624.png" draggable="false" alt="&#128548;" title="Face with look of triumph" aria-label="Emoji: Face with look of triumph"&gt;</document>
        <document id="396345875360129024" url="https://twitter.com/ReissSudden/status/396345875360129024">Been a pure lazy day today &lt;img class="Emoji Emoji--forText" src="https://abs.twimg.com/emoji/v2/72x72/1f44c.png" draggable="false" alt="&#128076;" title="Ok hand sign" aria-label="Emoji: Ok hand sign"&gt;</document>
    </documents>
</author>


Thanks for your help!

Before parsing the file, you can read it, run re.sub on the raw data to replace the img tags with @emoji, and then parse the result with ET.fromstring. You can do it like this:

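import re
import xml.etree.ElementTree as ET  # xml lib
filename = 'da0d0e3527b931bb0bc6f5435003ea2a.xml'
with open(filename) as f:
    raw = f.read()
# Replace each escaped img tag in the raw text, before the parser decodes the entities.
# The non-greedy [\w\W]+? stops at the first &gt; that closes the tag.
data = re.sub(r"&lt;img[\w\W]+?&gt;", "@emoji", raw)
root = ET.fromstring(data)  # fromstring() returns the root Element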

Now the data contains @emoji wherever there was an img tag, and you can parse the result however you like.

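Have you tried re.sub(pattern, '@emoji', line)? That would make the loop unnecessary.
@Boldewyn Just tried it, still doesn't work :(
Why not just use sed?
@Ôrel Unfortunately I don't know anything about sed. How would I use it?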
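The sed suggestion amounts to a streaming, line-by-line substitution on the raw file, which also avoids loading a very large file into memory at once. A rough Python equivalent is sketched below; it assumes, as in the sample, that each escaped img tag sits on a single line, and the function name rewrite_stream and the file names in the comment are hypothetical.

```python
import io
import re

# Escaped tag as it appears in the raw file: "&lt;img" up to the first "&gt;".
# Without re.DOTALL, ".*?" never crosses a newline, so a tag must fit on one line.
ESCAPED_IMG = re.compile(r"&lt;img\b.*?&gt;")

def rewrite_stream(src, dst, word="@emoji"):
    """Copy src to dst line by line, replacing escaped img tags; returns the count."""
    count = 0
    for line in src:
        new_line, n = ESCAPED_IMG.subn(word, line)
        count += n
        dst.write(new_line)
    return count

# Usage on an in-memory sample; for real files, pass objects opened with
# io.open("in.xml", encoding="utf-8") and io.open("out.xml", "w", encoding="utf-8").
src = io.StringIO('<d>hi &lt;img alt="&#128521;" title="Winking face"&gt;</d>\n<d>plain</d>\n')
dst = io.StringIO()
replaced = rewrite_stream(src, dst)
print(replaced, dst.getvalue())
```

The rewritten file can then be parsed with ET.parse as usual.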