Notepad++; Python: Move text between search strings to another location (footer)

python regex notepad++

I am working on some HTML documents that look like this:

<html>
    <head>Something in here</head>
    <body>
        <MYTAG>This should be moved to the Footer</MYTAG>
        <MYTAG>This should be moved to the Footer, too</MYTAG>
    </body>
    <footer></footer>
</html>



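I am already using Notepad++ and Python to customize the rest of the documents, mostly with regular expressions. Now I want to move the parts marked with <MYTAG> tags into the footer, appended at its end, so that the document looks like this: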

<html>
    <head>Something in here</head>
    <body>
    </body>
    <footer>
        <MYTAG>This should be moved to the Footer</MYTAG>
        <MYTAG>This should be moved to the Footer, too</MYTAG>
    </footer>
</html>



First, I tried to do this with regular expressions alone. Search: (

Try this:
(<html.*?)((?:\s*<MYTAG>[^<]+<\/MYTAG>\n*)+)(.*?( *)<footer>)(.*?)(<\/footer>.*?<\/html>)


Input: the HTML document from the question.

Output:
<html>
    <head>Something in here</head>
    <body>    </body>
    <footer>
        <MYTAG>This should be moved to the Footer</MYTAG>
        <MYTAG>This should be moved to the Footer, too</MYTAG>
    </footer>
</html>
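
For anyone who wants to run the same substitution from Python instead of Notepad++'s Replace dialog, here is a minimal sketch. The answer gives only the search pattern, so the replacement string `\1\3\5\2\4\6` is an assumption, chosen because it reproduces the output shown above. `re.DOTALL` is also an assumption: the pattern needs `.` to match newlines, which in Notepad++ is presumably the ". matches newline" option.

```python
import re

# Sample document from the question.
html = (
    "<html>\n"
    "    <head>Something in here</head>\n"
    "    <body>\n"
    "        <MYTAG>This should be moved to the Footer</MYTAG>\n"
    "        <MYTAG>This should be moved to the Footer, too</MYTAG>\n"
    "    </body>\n"
    "    <footer></footer>\n"
    "</html>\n"
)

# Search pattern from the answer, unchanged.
# Groups: 1 = everything before the tags, 2 = the run of <MYTAG> elements,
# 3 = text up to and including <footer>, 4 = the footer's indentation,
# 5 = existing footer content, 6 = </footer> through </html>.
pattern = re.compile(
    r"(<html.*?)((?:\s*<MYTAG>[^<]+<\/MYTAG>\n*)+)(.*?( *)<footer>)(.*?)(<\/footer>.*?<\/html>)",
    re.DOTALL,  # assumption: "." must also match newlines
)

# Assumed replacement: re-emit the groups with the tag run (2) placed
# after the existing footer content (5).
result = pattern.sub(r"\1\3\5\2\4\6", html)
print(result)
```

Note that group 2 only captures one contiguous run of `<MYTAG>` elements, so tags separated by other content are not all moved in a single pass.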


You can use the following script:
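import re

# Read the whole document into memory.
with open("temp.html") as html_file:
    html = html_file.read()

# Collect every <MYTAG>...</MYTAG> element (non-greedy, so two tags on one line stay separate).
tags = re.findall(r"<MYTAG>.*?</MYTAG>\n*", html)

# Remove those elements from their original position.
html = re.sub(r"<MYTAG>.*?</MYTAG>\n*", "", html)

# Split the remaining text around the opening <footer> tag.
footer = re.split(r"<footer>", html)

# Reassemble: text before the footer, the opening tag, the moved tags, then the rest.
tags.insert(0, "<footer>\n")
tags.insert(0, footer[0])
tags.append(footer[1])

# Overwrite the file with the rearranged document.
with open("temp.html", "w") as html_file:
    html_file.write("".join(tags))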


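It works as follows:
Read the file
Find all <MYTAG> tags
Remove the tags from the body
Split the file content into two parts at <footer>
Add the tags and <footer> back into the text
Write the result to the file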

No need for regex, you can use BeautifulSoup; I think it would be much faster.

Agreed. BeautifulSoup it. Much simpler, much faster.

This looks promising. I'll take a look.

Running this gives the following error: IOError: [Errno 2] No such file or directory: 'temp.html'

@M.Minus Of course it doesn't exist. You have to type the path to your own file instead of "temp.html". Just in case, see the documentation.

Ah, I ran it in the wrong place. Works fine. Thanks.

Thanks, this works fine, but it fails on HTML files where there is more content between the two <MYTAG> strings. Can this still be solved with regex for something like this? <MYTAG>This should be moved to the Footer</MYTAG> Combobreaking content here. <MYTAG>This should be moved to the Footer, too</MYTAG>
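
One possible way to handle the case raised in the last comment, where other content sits between the `<MYTAG>` elements, is to collect the tags one by one instead of matching them as a single contiguous run. The sketch below is only an illustration under some assumptions: the function name `move_tags_to_footer` is hypothetical, the document contains exactly one `<footer>` opening tag without attributes, `<MYTAG>` elements are not nested, and the footer indentation is hard-coded to match the sample in the question.

```python
import re

# One <MYTAG> element; the lookahead stops the match at the first closing tag.
TAG = r"<MYTAG>(?:(?!</MYTAG>).)*</MYTAG>"

# First alternative: a tag alone on its line (the whole line is removed).
# Second alternative: a tag embedded in other text (only the tag is removed).
TAG_RE = re.compile(rf"^[ \t]*{TAG}[ \t]*\n|{TAG}", re.DOTALL | re.MULTILINE)


def move_tags_to_footer(html: str) -> str:
    """Move every <MYTAG> element to the start of the <footer> element (hypothetical helper)."""
    # Collect the elements wherever they occur, without surrounding whitespace.
    tags = [m.group(0).strip() for m in TAG_RE.finditer(html)]
    # Remove them from their original positions.
    stripped = TAG_RE.sub("", html)
    # Split once at the opening footer tag; leave the input untouched if there is none.
    before, sep, after = stripped.partition("<footer>")
    if not sep:
        return html
    # Assumption: 8 spaces for the moved tags, 4 for the closing footer tag.
    block = "".join(f"        {tag}\n" for tag in tags)
    return f"{before}<footer>\n{block}    {after}"


sample = (
    "<html>\n"
    "    <body>\n"
    "        <MYTAG>A</MYTAG>\n"
    "        <p>Combobreaking content here.</p>\n"
    "        <MYTAG>B</MYTAG>\n"
    "    </body>\n"
    "    <footer></footer>\n"
    "</html>\n"
)
result = move_tags_to_footer(sample)
print(result)
```

For anything beyond simple, well-formed input like this, an HTML parser such as BeautifulSoup, as suggested in the comments, is likely to be more robust than regular expressions.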