Python: modifying all local links in an HTML file

python regex python-3.x

I want to change the links in an HTML page, as shown below:

//html
<html>
    <head>
        <title>Hello</title>
    </head>
    <body>
        <p>this is a simple text in html file</p>
        <a href="https://google.com">Google</a>
        <a href="/frontend/login/">Login</a>
        <a href="/something/work/">Something</a>
    </body>
 </html>



//Result
    <html>
        <head>
            <title>Hello</title>
        </head>
        <body>
            <p>this is a simple text in html file</p>
            <a href="https://google.com">Google</a>
            <a href="/more/frontend/login/part/">Login</a>
            <a href="/more/something/work/extra/">Something</a>
        </body>
     </html>


So, how can I use Python to turn the html into the result and save it as an HTML file?

Well, doing this with a regex is quite simple.

Use

href="\/([^"]*)

as the pattern, and use

href="\/more\/\1

as the replacement.
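For anyone who wants to try the pattern above directly in Python, here is a minimal sketch. The sample string is taken from the question; the variable names are mine. One assumption to flag: the slashes in the replacement are written without backslashes, because as far as I know Python's `re.sub` keeps an unknown non-letter escape such as `\/` verbatim in the output instead of dropping the backslash. This only adds the `/more/` prefix, not a suffix.

```python
import re

# Sample input taken from the question.
html = (
    '<a href="https://google.com">Google</a>\n'
    '<a href="/frontend/login/">Login</a>\n'
    '<a href="/something/work/">Something</a>\n'
)

# Capture everything after the leading slash, up to (not including) the closing quote.
pattern = r'href="/([^"]*)'

# Plain slashes here: no escaping is needed in a Python replacement string.
replacement = r'href="/more/\1'

result = re.sub(pattern, replacement, html)
print(result)
```

The absolute `https://google.com` link is left alone because the pattern requires the href value to start with a slash.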


Previous "50% attempt" (sorry, I missed your second part):
If the HTML file is stored as a string (for example, html), you can do a simple replacement:
result = html.replace('<a href="/', '<a href="/more/')
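Since the question also asks about saving the result, the replacement above could be wrapped in a small read/replace/write helper. This is only a sketch: the function name `prefix_local_links`, its parameters, and the UTF-8 encoding are my assumptions, not part of the original answer.

```python
from pathlib import Path


def prefix_local_links(src, dst, prefix="/more/"):
    """Read src, prepend prefix to every <a href="/..."> link, and write dst.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    html = Path(src).read_text(encoding="utf-8")
    # Same plain string replacement as in the answer above.
    result = html.replace('<a href="/', '<a href="' + prefix)
    Path(dst).write_text(result, encoding="utf-8")
    return result
```

Keep in mind that a plain string replacement only matches when `href` is the first attribute of the `<a` tag and is written with exactly this spacing and double quotes.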

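I solved this problem myself, but I think it can help a lot of people, which is why I am answering my own question and posting it publicly.
Thanks. His 30-50% solution helped a lot with my complete solution.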
import re

# First pass: match the start of every local link (an href that begins with "/").
regex = r'href="/'

test_str = ("<html>\n"
    "    <head>\n"
    "        <title>Hello</title>\n"
    "    </head>\n"
    "    <body>\n"
    "        <p>this is a simple text in html file</p>\n"
    "        <a href=\"https://google.com\">Google</a>\n"
    "        <a href=\"/front-end/login/\">Login</a>\n"
    "        <a href=\"/something/work/\">Something</a>\n"
    "    </body>\n"
    " </html>")

# Replacement for the first pass: prepend /more/ to the path.
subst = 'href="/more/'

# count=0 replaces every match; pass a positive count to limit the number of replacements.
result = re.sub(regex, subst, test_str, count=0)

# Second pass: capture the local href up to (not including) the closing quote and append hello/.
subst2 = r"\1hello/"
regex2 = r"(href=\"/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)"
result2 = re.sub(regex2, subst2, result, count=0)

if result2:
    print(result2)

# Save the rewritten HTML; the with block closes the file automatically.
with open("solution.html", "w") as output_file:
    output_file.write(result2)
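The two passes above could probably be folded into one substitution with a callback. The following is a sketch under my own assumptions, not the author's method: the names `LOCAL_HREF` and `rewrite_local_links` and the default `more` / `extra` segments are illustrative, links are assumed to be directory-style paths in double quotes, and protocol-relative links such as `//cdn.example.com/x` are deliberately skipped because they are not local.

```python
import re

# Matches href="/..." but not protocol-relative URLs such as href="//cdn.example.com/x".
LOCAL_HREF = re.compile(r'href="/(?!/)([^"]*)"')


def rewrite_local_links(html, prefix="more", suffix="extra"):
    """Rewrite every local href path as /prefix/<old path>/suffix/ in one pass."""
    def repl(match):
        # Drop surrounding slashes so the pieces can be joined cleanly.
        path = match.group(1).strip("/")
        parts = [prefix, path, suffix] if path else [prefix, suffix]
        return 'href="/' + "/".join(parts) + '/"'

    return LOCAL_HREF.sub(repl, html)


html = ('<a href="https://google.com">Google</a>\n'
        '<a href="/something/work/">Something</a>\n')
print(rewrite_local_links(html))
```

Using a callback instead of a replacement template keeps the path handling in ordinary Python, which makes it easier to vary the suffix per link later if that is needed.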

What have you tried so far? You can parse the HTML easily with a scraping library.

Possible duplicate.

That example replaces links rather than modifying them. I want to add more content to the existing link, not put in a new link. I think that example uses replacement.

The links are not empty. They may contain some existing data. I want to add more content to the existing value.

Thanks, you solved 50%. I want to add more at the start and also add more text at the end of the link, like (extra), as follows: /more/something previous exists/extra/

Your regex href="\/([^"]*) may include invalid links.

What do you mean?

Well done. Sorry, I didn't notice your second addition. If you want to simplify your solution, have a look at this (I'll also edit my answer):
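On the comments about parsing the HTML and about the regex possibly catching the wrong links: one way to check which links would be touched is to list them with a parser first. The sketch below uses the standard library's `html.parser` rather than whichever library the commenter had in mind (that reference is not preserved here); the class name `LocalLinkCollector` is hypothetical. It only collects links and does not rewrite the file.

```python
from html.parser import HTMLParser


class LocalLinkCollector(HTMLParser):
    """Collect href values of <a> tags that start with a single slash."""

    def __init__(self):
        super().__init__()
        self.local_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip missing values and protocol-relative URLs (//host/path).
            if name == "href" and value and value.startswith("/") and not value.startswith("//"):
                self.local_links.append(value)


collector = LocalLinkCollector()
collector.feed('<a href="https://google.com">Google</a>'
               '<a href="/frontend/login/">Login</a>'
               "<a href='/something/work/'>Something</a>")
print(collector.local_links)
```

Unlike the double-quote regex, the parser also sees the single-quoted href in the third link, so comparing its list against the regex matches is a quick sanity check before overwriting a file.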