Python: How do I replace all URLs in HTML with their final redirects?

Preferably with BeautifulSoup, since I already use it for other purposes, but any Python solution is fine.

    s = BeautifulSoup(bodyhtml, features="lxml")
    items = s.find_all("div", {"class": "text-block"})
    # I want to replace all URLs in `items` with their final redirect.
Here is an example URL:

https://tracking.tldrnewsletter.com/CL0/https:%2F%2Farstechnica.com%2Finformation-technology%2F2020%2F04%2Fmeet-dark_nexus-quite-possibly-the-most-potent-iot-botnet-ever%2F/1/0100017163ab9f84-cfdbd3c3-ef8c-4b34-b2a0-f6f4b8f78359-000000/BEB0JUmMqamX4piPthkn_oJ78cjvd6UocEmGf7iO5Pk=136
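Here is `items[5]` (all items have the same structure):

Google Stadia is now free to anyone with a Gmail address. New users will receive two months of Stadia Pro for free. Existing Stadia Pro users won't be charged for the next two months. Nine games are included with the offer. Access to Stadia previously required purchasing a Google Stadia Premier Edition bundle for $129. Stadia Pro will cost $9.99 a month after the two-month trial period ends. Users can cancel their subscriptions online at any time.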


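Get the relevant `a` elements. Assuming the prefix is the same for every link, replace that prefix in the `href` attribute with an empty string, discard everything from the first remaining `/` onward, and then unquote (percent-decode) what is left, like this: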

from bs4 import BeautifulSoup
from urllib.parse import unquote


html = """
<head>

    <body>
        <p>
            <div class="text-block"><span style="color: rgb(0, 0, 0);"><a href="https://tracking.tldrnewsletter.com/CL0/https:%2F%2Fwww.polygon.com%2F2020%2F4%2F8%2F21213551%2Fgoogle-stadia-free-pro-subscription/1/010001715e86638d-8bd389c9-f9eb-4b68-ade4-c2d706ea5ecb-000000/J3pqLEKSYUvxNOcq8090EHiTSXXHiZtRNM6JD1aQP8s=136"><span style="font-size: 14px;"><strong>Google Stadia now free to anyone with a Gmail address (2 minute read)</strong></span></a>
                <br/>
                <br/><span style='font-size: 14px; font-family: "Helvetica Neue", Helvetica, Arial, Verdana, sans-serif;'>Google Stadia is now free to anyone with a Gmail address. New users will receive two months of Stadia Pro for free. Existing Stadia Pro users won't be charged for the next two months. Nine games are included with the offer. Access to Stadia previously required purchasing a Google Stadia Premier Edition bundle for $129. Stadia Pro will cost $9.99 a month after the two-month trial period ends. Users can cancel their subscriptions online at any time.</span>
                <br/>
                </span>
                <br/>
            </div>
        </p>

        </body>
</head>
"""

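# Parse the HTML, then rewrite every link inside div.text-block.
s = BeautifulSoup(html, features="lxml")
for a in s.select('div.text-block a'):
    # Strip the tracking prefix, keep the first path segment
    # (the percent-encoded destination), and decode it.
    a['href'] = unquote(a['href'].replace("https://tracking.tldrnewsletter.com/CL0/", "").split('/')[0])
print(s)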
Output:

<html><head>
</head><body>
<p>
</p><div class="text-block"><span style="color: rgb(0, 0, 0);"><a href="https://www.polygon.com/2020/4/8/21213551/google-stadia-free-pro-subscription"><span style="font-size: 14px;"><strong>Google Stadia now free to anyone with a Gmail address (2 minute read)</strong></span></a>
<br/>
<br/><span style='font-size: 14px; font-family: "Helvetica Neue", Helvetica, Arial, Verdana, sans-serif;'>Google Stadia is now free to anyone with a Gmail address. New users will receive two months of Stadia Pro for free. Existing Stadia Pro users won't be charged for the next two months. Nine games are included with the offer. Access to Stadia previously required purchasing a Google Stadia Premier Edition bundle for $129. Stadia Pro will cost $9.99 a month after the two-month trial period ends. Users can cancel their subscriptions online at any time.</span>
<br/>
</span>
<br/>
</div>
</body>
</html>
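
The one-liner in the loop above assumes that every `href` starts with the tracking prefix. A slightly more defensive variant might look like the sketch below. It uses only the standard library; the helper name `extract_target` and the constant `TRACKING_PREFIX` are illustrative and not part of the original answer. Links that do not start with the prefix are returned unchanged.

```python
from urllib.parse import unquote

# Assumed tracking prefix, taken from the example links in the question.
TRACKING_PREFIX = "https://tracking.tldrnewsletter.com/CL0/"


def extract_target(href, prefix=TRACKING_PREFIX):
    """Return the destination embedded in a tracking link.

    Links that do not start with `prefix` are returned unchanged.
    """
    if not href.startswith(prefix):
        return href
    # The percent-encoded destination is the first path segment after the prefix.
    encoded = href[len(prefix):].split("/", 1)[0]
    return unquote(encoded)


tracked = (
    "https://tracking.tldrnewsletter.com/CL0/"
    "https:%2F%2Fwww.polygon.com%2F2020%2F4%2F8%2F21213551%2F"
    "google-stadia-free-pro-subscription/1/"
    "010001715e86638d-8bd389c9-f9eb-4b68-ade4-c2d706ea5ecb-000000/"
    "J3pqLEKSYUvxNOcq8090EHiTSXXHiZtRNM6JD1aQP8s=136"
)
print(extract_target(tracked))
print(extract_target("https://example.com/page"))
```

In the BeautifulSoup loop this would be used as `a['href'] = extract_target(a['href'])`. Selecting with `div.text-block a[href]` instead of `div.text-block a` skips anchors that have no `href` attribute, which would otherwise raise a `KeyError`.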




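
Decoding the embedded URL only works when the destination is encoded in the link itself. For links where it is not, one option is to actually follow the redirects. The following is a minimal sketch using the standard library's `urllib.request.urlopen`, which follows HTTP redirects by default. The function name `resolve_final_url` and its `opener` parameter are hypothetical and are there only so the network call can be swapped out when testing.

```python
import urllib.request


def resolve_final_url(url, opener=urllib.request.urlopen, timeout=10):
    """Follow HTTP redirects and return the URL of the final response.

    Falls back to the original URL if the request fails.
    """
    try:
        with opener(url, timeout=timeout) as response:
            # geturl() reports the URL actually retrieved, after any redirects.
            return response.geturl()
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return url
```

Applied to the parsed document, this would be `a['href'] = resolve_final_url(a['href'])` inside the same loop. Each call makes a network request, so it is much slower than decoding the link locally, and it also triggers the tracking hit that the decoding approach avoids.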