How do I extract the URL from a hyperlink in an email I received, using Python?

python html python-3.x google-api

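I am trying to use BeautifulSoup to extract URLs from my email. When I fetch the raw HTML with a GET request through the Google API, this is what I get back (I have removed sensitive information and replaced it with a's and 1's). In the middle, the URL that follows href=3D" is the one I need. It spans two lines, but when I copy and paste it (removing the = at the line break), it is the correct URL.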

<html><head></head><body><div class=3D"ydp20dc8582yahoo-style-wrap" style=
=3D"font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:13px=
;"><div></div>
        <div><br></div><div><br></div>
       =20
        </div><div id=3D"ydp475be88byahoo_quoted_8442876516" class=3D"ydp47=
5be88byahoo_quoted">
            <div style=3D"font-family:'Helvetica Neue', Helvetica, Arial, s=
ans-serif;font-size:13px;color:#26882a;">
                <div>----- Forwarded Message -----</div>
                <div><b>From:</b> auto-confirm@aaaaaaaaaaaaaaaaaaaaaaa.com =
&lt;auto-confirm@aaaaaaaaaaaaaaaaaaaaaaa.com&gt;</div><div><b>To:</b> "aaaa=
aaaa@yahoo.com" &lt;aaaaaaaa@yahoo.com&gt;</div><div><b>Sent:</b> Thursday,=
 April 23, 2020, 1:39:28 PM CDT</div><div><b>Subject:</b> You chose a Virtu=
aaaaaaaaaaaa!</div><div><br></div>
                <div><div id=3D"ydp475be88byiv6890824975"><div><p> Hello aa=
aaaaaaaaaa, </p><p> Thanks for visiting <a href=3D"https://www.aaaaaaaaaaaa=
aaaaaaaaaaa.com/token/111111111aaaaa11111aaaa111111111" rel=3D"nofollow" ta=
rget=3D"_blank">https://www.aaaaaaaaaaaaaaaaaaaaaaa.com</a>. You recently s=
elected a aaaaaaaaaaaaaaaaaaaaaaaaaaaa. </p><p><a href=3D"https://www.aaaaa=
aaaaaaaaaaaaaaaaaa.com/token/111111111aaaaa11111aaaa111111111" rel=3D"nofol=
low" target=3D"_blank">Click here</a> to aaaaaaaaaaaaaaaaaaaaaaaa details, =
spend history and more. <br>Enjoy aaaaaaaaa!</p><p> https://www.aaaaaaaaaaa=
aaaaaaaaaaaa.com </p><p>Digital token: 1111-111111-1111</p><hr><p>Please do=
n=E2=80=99t reply to this email. If you have questions, please <a href=3D"h=
ttps://www.aaaaaaaaaaaaaaaaaaaaaaaaa.com/ContactUs" rel=3D"nofollow" target=
=3D"_blank"> click here. </a></p></div></div></div>
            </div>
        </div></body></html>

Any help on how to change my Google API request or my BeautifulSoup call would be much appreciated. Thanks in advance.

Edit: I did what FeDelalEdNIN suggested, and here is the output. It still breaks each URL across two lines, with an = in the middle.

  soup = BeautifulSoup(content)
[<a href="https://www.aaaaaaaaaaaaaaa=
aaaaaaaaaaaaa.com/token/aaaaaaa111111111aaaaaaaaaa11111111" rel="nofollow" 
ta='rget="_blank"'>https://www.aaaaaaaaaaaaaaaaaaaaaaaaaa.com</a>, <a 
href="https://www.aaaaa=    
iddigitalsolutions.com/token/aaaaaaa111111111aaaaaaa1111111" rel="nofol= 
low" target="_blank">Click here</a>, <a href="h=
ttps://www.aaaaaaaaaaaaaaaaaaaaaaaaa.com/ContactUs" rel="nofollow" 
target='="_blank"'> click here. </a>]


You can clean up the content first, before passing it to BeautifulSoup:

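   from bs4 import BeautifulSoup

   # google_api.get_email() stands in for however you fetch the raw message body
   content = google_api.get_email()
   # Turn the escaped "=3D" back into a plain "="
   content = content.replace("=3D", "=")

   soup = BeautifulSoup(content, "html.parser")
   all_as = soup.find_all("a")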

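I have added your code to my script and put the result in an edit to my original post. It still splits the URL across multiple lines, with = characters in between.

For each a found, you should be able to get the href by doing

   a.attrs.get("href")

Once you have it, you can replace "=\n" with "" or something similar to clean up the string.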
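The =3D sequences and the = at the end of each wrapped line look like quoted-printable encoding, so one alternative to replacing strings by hand is to decode the whole body before parsing it. The sketch below is a minimal illustration under that assumption: it uses the standard-library quopri module for decoding and html.parser in place of BeautifulSoup so that it runs without extra packages. The helper name decode_quoted_printable, the HrefCollector class, and the sample string are all hypothetical and not taken from the original post.

```python
import quopri
from html.parser import HTMLParser


def decode_quoted_printable(raw: str, charset: str = "utf-8") -> str:
    """Decode a quoted-printable body.

    Joins soft line breaks ("=" at the end of a line) and turns escapes
    such as "=3D" back into "=". The charset is an assumption; use the one
    declared in the message headers if it differs.
    """
    return quopri.decodestring(raw.encode(charset)).decode(charset)


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag (stand-in for soup.find_all("a"))."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


# Hypothetical sample shaped like the email in the question:
# the URL and the "target" attribute are both wrapped with a trailing "=".
raw = (
    '<p>Thanks for visiting <a href=3D"https://www.example=\n'
    '.com/token/abc123" rel=3D"nofollow" ta=\n'
    'rget=3D"_blank">Click here</a></p>'
)

html = decode_quoted_printable(raw)
collector = HrefCollector()
collector.feed(html)
print(collector.hrefs)
```

If the body really is quoted-printable, the decoded string can also be passed to BeautifulSoup exactly as in the answer above, and a.attrs.get("href") should then return each URL in one piece.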
