Python: How do I extract email text using BeautifulSoup?
I'm trying to extract the email address text, but I'm struggling. How can I get the email? My code:
#importing libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup
#send request
url = 'https://apps.shopify.com/loox?surface_detail=all&surface_inter_position=1&surface_intra_position=1&surface_type=category'
page = requests.get(url)
#parse the html
soup = BeautifulSoup(page.text, 'html.parser')
#get email
app_email = []
email_elm = soup.find_all(class_= 'app-support-list__item')
I'm trying to scrape the email "support@loox.io" from this page. Here is the relevant part of the page source:
</div>
<div class="grid__item grid__item--tablet-up-third grid__item--desktop-up-3 grid__item--desktop-up-push-3">
<div class="block app-listing__support-section app-listing__section">
<h3 class="block__heading heading--2">Support</h3>
<div class="block__content">
<ul class="app-support-list">
<li class="app-support-list__item">
<span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-faq" /> </svg><a class="ui-external-link" href="http://help.loox.io" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">FAQ<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
</li>
<li class="app-support-list__item">
<span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-website-url" /> </svg><a class="ui-external-link" href="https://loox.app" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">Developer website<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
</li>
<li class="app-support-list__item">
<span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-privacy" /> </svg><a class="ui-external-link" href="https://loox.io/legal/privacy_policy.pdf" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">Privacy policy<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
</li>
<li class="app-support-list__item">
<span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-email" /> </svg>support@loox.io</span>
</li>
In this particular case, you can target the email like this:
email_elm[3].span.text.replace(" ", "")
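To see what that expression returns without hitting the live page, here is a minimal sketch run against a trimmed copy of the support list from the question. The `html` string (with shortened icon and link attributes) and the `raw_text` name are illustrative assumptions, not part of the original post.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the support list from the question (attributes shortened).
html = """
<ul class="app-support-list">
<li class="app-support-list__item">
<span><svg class="icon"> <use xlink:href="#v2-icons-icon-faq" /> </svg><a href="http://help.loox.io">FAQ</a></span>
</li>
<li class="app-support-list__item">
<span><svg class="icon"> <use xlink:href="#v2-icons-icon-website-url" /> </svg><a href="https://loox.app">Developer website</a></span>
</li>
<li class="app-support-list__item">
<span><svg class="icon"> <use xlink:href="#v2-icons-icon-privacy" /> </svg><a href="https://loox.io/legal/privacy_policy.pdf">Privacy policy</a></span>
</li>
<li class="app-support-list__item">
<span><svg class="icon"> <use xlink:href="#v2-icons-icon-email" /> </svg>support@loox.io</span>
</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
email_elm = soup.find_all(class_="app-support-list__item")

# The span's text still carries the whitespace that sits inside the <svg> icon.
raw_text = email_elm[3].span.text

# Removing the spaces leaves only the address.
app_email = raw_text.replace(" ", "")

print(repr(raw_text))
print(app_email)
```

Printing `repr(raw_text)` makes the leading whitespace visible, which is why the `replace(" ", "")` call is needed here.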
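This takes the fourth list item, then the text inside its span, and finally removes the spaces.

You can index into the result and use .text to get just the text. Instead of
email_elm = soup.find_all(class_= 'app-support-list__item')
try:
email_elm = soup.find_all(class_= "app-support-list__item")[3].text.strip()

Do you have a specific question? This is not the place for guides or tutorials.
Nice answer. What exactly does .span do in your code?
@Jem We are just targeting the span element in the HTML, since it is a child of the list item.
Ah, so by adding .span before .text, it only returns the text inside the span tag?
Yes, leaving it out would result in: \nsupport@loox.io\n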
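Both suggestions above rely on the email being the fourth list item, so index 3 would pick the wrong entry if the page reordered or dropped an item. As a hedged alternative, the sketch below scans every support item and returns the first string that looks like an email address. The `find_support_email` helper, the `EMAIL_RE` pattern, and the `sample` markup are all hypothetical names introduced for illustration, and the regex is deliberately loose rather than a full validator.

```python
import re
from bs4 import BeautifulSoup

# Loose pattern for illustration only; not a complete email validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_support_email(html):
    """Return the first email-like string among the support list items, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for item in soup.find_all(class_="app-support-list__item"):
        # get_text(strip=True) strips each text node and skips empty ones.
        match = EMAIL_RE.search(item.get_text(strip=True))
        if match:
            return match.group(0)
    return None


# Hypothetical markup with the email item first, to show the lookup
# does not depend on position.
sample = """
<ul class="app-support-list">
<li class="app-support-list__item"><span><svg class="icon"> <use xlink:href="#v2-icons-icon-email" /> </svg>support@loox.io</span></li>
<li class="app-support-list__item"><span><a href="http://help.loox.io">FAQ</a></span></li>
</ul>
"""

print(find_support_email(sample))
```

With the real page you would pass `page.text` from the question's `requests.get(url)` call, assuming the site still serves this markup in the initial HTML response.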