Python: How do I extract email text using BeautifulSoup?

I'm trying to extract the email address text, but I'm having trouble. How can I get the email?

My code:

# import libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup

# send the request
url = 'https://apps.shopify.com/loox?surface_detail=all&surface_inter_position=1&surface_intra_position=1&surface_type=category'
page = requests.get(url)

# parse the HTML
soup = BeautifulSoup(page.text, 'html.parser')

# get the email
app_email = []
email_elm = soup.find_all(class_='app-support-list__item')

I'm trying to scrape the "support@loox.io" email from this page. Here is the page source:

</div>
    <div class="grid__item grid__item--tablet-up-third grid__item--desktop-up-3 grid__item--desktop-up-push-3">
      
<div class="block app-listing__support-section app-listing__section">
  <h3 class="block__heading heading--2">Support</h3>
  <div class="block__content">
    <ul class="app-support-list">
        <li class="app-support-list__item">
          <span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-faq" /> </svg><a class="ui-external-link" href="http://help.loox.io" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">FAQ<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
        </li>
        <li class="app-support-list__item">
          <span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-website-url" /> </svg><a class="ui-external-link" href="https://loox.app" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">Developer website<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
        </li>
        <li class="app-support-list__item">
          <span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-privacy" /> </svg><a class="ui-external-link" href="https://loox.io/legal/privacy_policy.pdf" rel="nofollow noopener" target="_blank" aria-describedby="aria-new-window-desc">Privacy policy<svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#polaris-external" /> </svg></a></span>
        </li>
        <li class="app-support-list__item">
          <span><svg class="icon" aria-hidden="true" focusable="false"> <use xlink:href="#v2-icons-icon-email" /> </svg>support@loox.io</span>
        </li>

Support
  • support@loox.io

In this particular case, you can target the email like this:

email_elm[3].span.text.replace(" ", "")

This takes the fourth list item, reads the text inside its span, and then removes the spaces.
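Indexing with [3] depends on the email always being the fourth item. As a rough sketch of an alternative, the list item can be picked by its icon reference instead. The helper name find_support_email is hypothetical, the HTML below is a trimmed copy of the source shown in the question, and the code assumes the email item's icon reference contains "icon-email", as it does in that source.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the support list from the question (attributes shortened).
HTML = """
<ul class="app-support-list">
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-faq" /> </svg><a class="ui-external-link" href="http://help.loox.io">FAQ</a></span>
  </li>
  <li class="app-support-list__item">
    <span><svg class="icon"> <use xlink:href="#v2-icons-icon-email" /> </svg>support@loox.io</span>
  </li>
</ul>
"""


def find_support_email(html):
    """Return the text of the list item marked with the email icon, or None.

    Hypothetical helper: it assumes the icon reference contains "icon-email".
    """
    soup = BeautifulSoup(html, "html.parser")
    for item in soup.find_all(class_="app-support-list__item"):
        use = item.find("use")
        if use is not None and "icon-email" in use.get("xlink:href", ""):
            # strip=True drops the whitespace-only strings around the icon
            return item.get_text(strip=True)
    return None


email = find_support_email(HTML)
print(email)
```

If the page layout changes so that no item carries that icon, the helper returns None rather than raising an IndexError.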

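You can use slicing and .text to get just the text.

Instead of

email_elm = soup.find_all(class_='app-support-list__item')

try:

email_elm = soup.find_all(class_='app-support-list__item')[3].text.strip()

Is there a specific problem? This is not the place for guides or tutorials.

Nice answer.

What exactly does .span do in your code?

@Jem We are just targeting the span element in the HTML, because it is a child of the list item.

Ah, so by adding .span before .text, it only returns the text inside the span tag?

Yes. Without it, the result would be: \nsupport@loox.io\n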
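The difference between .text on the list item and .span.text raised in the comments can be checked directly on the email item from the question. This is only a small sketch on an inline copy of that markup (attributes trimmed), not a request against the live page.

```python
from bs4 import BeautifulSoup

# The email list item as it appears in the question's page source (attributes trimmed).
HTML = """<li class="app-support-list__item">
          <span><svg class="icon"> <use xlink:href="#v2-icons-icon-email" /> </svg>support@loox.io</span>
        </li>"""

item = BeautifulSoup(HTML, "html.parser").find(class_="app-support-list__item")

li_text = item.text         # includes the newlines and indentation around the span
span_text = item.span.text  # only the text inside the span, still with leading spaces

print(repr(li_text))
print(repr(span_text))
print(li_text.strip())              # strip() removes the surrounding whitespace
print(span_text.replace(" ", ""))   # replace() removes the spaces left by the icon markup
```

Both cleaned values come out as the bare address, so either approach works here; strip() is the safer choice when the text itself may legitimately contain spaces.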