For loop for scraping emails from multiple URLs (BeautifulSoup)
Below is code that scrapes emails starting from a single base URL. I have been racking my brain for a simple for loop that does the same for an array of URLs, or that reads a list of URLs (CSV) into Python. Can anyone modify the code so that it does this?
import re
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

mails = []
url = 'https://www.smu.edu.sg/'
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, 'html.parser')
# Collect the unique href values found on the base page.
allLinks = set(a.attrs.get('href') for a in soup.select('a[href]'))

def findMails(soup):
    # Record every <a> tag whose text is an email address.
    for name in soup.find_all('a'):
        # Remove whitespace before matching so padded addresses are not missed.
        emailText = ''.join(name.text.split())
        if re.match(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$', emailText):
            if emailText not in mails:
                print(emailText)
                mails.append(emailText)

for link in allLinks:
    if link.startswith('www'):
        # requests needs an explicit scheme.
        link = 'https://' + link
    # urljoin keeps absolute links and resolves relative ones against url.
    newurl = urljoin(url, link)
    if not newurl.startswith('http'):
        continue  # skip mailto:, tel:, javascript: and similar links
    try:
        r = requests.get(newurl, timeout=10)
    except requests.RequestException:
        continue  # skip pages that cannot be fetched
    findMails(BeautifulSoup(r.text, 'html.parser'))

if len(mails) == 0:
    print("NO MAILS FOUND")
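One way to get the requested for loop is to move the per-site logic into functions and then iterate over a list of base URLs. The following is only a sketch under a few assumptions: the names `fetch_html`, `emails_in`, `links_in` and `scrape_emails` are made up for illustration, each base page is scanned along with the pages it links to (one level deep), and failed requests are skipped silently.

```python
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

EMAIL_RE = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$')


def fetch_html(url):
    """Return the page body as text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException:
        return None
    return response.text


def emails_in(html):
    """Return the email addresses that appear as the text of <a> tags."""
    soup = BeautifulSoup(html, 'html.parser')
    found = set()
    for a in soup.find_all('a'):
        text = ''.join(a.get_text().split())  # drop all whitespace
        if EMAIL_RE.match(text):
            found.add(text)
    return found


def links_in(html, base_url):
    """Return the absolute http(s) links found in the page."""
    soup = BeautifulSoup(html, 'html.parser')
    links = set()
    for a in soup.select('a[href]'):
        href = a['href']
        if href.startswith('www'):
            href = 'https://' + href  # add the missing scheme
        target = urljoin(base_url, href)
        if target.startswith('http'):  # skips mailto:, tel:, javascript:
            links.add(target)
    return links


def scrape_emails(base_urls, fetch=fetch_html):
    """Map each base URL to the emails found on it and on its linked pages."""
    results = {}
    for base_url in base_urls:
        found = set()
        html = fetch(base_url)
        if html is not None:
            found |= emails_in(html)
            for link in links_in(html, base_url):
                page = fetch(link)
                if page is not None:
                    found |= emails_in(page)
        results[base_url] = found
    return results


# Example usage (performs real HTTP requests):
# for base_url, found in scrape_emails(['https://www.smu.edu.sg/']).items():
#     print(base_url, sorted(found) or 'NO MAILS FOUND')
```

The `fetch` parameter exists only so the loop can be tried against canned HTML without touching the network; in normal use it can be left at its default.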
Where is the error? Didn't I answer this yesterday?
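For the CSV half of the question, the standard `csv` module is probably enough. This is a minimal sketch that assumes one URL per row in the first column; the function name `read_urls` and the file name `urls.csv` are hypothetical, and rows that do not start with `http://` or `https://` (such as a header row) are simply skipped.

```python
import csv
import io


def read_urls(csv_file, column=0):
    """Return the http(s) URLs found in one column of an open CSV file."""
    urls = []
    for row in csv.reader(csv_file):
        if len(row) <= column:
            continue  # blank or short row
        value = row[column].strip()
        if value.startswith(('http://', 'https://')):
            urls.append(value)
    return urls


# In-memory sample standing in for a real file.
sample = io.StringIO("url\nhttps://www.smu.edu.sg/\n\nhttps://www.example.com/\n")
print(read_urls(sample))

# With a real file (hypothetical name):
# with open('urls.csv', newline='') as f:
#     urls = read_urls(f)
```

The resulting list can then be iterated with an ordinary `for url in urls:` loop around the scraping code.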