
Python: How to extract data from span and em tags using BeautifulSoup

Tags: python, html, web-scraping, beautifulsoup, em

I am writing code to extract data from a web page:

# first is task.py
import requests
from bs4 import BeautifulSoup

url = ('https://www.naukri.com/job-listings-Python-Developer-Cloud-Analogy-Softech-Pvt-Ltd-Noida-Sector-63-Noida-1-to-2-years-250718003152'
       '?src=rcntSrchWithoutCount&sid=15327965116011&xp=1&px=1&qp=python%20developer&srcPge=s')
response = requests.get(url)
page = response.text
soup = BeautifulSoup(page, 'html.parser')
links = soup.find_all("div", {"id": "viewContact"})
for link in links:
    print(link.text)
I want to retrieve the contact details shown at the bottom of this page, under "View Contact Details". The web page contains:

<div class="jDisc viewContact" id="viewContact" style="display: block;"><p> 
<em>Recruiter Name:</em><span>Malika Pathak, Himani Adhikari</span></p><p> 
<em>Contact Company:</em><span>Cloud Analogy Softech Pvt Ltd</span></p><p> 
<em>Address:</em><span>H-77, H Block, Sector 63, Noida, UP-201307NOIDA,Uttar 
Pradesh,India 201307</span></p><p><em>Email Address:</em><span><img 
title="himani.adhikari@cloudanalogy.com , malika.pathak@cloudanalogy.com" 
src="data:image/jpeg;base64,"></span></p><p><em>Website:</em><a 
target="_blank" 
rel="nofollow" href="http://cloudanalogy.com/">http://cloudanalogy.com/</a> 
</p> 
<p><em>Telephone:</em><span>9319155392</span></p></div>

Recruiter Name: Malika Pathak, Himani Adhikari
Contact Company: Cloud Analogy Softech Pvt Ltd
Address: H-77, H Block, Sector 63, Noida, UP-201307 NOIDA, Uttar Pradesh, India 201307

Email Address:

Website:

Telephone: 9319155392


But I get nothing in the result.
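Note that the contact block itself, once you have its HTML, is a simple label/value structure: each `<em>` holds a label and the tag immediately after it holds the value. A minimal sketch against a trimmed copy of the fragment above (parsed offline, not fetched from the site):

```python
from bs4 import BeautifulSoup

# Contact block copied from the question (email image and website link trimmed).
html = '''<div class="jDisc viewContact" id="viewContact">
<p><em>Recruiter Name:</em><span>Malika Pathak, Himani Adhikari</span></p>
<p><em>Contact Company:</em><span>Cloud Analogy Softech Pvt Ltd</span></p>
<p><em>Telephone:</em><span>9319155392</span></p>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
details = {}
for em in soup.find_all('em'):
    # Each <em> label is followed by its value tag (<span> or <a>).
    value = em.find_next_sibling()
    if value is not None:
        details[em.text.rstrip(':')] = value.text

print(details)
```

This only demonstrates the parsing step; it does not solve the question's actual problem, which is that the live page does not serve this `div` to a plain `requests.get`.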

For the first link, you can access the information via the `recSum` div:

import requests, re
from bs4 import BeautifulSoup

# note: the parser class is BeautifulSoup; the original answer mistakenly called soup(...)
d = BeautifulSoup(requests.get('https://www.naukri.com/job-listings-Python-Developer-Cloud-Analogy-Softech-Pvt-Ltd-Noida-Sector-63-Noida-1-to-2-years-250718003152?src=rcntSrchWithoutCount&sid=15327965116011&xp=1&px=1&qp=python%20developer&srcP%20ge=s').text, 'html.parser')
results = [i.text for i in d.find('div', {'class': 'recSum'}).find_all(re.compile('p|span'))]
print(dict(zip(['name', 'title', 'company', 'location', 'followers'], results)))
Output:

{'name': ' Malika Pathak Senior Human Resource Executive Cloud Analogy Softech Pvt Ltd Noida ', 'title': 'Senior Human Resource Executive', 'company': 'Cloud Analogy Softech Pvt Ltd', 'location': 'Noida', 'followers': '11'}

However, for the second link you are trying to access a password-protected mail server. For that you would need to send your account credentials via `requests`, or use a dedicated mail connection client.


The final HTML page for the second code is missing here, but you can view it on the web page.

Thank you for your help! Could you please tell me what `find_all(re.compile('p|span'))` and the `[i.text for i in ...]` part of the code are for?
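To answer the comment: `find_all` accepts a compiled regular expression and returns every tag whose name matches it, and the list comprehension then collects the text of each matched tag. A small self-contained illustration (hypothetical HTML, not the live page):

```python
import re
from bs4 import BeautifulSoup

html = "<div><p>Alice</p><span>Engineer</span><a href='#'>ignore me</a></div>"
d = BeautifulSoup(html, 'html.parser')

# find_all(re.compile('p|span')) matches any tag whose NAME matches the
# regex, so it returns both <p> and <span>, but not <a> or <div>.
tags = d.find_all(re.compile('p|span'))

# [i.text for i in tags] extracts the text content of each matched tag.
results = [i.text for i in tags]
print(results)  # ['Alice', 'Engineer']
```

In the answer above, the same pattern collects the text of every `<p>` and `<span>` inside the `recSum` div, which is then zipped into a dict with field names.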