Python: Can I click on several URLs with the beautifulsoup and requests libraries and get text using HTML tags?
I have a question about using beautifulsoup and requests in a Python for loop to scrape data from multiple pages. Basically, I am trying to get a list of job titles, summaries, links, and descriptions from a site where I cannot use the API.
Here is the link:
This is the part of the site I am trying to scrape; it holds all of the non-sponsored search results:
<div class="jobsearch-SerpJobCard row result clickcard"
     id="p_a7f43b014b2d324d" data-jk="a7f43b014b2d324d"
     data-tn-component="organicJob" data-tu="">
  <h2 id="jl_a7f43b014b2d324d" class="jobtitle">
    <a href="/rc/clk?jk=a7f43b014b2d324d&fccid=deadcc7ca64ae08b&vjs=3"
       target="_blank" rel="noopener nofollow"
       onmousedown="return rclk(this,jobmap[4],0);"
       onclick="setRefineByCookie([]); return rclk(this,jobmap[4],true,0);"
       title="Data Scientist - Mumbai"
       class="turnstileLink" data-tn-element="jobTitle"><b>Data</b>
      <b>Scientist</b> - Mumbai</a>
    - <span class="new">new</span></h2>
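Right now I am getting all the elements I need except the link.
My questions are:
a. How can I also grab the link for each job, either in the same for loop or in some other way?
b. How can I use requests to follow each link and get the job summary text? It is stored in class=jobsearch-JobComponent-description icl-u-xs-mt-md
Any help with these would be amazing; I am very new to this. Thank you all!
Edit:
Edit 2 - the traceback error I get: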
Traceback (most recent call last):
  File "/Users/saharsh/Desktop/Kaggle Competition/Data_Science.ipynb", line 42, in <module>
    source = requests.get(r['link'])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 498, in request
    prep = self.prepare_request(req)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 441, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/models.py", line 309, in prepare
    self.prepare_url(url, params)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/models.py", line 383, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL: No schema supplied. Perhaps you meant http://?

There are a few reasons why your snippet does not work.
First, to get the link, you have to point BeautifulSoup at the a tag.
Second, you do not need all those try/except blocks.
Here is a piece of code, including the second request that fetches the summary text:
import requests
from bs4 import BeautifulSoup

pages = [10, 20, 30, 40, 50]
results = []  # accumulated across all result pages

# First pass: build one record per result card on each search page.
for page in pages:
    # The start offset has to be substituted into the URL.
    source = requests.get('https://www.indeed.co.in/jobs?q=data+scientist&start={}'.format(page)).text
    soup = BeautifulSoup(source, 'lxml')

    for jobs in soup.find_all(class_='result'):
        result = {
            'job_title': '',
            'company': '',
            'summary': '',
            'link': '',
            'summary_text': ''
        }

        # The title and the relative link both live on the <a> inside <h2 class="jobtitle">.
        job_title = jobs.find('h2', {'class': 'jobtitle'})
        if job_title and job_title.find('a'):
            result['job_title'] = job_title.find('a').get('title')
            result['link'] = "https://www.indeed.co.in{0}".format(job_title.find('a').get('href'))
        #else:
        #    print("no job title for ", jobs)

        company_span = jobs.find('span', {'class': 'company'})
        if company_span:
            result['company'] = company_span.get_text()

        summary = jobs.find('span', class_='summary')
        if summary:
            result['summary'] = summary.get_text()

        results.append(result)

# Second pass: request each job page and read the full description.
for r in results:
    #print(r['link'])
    if not r['link']:
        # No title link was found; requests.get('') raises MissingSchema.
        continue
    source = requests.get(r['link'])
    soup = BeautifulSoup(source.text, 'lxml')
    description = soup.find('div', {'class': 'jobsearch-JobComponent-description'})
    if description:
        r['summary_text'] = description.get_text()

print(results)
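The link extraction step on its own can be tried without any network access. The following is a minimal sketch, assuming a trimmed copy of the result card from the question as input; the `html` string, `BASE_URL`, and the `links` list are illustrative names, and `urljoin` is used here in place of string formatting to build the absolute URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.indeed.co.in"  # site root, as used in the answer's code

# Trimmed copy of the result card shown in the question (illustrative input).
html = """
<div class="jobsearch-SerpJobCard row result clickcard" data-jk="a7f43b014b2d324d">
  <h2 id="jl_a7f43b014b2d324d" class="jobtitle">
    <a href="/rc/clk?jk=a7f43b014b2d324d&fccid=deadcc7ca64ae08b&vjs=3"
       title="Data Scientist - Mumbai" class="turnstileLink"
       data-tn-element="jobTitle"><b>Data</b> <b>Scientist</b> - Mumbai</a>
    - <span class="new">new</span></h2>
</div>
"""

# html.parser ships with Python, so this sketch does not need lxml.
soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.select("a.turnstileLink"):
    # The href is relative, so join it with the site root before requesting it.
    links.append({"title": a.get("title"), "link": urljoin(BASE_URL, a["href"])})

print(links)
```

Each `link` value can then be passed to `requests.get` to fetch the job page, as the second loop in the code above does.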
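Use 'a.turnstileLink' to select the links. Then submit a new request for each link to get the job summary.

There is an error in your URL: you forgot the ? between https://www.indeed.co.in/jobs and q=data…

Hey, I tried your code, and the output I get does not give me a job title; it prints a lot of the actual HTML instead, like this: no job title for

@Harshmallo You should comment out the other print calls and print the results list. See my edit.

This was really helpful. I tried running it, but I get a traceback error; I have updated the original question with the whole error. It is strange, because when I run the print command just before source = requests.get(r['link']) it works fine. But on source = requests.get(r['link']) it raises a traceback and suggests adding https://.

Maybe you can look at this:

Is my answer good enough to be accepted?

Output: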
[{'company': '\n DataMetica',
'job_title': 'Big-Data, Analytics Opportunities - Tech Savvy Talented '
'Freshers',
'link': 'https://www.indeed.co.in/rc/clk?jk=72e59a4376e3c7f1&fccid=f753310165e7a862&vjs=3',
'summary': '\n'
' Datametica supports the fresh minds to engage with '
'evolving tools and technologies working on Big data, Data '
'Science, Information Analytics and related...',
'summary_text': 'Pune, MaharashtraFresherJob Description\n'
'\n'
'Experience - 0 to 1 Years\n'
'\n'
'Selected candidates would get training and opportunity to '
'work on live projects in Big-Data, Analytics & Data '
'Science\n'
'\n'
'Candidates from Top Ranked Colleges or Premier Institutes '
'like IIT, NIT, REC, IIIT are preferred.\n'
'\n'
'Do you have knowledge on RDBMS Systems like Oracle, MY SQL, '
'Teradata and experience in solving analytical problems? Did '
'you use Java, C and C++ for your projects?\n'
'\n'
'If yes, then just apply with us.\n'
'\n'
'Datametica supports the fresh minds to engage with evolving '
'tools and technologies working on Big data, Data Science, '
'Information Analytics and related technologies like Hadoop, '
'Java, NoSQL.\n'
'\n'
'Added Advantage if you possess:\n'
'B.E/ B. Tech in Computer Science (graduated in 2016 & '
'2017)\n'
'Minimum 60% in Graduation\n'
'Good Communication Skills\n'
'0 to 1 Year experience'},
...
...
{'company': '\n\n Barclays',
'job_title': 'Junior Data Scientist',
'link': 'https://www.indeed.co.in/rc/clk?jk=2473a92840979437&fccid=057abf3fd357e717&vjs=3',
'summary': '\n'
' Junior Data Scientist. Junior Data Scientist - '
'90227028. Experience with the Python Data Science/Machine '
'learning stack....',
'summary_text': 'Pune, MaharashtraJunior Data Scientist - 90227028\n'
'Primary Location:IN-Maharashtra-Pune\n'
'Job Type:Permanent/Regular\n'
'Posting Range:3 Apr 2019 - 11 Apr 2019\n'
'Description\n'
'\n'
'Job Title: Junior Data Scientist\n'
'Location: Pune\n'
'\n'
'The Technology Chief Data Office exists to support and '
'enhance Barclays’ Technology function by leveraging its '
'most important asset: data. Within this, the mission '
'statement of the Data Science team is to enable Barclays to '
'react to things before they happen: to drive predictive '
'decision making by leveraging data on Technology, People, '
'and Process. We employ machine learning and artificial '
'intelligence models to discover the hidden patterns in the '
'data which describes Barclays, and use these to make '
'measured predictions. By understanding the rules which '
'govern the future evolution of any given resource, we can '
'make the right decisions in the present, driving matters '
'towards the business’ desired end goals.\n'
'\n'
'What will you be doing?\n'
'Develop machine learning and artificial intelligence '
'solutions as part of the project roadmap of the team\n'
'Support the team in balancing strategic project work with '
'incoming needs for data-driven methods.\n'
'Be agile, quick-thinking, and practical.\n'
'Evangelise for solving problems through Data across the '
'bank – contribute to the presence of our team in horizontal '
'bank-wide forums.\n'
'Contribute a creative and analytical/technical viewpoint of '
'problems\n'
'Support the team in supplying stakeholders with whatever '
'supplementary material they may require in order to get our '
'output into large-scale production.\n'
'Apply technical and analytical expertise to exploring and '
'examining data with the goal of discovering patterns and '
'previously hidden insights, which in turn can provide a '
'competitive advantage or address a pressing business '
'problem.\n'
'Implement model output within infrastructure, business '
'tools and workflow processes: turn data into something that '
'drives action within the business.\n'
'Leverage knowledge of mathematical and statistical '
'concepts, to bridge the gap between technologists and '
'mathematicians, ensuring software solutions meet business '
'goals.\n'
'What we’re looking for:\n'
'Experience solving real-world problems and creating value '
'through the end-to-end, productionised application of Data '
'Science, Machine Learning, and Artificial Intelligence '
'methods.\n'
'Experience with the Python Data Science/Machine learning '
'stack.\n'
'Master’s level degree in Science, Technology, Engineering, '
'Mathematics, or other relevant field, and associated '
'mathematical/analytical skills\n'
'Excellent interpersonal, written and verbal communication '
'skills is a must\n'
'Good presentation skills with ability to explain '
'sophisticated solution in layman terms\n'
'Skills that will help you in the role:\n'
'Experience using cloud solutions such as AWS/GCP\n'
'Experience using parallelised data storage and computation '
'solutions such as Hadoop\n'
'Experience with TensorFlow, neural networks, xgboost, nltk\n'
'Where will you be working?\n'
'PuneBarclays recently announced the creation of a new '
'world-class campus at Gera Commerzone located in Kharadi. '
'All Pune based roles will eventually start to move to this '
'new campus starting September 2019. In the run up to that, '
'during the course of 2018, there may be transitory '
'movements of some roles to other temporary sites. Please '
'speak with your recruiter about the specific location plans '
'for your role.\n'
'\n'
'For further information on EVP, please click on the link '
'below\n'
'https://now.barclays.com/WCP/content/intranet/en/functions/operations-and-technology/global-service-centre/EVP.html\n'
'\n'
'Be More at Barclays\n'
'At Barclays, each day is about being more – as a '
'professional, and as a person. ‘Be More @ Barclays’ '
'represents our core promise to all current and future '
'employees. It’s the characteristic that we want to be '
'associated with as an employer, and at the heart of every '
'employee experience. We empower our colleagues to Be More '
'Globally Connected, working on international projects that '
'improve the way millions of customers handle their '
'finances. Be More Inspired by working alongside the most '
'talented people in the industry, and delivering imaginative '
'new solutions that are redefining the future of finance. Be '
'More Impactful by having the opportunity to work on '
'cutting-edge projects, and Be More Valued for who you are.\n'
'\n'
'Interested and want to know more about Barclays? Visit '
'home.barclays/who-we-are/ for more details.\n'
'\n'
'Our Values\n'
'Everything we do is shaped by the five values of Respect, '
'Integrity, Service, Excellence and Stewardship. Our values '
'inform the foundations of our relationships with customers '
'and clients, but they also shape how we measure and reward '
'the performance of our colleagues. Simply put, success is '
'not just about what you achieve, but about how you achieve '
'it.\n'
'\n'
'Our Diversity\n'
'We aim to foster a culture where individuals of all '
'backgrounds feel confident in bringing their whole selves '
'to work, feel included and their talents are nurtured, '
'empowering them to contribute fully to our vision and '
'goals.\n'
'\n'
'Our Benefits\n'
'Our customers are unique. The same goes for our colleagues. '
"That's why at Barclays we offer a range of benefits, "
'allowing every colleague to choose the best options for '
'their personal circumstances. These include a competitive '
'salary and pension, health care and all the tools, '
'technology and support to help you become the very best you '
'can be. We are proud of our dynamic working options for '
'colleagues. If you have a need for flexibility, then please '
'discuss this with us.'}]
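
On the MissingSchema traceback in the question: requests raises that exception when the URL it receives has no scheme, which is what happens if a record's link is still an empty string or a bare relative path. One possible guard is sketched below using only the standard library. The names `to_absolute` and `fetchable_links` and the sample records are hypothetical, introduced only for illustration.

```python
from urllib.parse import urljoin, urlsplit

BASE_URL = "https://www.indeed.co.in"  # site root, as used in the answer's code


def to_absolute(href, base=BASE_URL):
    """Return an absolute http(s) URL for href, or None if there is nothing to fetch."""
    if not href:
        return None  # empty link: the result card had no title anchor
    url = urljoin(base, href)
    if urlsplit(url).scheme not in ("http", "https"):
        return None  # e.g. a javascript: pseudo-link
    return url


def fetchable_links(results):
    """Keep only the records whose 'link' resolves to an absolute URL."""
    kept = []
    for r in results:
        url = to_absolute(r.get("link"))
        if url is not None:
            kept.append({**r, "link": url})
    return kept


# Hypothetical records: one with a relative link, one with no link at all.
sample = [
    {"job_title": "Data Scientist - Mumbai", "link": "/rc/clk?jk=a7f43b014b2d324d"},
    {"job_title": "", "link": ""},
]
safe = fetchable_links(sample)
print([r["link"] for r in safe])
```

Looping over `fetchable_links(results)` instead of `results` in the second pass would mean `requests.get` is only ever called with a complete URL.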