
Python webscrape of LinkedIn listings


The url in the code points to a page with many listings on LinkedIn.

I just want to grab the link/href of each listing, but the output comes back empty. I only need the html for each listing.

import pandas as pd
from bs4 import BeautifulSoup
import csv
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'}
url = 'https://www.linkedin.com/jobs/search/?currentJobId=2213597199&geoId=103644278&keywords=cyber%20analyst&location=United%20States&start=25'

# headers must be passed as a keyword argument; passed positionally it is taken as params
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')

# this long class string only exists in the JavaScript-rendered page, so nothing is matched here
listing = soup.find_all('div', class_="job-card-container relative job-card-list job-card-container--clickable job-card-list--underline-title-on-hover jobs-search-results-list__list-item--active jobs-search-two-pane__job-card-container--viewport-tracking-1")
for info in listing:
    link = info.find('a', href=True)
    print(link)

As suggested in the comments, you might want to give selenium a try.

Here's how to get the links to all of the job listings:

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.headless = True
driver = webdriver.Chrome(options=options)

url = "https://www.linkedin.com/jobs/search/?currentJobId=2213597199&geoId=103644278&keywords=cyber%20analyst&location=United%20States&start=0&redirect=false"
driver.get(url)
time.sleep(2)
elements = driver.find_elements_by_class_name("result-card__full-card-link")
job_links = [e.get_attribute("href") for e in elements]

for job_link in job_links:
    print(job_link)
Output:

https://www.linkedin.com/jobs/view/cyber-threat-intelligence-analyst-at-linkedin-2261917520?refId=b5cf1ce3-d032-4aaa-8810-26d4782cc34d&position=1&pageNum=0&trk=public_jobs_job-result-card_result-card_full-click
https://www.linkedin.com/jobs/view/cyber-security-analyst-at-modis-2273028250?refId=b5cf1ce3-d032-4aaa-8810-26d4782cc34d&position=2&pageNum=0&trk=public_jobs_job-result-card_result-card_full-click
https://www.linkedin.com/jobs/view/jr-python-cyber-analyst-ts-sci-at-deloitte-2265989857?refId=b5cf1ce3-d032-4aaa-8810-26d4782cc34d&position=3&pageNum=0&trk=public_jobs_job-result-card_result-card_full-click
https://www.linkedin.com/jobs/view/cyber-security-analyst-at-modis-2307968344?refId=b5cf1ce3-d032-4aaa-8810-26d4782cc34d&position=4&pageNum=0&trk=public_jobs_job-result-card_result-card_full-click
https://www.linkedin.com/jobs/view/entry-level-cyber-security-analyst-at-hcl-technologies-2271846580?refId=b5cf1ce3-d032-4aaa-8810-26d4782cc34d&position=5&pageNum=0&trk=public_jobs_job-result-card_result-card_full-click
and so on ..
The class you're after is this one: result-card__full-card-link
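One caveat: find_elements_by_class_name comes from the older Selenium 3 API and has been removed in Selenium 4. Below is a minimal sketch of the same lookup with the Selenium 4 locator API, assuming the same page and the same result-card__full-card-link class, and swapping the fixed time.sleep for an explicit wait:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")  # replaces options.headless = True, which newer releases deprecate
driver = webdriver.Chrome(options=options)

url = "https://www.linkedin.com/jobs/search/?currentJobId=2213597199&geoId=103644278&keywords=cyber%20analyst&location=United%20States&start=0&redirect=false"
driver.get(url)

# wait until the job cards are present instead of sleeping for a fixed two seconds
elements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "result-card__full-card-link"))
)
job_links = [e.get_attribute("href") for e in elements]

for job_link in job_links:
    print(job_link)

driver.quit()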
You won't be able to do this with requests; there is too much client-side rendering. You'll have to switch to a headless browser (probably chrome) and use selenium. Also be prepared for rejected requests, those guys don't like being scraped.

The request isn't being rejected, so it can be done.

Yes, the request isn't rejected, but you're not getting the actual data you want; you only get the html shell, and the information about the listings themselves is loaded via JavaScript. requests will not execute javascript. Try this: take the blob of text you get back and search it for the words you want to scrape, they won't show up.

You're right :/

Hi, the code is running. Can you attach a screenshot of the class (result-card__full-card-link)? I had found it before, but I can't find it even when the code runs.