Python: scraping data from a page with a "Show more" button




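I've been trying to extract data from a web page with Python, and so far it has worked well. The problem is that the page doesn't load all of its content at once; instead it has a "Show more" button, so my script only scrapes the first 10 items. I've looked into the site, but there is nothing I can do with the URL. I think I have to POST something to the server to get the next items back, but I don't know what to post or how. Here is my code: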

import bs4
import requests

# Search results page: only the first 10 offers are present in this initial HTML
res = requests.get('https://candidat.pole-emploi.fr/offres/recherche?motsCles=serveur&offresPartenaires=true&rayon=20&tri=0')

page_soup = bs4.BeautifulSoup(res.text, "html.parser")

# Each offer sits in a <div class="media-body">; build absolute links to the detail pages
containers = page_soup.find_all("div", {"class": "media-body"})
url = []
for container in containers:
    url.append('https://candidat.pole-emploi.fr' + container.h2.a["href"])

for i in url:
    print(i)

email_list = []

# Open each offer and look for an e-mail address in its "apply-block" section
for address in url:
    print('testing', address)
    found = False
    detail = requests.get(address)
    apply = bs4.BeautifulSoup(detail.text, "html.parser")
    apply_mail = apply.find_all("div", {"class": "apply-block"})
    if not apply_mail:
        email_list.append('not found')
        continue

    email_raw = apply_mail[0].text
    for line in email_raw.splitlines():
        if '@' in line:
            email_list.append(line.strip())
            found = True
    if not found:
        email_list.append('not found')

for i in email_list:
    print(i)
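The check for `'@'` in each line of the apply block keeps the whole line, including any text around the address. A regular expression can pull out just the address-like tokens instead. This is a rough sketch, and the pattern is a loose heuristic chosen for illustration, not an e-mail validator.

```python
import re

# Loose pattern: local part, "@", then a domain with at least one dot.
# It only extracts address-like tokens from text; it does not validate them.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(text):
    """Return address-like substrings found in text, in order, without duplicates."""
    found = []
    for match in EMAIL_RE.findall(text):
        if match not in found:
            found.append(match)
    return found
```

In the question's loop, `email_list.extend(extract_emails(email_raw) or ['not found'])` could stand in for the line-by-line check, since an empty list is falsy.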

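With BeautifulSoup or any other HTTP request library, you can only scrape data that is available when the page first loads, without running JavaScript. It is the same as running curl $URL and parsing the result. The items added by the "Show more" button are fetched later by JavaScript, so they are not in that initial HTML.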
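If the extra items come from a separate HTTP request that the button triggers (the browser's developer tools list it under the Network tab), it is sometimes possible to replay that request with `requests` instead of driving a browser. The sketch below assumes the request takes a start/end position and returns items in batches; `fetch_batch` is a hypothetical function you would write once you have seen the real request, and the `range` parameter in the comment is only a guess.

```python
def collect_all(fetch_batch, batch_size=10, max_batches=100):
    """Fetch consecutive batches until a short (or empty) batch signals the end.

    fetch_batch(start, end) is a caller-supplied, hypothetical function that
    returns the items at positions start..end (inclusive), for example by
    replaying the request the "Show more" button sends.
    """
    items = []
    for n in range(max_batches):
        start = n * batch_size
        batch = fetch_batch(start, start + batch_size - 1)
        items.extend(batch)
        if len(batch) < batch_size:
            break  # fewer items than requested: nothing more to load
    return items


# Hypothetical wiring (NOT a known API of this site; read the real URL and
# parameters from the request shown in the browser's Network tab):
#   def fetch_batch(start, end):
#       r = requests.get(SEARCH_URL + f"&range={start}-{end}")  # "range" is a guess
#       soup = bs4.BeautifulSoup(r.text, "html.parser")
#       return soup.find_all("div", {"class": "media-body"})
```

The `max_batches` cap keeps the loop from running forever if the server keeps returning full batches.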

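One way to solve this is to use Selenium WebDriver to programmatically perform the same actions a user would perform in the browser, such as clicking the "Show more" button.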

More information can be found here.
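A minimal sketch of the Selenium approach: keep clicking the "Show more" button until no visible one is left, then hand the fully loaded HTML to BeautifulSoup and reuse the parsing code from the question. The selector `button.show-more` is hypothetical; inspect the page to find the real one. The helper only calls `find_elements`, `is_displayed` and `click`, so it works with a Selenium WebDriver without importing Selenium itself.

```python
import time


def click_until_gone(driver, css_selector, max_clicks=50, pause=1.0):
    """Click the first visible element matching css_selector until none remain.

    `driver` is expected to be a Selenium WebDriver (or anything with the same
    find_elements / is_displayed / click methods). Returns the number of clicks.
    """
    clicks = 0
    while clicks < max_clicks:
        # Re-query every time: after the page updates, the old button may be stale.
        buttons = [b for b in driver.find_elements("css selector", css_selector)
                   if b.is_displayed()]
        if not buttons:
            break  # no visible "Show more" button left: everything is loaded
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # crude fixed wait for the new items to render
    return clicks


# Usage with a real browser (assumes `pip install selenium` and Chrome installed):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://candidat.pole-emploi.fr/offres/recherche?motsCles=serveur&offresPartenaires=true&rayon=20&tri=0")
#   click_until_gone(driver, "button.show-more")  # hypothetical selector
#   page_soup = bs4.BeautifulSoup(driver.page_source, "html.parser")
#   driver.quit()
```

The fixed `time.sleep` is the simplest option; Selenium's `WebDriverWait` with an expected condition is usually more robust on slow pages.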

Thanks a lot, that's exactly what I did: I used Selenium and it works great!