Python BeautifulSoup (bs4) findAll not finding all elements

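Using the URL in the code below, I am ultimately trying to collect the names of all the players on the page. However, when I use .findAll to get all of the list elements, I have had no success. Please advise.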

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

players_url = 'https://stats.nba.com/players/list/?Historic=Y'

# Opening up the Connection and grabbing the page
uClient = uReq(players_url)
page_html = uClient.read()

players_soup = soup(page_html, "html.parser")

# Taking all of the elements from the unordered lists that contains all of the players.

list_elements = players_soup.findAll('li', {'class': 'players-list__name'})

As suggested, it is best to use selenium together with BS (BeautifulSoup):

from bs4 import BeautifulSoup
from selenium import webdriver

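driver = webdriver.Firefox()
driver.get('https://stats.nba.com/players/list/?Historic=Y')
# page_source holds the DOM after the browser has run the page's JavaScript
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # close the browser once the HTML has been captured
for item in soup.findAll('li', {'class': 'players-list__name'}):
    print(item.find('a').contents[0])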

Output:

Abdelnaby, Alaa
Abdul-Aziz, Zaid
Abdul-Jabbar, Kareem
Abdul-Rauf, Mahmoud
Abdul-Wahad, Tariq
Abdelnaby, Alaa
Abdul-Aziz, Zaid
Abdul-Jabbar, Kareem
Abdul-Rauf, Mahmoud
Abdul-Wahad, Tariq
...
and so on.

As mentioned in the comments:

The player list on the page is generated with JavaScript.

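Instead of Selenium, I recommend the requests-html package, created by the author of the very popular requests library. It uses Chromium under the hood to render JavaScript content.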

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://stats.nba.com/players/list/?Historic=Y')
r.html.render()
for anchor in r.html.find('.players-list__name > a'):
    print(anchor.text)

Output:

Abdelnaby, Alaa
Abdul-Aziz, Zaid
Abdul-Jabbar, Kareem
Abdul-Rauf, Mahmoud
Abdul-Wahad, Tariq
Abdelnaby, Alaa
Abdul-Aziz, Zaid
Abdul-Jabbar, Kareem
Abdul-Rauf, Mahmoud
Abdul-Wahad, Tariq
...
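
The sample output above lists the same five names twice. If repeated names are not wanted, a minimal sketch of an order-preserving de-duplication step could look like the following. The `names` list here is just the sample output copied by hand, not data fetched from the site.

```python
# Names copied from the sample output above; in practice this list would
# come from the scraping loop instead of being typed in.
names = [
    "Abdelnaby, Alaa",
    "Abdul-Aziz, Zaid",
    "Abdul-Jabbar, Kareem",
    "Abdul-Rauf, Mahmoud",
    "Abdul-Wahad, Tariq",
    "Abdelnaby, Alaa",
    "Abdul-Aziz, Zaid",
    "Abdul-Jabbar, Kareem",
    "Abdul-Rauf, Mahmoud",
    "Abdul-Wahad, Tariq",
]

# dict keys are unique and keep insertion order, so this drops repeats
# while preserving the order in which names first appeared.
unique_names = list(dict.fromkeys(names))

for name in unique_names:
    print(name)
```

A `set` would also remove the repeats, but it would not keep the alphabetical order the page provides.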

You can also do this by directly requesting the JS script that supplies the names:

import requests
import json

r = requests.get('https://stats.nba.com/js/data/ptsd/stats_ptsd.js')
s = r.text.replace('var stats_ptsd = ','').replace('};','}')
data = json.loads(s)['data']['players']
players = [item[1] for item in data]
print(players)
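
The chained `replace` calls above depend on the exact variable name and on the script ending in `};`. As a rough sketch of a slightly more tolerant way to pull the JSON out, the helper below takes everything between the first `{` and the last `}`. `SAMPLE_JS` is a hypothetical stand-in with made-up IDs that only mimics the shape the code above assumes (`data` → `players`, with the name at index 1); `extract_players` is an illustrative name, not part of any library.

```python
import json
import re

# Hypothetical sample: the IDs are invented, and the layout simply mirrors
# what the answer's code assumes about the real script.
SAMPLE_JS = (
    'var stats_ptsd = {"data": {"players": '
    '[[1, "Abdelnaby, Alaa"], [2, "Abdul-Aziz, Zaid"]]}};'
)


def extract_players(js_text):
    """Return the name column from a `var name = {...};` script body."""
    # Greedy match from the first '{' to the last '}', so the variable
    # name and the trailing semicolon do not matter.
    match = re.search(r"\{.*\}", js_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in script text")
    payload = json.loads(match.group(0))
    # Each row is a list; index 1 is assumed to hold the player's name.
    return [row[1] for row in payload["data"]["players"]]


players = extract_players(SAMPLE_JS)
print(players)
```

With a live request, `r.text` would be passed to `extract_players` in place of `SAMPLE_JS`. This still assumes the right-hand side of the assignment is valid JSON.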

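What exactly is not being found? The player list on the page is generated with JavaScript, so you need a client that can fully render the page. The usual approach is to drive a browser to the URL (you can use selenium), grab the page source, and then feed it to Beautiful Soup.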
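
One quick way to confirm this diagnosis is to check whether the target elements exist in the HTML at all before it is rendered. The sketch below uses only the standard library so it runs without extra installs. `RAW_HTML` and `RENDERED_HTML` are hypothetical stand-ins for what `urlopen` returns and what a browser holds after running the scripts, and `count_elements` is an illustrative helper name.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_elements(html, tag, css_class):
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup: the raw response has an empty list plus a script,
# while the rendered DOM has the list items the script created.
RAW_HTML = (
    '<html><body><ul id="players"></ul>'
    "<script>/* fills the list at runtime */</script></body></html>"
)
RENDERED_HTML = (
    '<html><body><ul id="players">'
    '<li class="players-list__name"><a href="#">Abdelnaby, Alaa</a></li>'
    "</ul></body></html>"
)

print(count_elements(RAW_HTML, "li", "players-list__name"))
print(count_elements(RENDERED_HTML, "li", "players-list__name"))
```

If the count for the raw response is zero while the browser's inspector shows the elements, the parser is not at fault and a rendering client is needed.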