Scraping text written by JavaScript from a website

javascript python web-scraping

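I'm using BeautifulSoup to scrape character information from a website. When I try to get a character's win rate, BeautifulSoup can't find it.

When I inspect the element in the browser, the win rate is shown. But all I can find in the page source, and all BeautifulSoup finds, is "ranking stats placeholder".

This is the code I'm currently using: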

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://u.gg/lol/champions/darius/build/?role=top"

#opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

#champion name
champ_name = page_soup.findAll("span", {"class":"champion-name"})[0].text

#champion win rate
champ_wr = page_soup.findAll("div", {"class":"win-rate okay-tier"})

I believe the win rate text is added by JavaScript, but I don't know how to get at it. The code I have now returns "None" for champ_wr.
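One way to confirm that suspicion is to compare the class names present in the raw HTML with the ones visible in the browser's inspector. The sketch below uses only the standard library; the two markup strings are made up for illustration and are not copied from the site.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in_html(html):
    """Return the set of class names used anywhere in html."""
    collector = ClassCollector()
    collector.feed(html)
    return collector.classes


# HYPOTHETICAL markup: what the server might send versus what the
# browser might show once the page's scripts have run.
static_html = '<div class="champion-ranking-stats placeholder"></div>'
rendered_html = (
    '<div class="champion-ranking-stats">'
    '<div class="win-rate okay-tier">49.5%</div></div>'
)

print("win-rate" in classes_in_html(static_html))    # False
print("win-rate" in classes_in_html(rendered_html))  # True
```

If the class is missing from the downloaded HTML, as in the first case, no HTML parser will find it, because the element only exists after the scripts have executed.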

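While this text could technically be in the JavaScript itself, my first guess is that the JS pulls it in through an AJAX request. Have your program mimic that request and you will probably get all the data you need without any scraping at all.

It takes a bit of detective work, though. I suggest opening a network traffic logger (such as the "Web Developer Toolbar" in Firefox) and then visiting the site. Focus your attention on any and all XMLHttpRequests.

Good luck.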
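Once a request like that has been spotted, mimicking it could look roughly like the sketch below. The payload keys `wins` and `matches` are assumptions made up for illustration; the real endpoint and its response shape have to be read off the network log.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url):
    """Download url and decode the response body as JSON."""
    # Some sites reject urllib's default User-Agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def win_rate_from_payload(payload):
    """Compute a win rate in percent from a HYPOTHETICAL payload shape."""
    # Assumed keys "wins" and "matches"; the real response will differ.
    return 100.0 * payload["wins"] / payload["matches"]


# Offline demonstration with made-up numbers shaped like the assumed payload.
sample = json.loads('{"champion": "Darius", "wins": 495, "matches": 1000}')
print(f"{win_rate_from_payload(sample):.1f}%")
```

With a real URL copied from the network log, the call would be `fetch_json(that_url)`, and the extraction function would need adapting to whatever structure comes back.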

I'm not sure how attached you are to BeautifulSoup, but I was able to get selenium to do something useful:

# load code from selenium package
from selenium.webdriver import Remote
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# start an instance of Chrome up
chrome = Service('/usr/local/bin/chromedriver')
chrome.start()
driver = Remote(chrome.service_url)

# get the page loading
driver.get("https://u.gg/lol/champions/darius/build/?role=top")

# wait for the win rate to be populated
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "win-rate")))

# get the values you wanted
# (find_element_by_class_name was removed in Selenium 4.3)
name = driver.find_element(By.CLASS_NAME, "champion-name").text
winrate = driver.find_element(By.CLASS_NAME, "win-rate").text

# display them
print(f"name: {repr(name)}, winrate: {winrate.split()[0]}")

# clean up a bit
driver.quit()

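I couldn't find any XMLHttpRequests, but I did manage to find that everything I need is in a .js file. I don't know how to use it, though…

If you've found what you need, can you parse it straight out of the .js file? Maybe with a regular expression?

Chrome's developer tools (Ctrl+Shift+J on Linux) have a "Network" tab that lists these requests…

@SamMason Yes, but I didn't find any XMLHttpRequests there.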
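The idea of parsing the .js file directly, raised in the comments, might be sketched like this. It assumes the data is assigned to a variable as an object literal that happens to be valid JSON (double-quoted keys, no trailing commas). The variable name `window.__STATS__` and the sample source are hypothetical, not taken from the site.

```python
import json
import re


def extract_js_object(js_source, variable_name):
    """Return the JSON object literal assigned to variable_name in js_source."""
    # Find "<name> = " and stop right before the opening brace that follows.
    pattern = r"\b" + re.escape(variable_name) + r"\s*=\s*(?=\{)"
    match = re.search(pattern, js_source)
    if match is None:
        raise ValueError(f"{variable_name!r} is not assigned an object literal")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever trails it, so nested braces are handled correctly.
    value, _end = json.JSONDecoder().raw_decode(js_source, match.end())
    return value


# HYPOTHETICAL script contents, made up for illustration.
js_source = (
    'var x = 1; window.__STATS__ = '
    '{"darius": {"win_rate": 49.5, "tier": "okay"}}; init();'
)

stats = extract_js_object(js_source, "window.__STATS__")
print(stats["darius"]["win_rate"])
```

A regular expression alone tends to break on nested braces, which is why the regex here only locates the assignment and the JSON decoder does the actual parsing. If the literal is not strict JSON, this approach will raise a `json.JSONDecodeError` and a JavaScript-aware parser would be needed instead.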