Python: Web parsing with BeautifulSoup

python html parsing web-scraping

I need to parse data from a website (the Yahoo Finance page requested in the code below).

I am trying to extract the time at which each post was made, for example "21 hours ago".

This is the HTML of the site that I am trying to extract the time from:

    <li class="comment Pend(2px) Mt(5px) Mb(11px) P(12px) " data-reactid="24">
      <div class="Pos(r) Pstart(52px) " data-reactid="25">
        <div class="Fz(12px) Mend(20px) Mb(5px)" data-reactid="26">
          <div class="avatar D(ib) Bdrs(46.5%) Pos(a) Start(0) Cur(p)" data-reactid="27">...</div>
          <button aria-label="See reaction history for

My code:

    import requests
    from bs4 import BeautifulSoup

    response = requests.get("https://finance.yahoo.com/quote/MSFT/community")
    soup = BeautifulSoup(response.content, 'html.parser')
    print(soup)

Answer:

You get "None" as output because BeautifulSoup only sees the HTML source returned by the initial request, and that source does not contain the posted-time values. Those values are rendered after the initial request. To capture values of this kind you can use a library such as Selenium. With Selenium you can wait until the page has loaded, then pass the resulting HTML source to BeautifulSoup and try again. Below is sample code to give a basic understanding:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    driver.get('https://finance.yahoo.com/quote/MSFT/community')

    try:
        # Wait up to 10 seconds for at least one comment <li> to appear.
        # By.CLASS_NAME takes a single class name, so a CSS selector is used.
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'li.comment'))
        )
        # Read the source only after the wait so it contains the comments.
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        # Rest of the code
    except TimeoutException:
        print("Load failed")
    finally:
        driver.quit()

Comments:

Can you provide the exact URL of the post?

The post? If you mean the URL of the Yahoo Finance page I am parsing, see here.

@schezfaz Yes, this is just a small example to help you get the idea. If it helped you, please consider accepting this answer.
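Once the rendered HTML is available (for example from Selenium's `driver.page_source` after the wait), the extraction step could look roughly like the sketch below. The question's snippet is cut off before the timestamp element, so this does not assume a particular tag for it; instead it searches the text of each `li.comment` element for a phrase such as "21 hours ago". The names `extract_post_times`, `RELATIVE_TIME`, and `SAMPLE_HTML` are hypothetical, and the sample markup is invented for illustration rather than copied from the real page.

```python
import re

from bs4 import BeautifulSoup

# Matches relative timestamps such as "21 hours ago" or "3 days ago".
RELATIVE_TIME = re.compile(
    r"\b\d+\s+(?:second|minute|hour|day|week|month|year)s?\s+ago\b",
    re.IGNORECASE,
)


def extract_post_times(html):
    """Return the first relative timestamp found in each comment <li>."""
    soup = BeautifulSoup(html, "html.parser")
    times = []
    # "li.comment" matches any <li> whose class list includes "comment",
    # regardless of the other classes on the element.
    for comment in soup.select("li.comment"):
        text = comment.get_text(" ", strip=True)
        match = RELATIVE_TIME.search(text)
        if match:
            times.append(match.group(0))
    return times


# Hypothetical rendered markup, for illustration only (an assumption,
# not the actual structure of the Yahoo Finance page).
SAMPLE_HTML = """
<ul>
  <li class="comment Pend(2px) Mt(5px) Mb(11px) P(12px) ">
    <div><button>some_user</button><span>21 hours ago</span></div>
    <div>First comment text</div>
  </li>
  <li class="comment Pend(2px) Mt(5px) Mb(11px) P(12px) ">
    <div><button>other_user</button><span>3 days ago</span></div>
    <div>Second comment text</div>
  </li>
</ul>
"""

if __name__ == "__main__":
    print(extract_post_times(SAMPLE_HTML))
```

With Selenium, the call would be `extract_post_times(driver.page_source)` after the wait has succeeded. If the initial, unrendered source is passed instead, no `li.comment` elements are found and the function returns an empty list, which matches the behaviour described in the answer.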