How to extract the price of a security as text from a website with Python, Selenium, and BeautifulSoup


python selenium web-scraping beautifulsoup webdriverwait


I'm simply trying to get the security price shown on the page. I run the following code:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox(executable_path=r'C:\Program_Files_EllieTheGoodDog\Geckodriver\geckodriver.exe')
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

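When I use "Inspect Element" on the price in the Firefox window opened by Selenium, I clearly see:

<span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" class="ng-scope ng-binding arrange">$42.91</span>

I'm completely stumped. If anyone could point me in the right direction, I'd really appreciate it. I feel like I'm completely missing something, possibly several things…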

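There is nothing wrong with the way you select the span using the `data-*` attributes and their values. In fact, it is the correct approach: Beautiful Soup accepts such attributes through the `attrs` dictionary. Four span tags match all of the attributes, and `find_all` returns all of them. The second one corresponds to the price.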

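What you are missing is that the span takes some time to load, and the page source is returned before it appears. You can wait for that span and only then get the page source. Here I use an XPath to wait for the element. You can get the XPath from the inspect tool: right-click the element -> Copy -> Copy XPath.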

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
driver = webdriver.Firefox()
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH ,'/html/body/div[1]/div[3]/div[3]/div[1]/div/div[1]/div/div/div/div[2]/div/div[3]/div[1]/div/div/table/tbody/tr[1]/td[2]/div/span[1]')))
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})
print(myspan)
print(myspan[1].text)

Output

[<span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Unit price as of 02/15/2019</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">$42.91</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Change</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer"><span class="number-positive">$0.47</span> <span class="number-positive">1.11%</span></span>]
$42.91

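Why the wait matters can be seen in a simplified model of an explicit wait: poll a condition until it returns something truthy or a timeout expires. The sketch below is not Selenium's implementation; `wait_until`, `FakePage`, and the timing values are hypothetical names and numbers chosen for illustration.

```python
import time


def wait_until(condition, timeout=2.0, poll_interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a page whose price span is rendered by script after a delay."""

    def __init__(self, ready_after):
        self._ready_at = time.monotonic() + ready_after

    def find_price(self):
        # Returns None until the simulated script has rendered the span.
        if time.monotonic() < self._ready_at:
            return None
        return "$42.91"


page = FakePage(ready_after=0.2)

# Reading immediately is like taking page_source right after get(): nothing yet.
immediate = page.find_price()

# Polling gives the simulated script time to render the span.
waited = wait_until(page.find_price)

print(immediate, waited)
```

The immediate read comes back empty while the polled read returns the price, which mirrors the empty result seen when the page source is taken before the span exists.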

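Selenium alone is enough to extract the required text. You need to add a WebDriverWait for visibility_of_element_located(), and you can use the following solution:

Code block: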

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='ng-scope']//td[@class='ng-scope right']//span[@class='ng-scope ng-binding arrange' and @data-ng-bind-html]"))).get_attribute("innerHTML"))

Console output:
```
$42.91
```
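The extracted value is still a string. If a number is needed for further calculation, one option is to strip the currency formatting and convert it with `decimal.Decimal`. This is a sketch that assumes US-style formatting (`$` prefix, `,` as thousands separator); `parse_price` is a hypothetical helper, not part of Selenium or BeautifulSoup.

```python
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Convert a string such as '$42.91' or '$1,042.91' to a Decimal."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Anything that is not a plain number after cleaning is rejected.
        raise ValueError("not a price: %r" % text) from None


price = parse_price("$42.91")
print(price)
```

`Decimal` is used instead of `float` so that the cents are kept exactly.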

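`data-*` values can be accessed through `dataset`.

Sorry, but I don't understand what that means. I'm sure it's just another sign that I don't know what I'm doing! But thanks.

Not really; it's just that attributes starting with `data-` can be accessed through `dataset[]`. For example, a value can be read with `document.querySelector('input#ease').dataset[value]`.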
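In the browser, `dataset` exposes each `data-*` attribute under a camel-cased key: the `data-` prefix is dropped and each hyphen followed by a lowercase letter becomes that letter in upper case, so `data-ng-bind-html` is read as `element.dataset.ngBindHtml`. A short Python sketch of that name mapping follows; `dataset_key` is a hypothetical helper for illustration and only handles lowercase ASCII attribute names.

```python
import re


def dataset_key(attribute_name):
    """Map an HTML attribute name like 'data-ng-bind-html' to its dataset key."""
    if not attribute_name.startswith("data-"):
        raise ValueError("not a data-* attribute: %r" % attribute_name)
    name = attribute_name[len("data-"):]
    # A hyphen followed by a lowercase ASCII letter becomes the uppercase letter.
    return re.sub(r"-([a-z])", lambda m: m.group(1).upper(), name)


print(dataset_key("data-ng-bind-html"))
print(dataset_key("data-ng-if"))
```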
