Python: Scraping the standard deviation from Yahoo Finance with Beautiful Soup


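I'm trying to use BeautifulSoup with Python 2.7 to pull some numbers from the Risk Statistics table on a Yahoo Finance page:

So far I have inspected the HTML using the following:

My problem is actually getting the numbers with soup.find. For example, the standard deviation: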

    # std should be 13.44
    stdevValue = float(soup.find("span",{"data-reactid":"124","class":"W(39%) Fl(start)"}).text)
    # std of category should be 0.18
    stdevCat = float(soup.find("span",{"data-reactid":"125","class":"W(57%) Mend(5px) Fl(end)"}).text)

Both of these calls to soup.find return None. What am I missing?

    from bs4 import BeautifulSoup
    import urllib.request  # Python 3 module; import the submodule explicitly


    riskURL = "https://finance.yahoo.com/quote/SHSAX/risk"
    page = urllib.request.urlopen(riskURL)
    content = page.read().decode('utf-8')
    soup = BeautifulSoup(content, 'html.parser')
    #W(25%) Fl(start) Ta(e)
    # Look the span up by its data-reactid attribute alone
    results = soup.find("span", {"data-reactid" : "121"})
    print(results.text)

Alternatively, you can use a regex and findNext to get the value:

    import re  # needed for re.compile below
    import urllib.request

    from bs4 import BeautifulSoup


    riskURL = "https://finance.yahoo.com/quote/SHSAX/risk"
    page = urllib.request.urlopen(riskURL)
    content = page.read().decode('utf-8')
    soup = BeautifulSoup(content, 'html.parser')
    # Find the label span by its text, then print the span that follows it
    for span in soup.find_all('span', text=re.compile('^(Standard Deviation)')):
        print(span.findNext('span').text)
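The label-then-next-span idea can be tried offline against a small hand-written fragment. This is only a sketch: the markup in `SAMPLE_HTML` is an assumption made up for illustration (it is not Yahoo Finance's real HTML), the helper name `values_after_label` is hypothetical, and the numbers 13.44 and 0.18 are the ones quoted in the question.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not Yahoo Finance's real HTML.
SAMPLE_HTML = """
<div>
  <div><span data-reactid="121">Standard Deviation</span></div>
  <div>
    <span data-reactid="124">13.44</span>
    <span data-reactid="125">0.18</span>
  </div>
</div>
"""


def values_after_label(soup, label, count=2):
    """Return the next `count` span texts after the label span, as floats."""
    label_span = soup.find("span", string=re.compile("^" + re.escape(label)))
    if label_span is None:
        raise ValueError("label not found: %r" % label)
    following = label_span.find_all_next("span", limit=count)
    return [float(span.get_text(strip=True)) for span in following]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
fund_std, category_std = values_after_label(soup, "Standard Deviation")
print(fund_std, category_std)
```

Raising on a missing label, rather than letting `None.text` fail, makes it easier to tell a changed page layout apart from a parsing bug.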

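From what I have read on the web, data-reactid is a custom attribute that the React framework uses to reference components. After a few tries I noticed that the data-reactid attributes are different on every reload of the page, as if they were randomly generated.

I think you should try a different approach to achieve this.

Perhaps you could locate a specific element, such as the "Standard Deviation" row, and then loop downward from it to collect the data.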

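    # Locate the label span by its text instead of by data-reactid
    std_span = next(x for x in soup.find_all('span') if x.text == "Standard Deviation")
    parent_div = std_span.parent
    # find_next_siblings() returns only tags, so bare text nodes are skipped
    for sibling in parent_div.find_next_siblings():
        for child in sibling.children:
            # do something
            print(child.text)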
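To illustrate why anchoring on the label is more stable than anchoring on data-reactid, here is a rough sketch that renders the same hypothetical row twice with different data-reactid values and reads it the same way both times. `ROW_TEMPLATE` and `row_values` are made-up names, and the markup is an assumption, not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical template for illustration only; not Yahoo Finance's real HTML.
ROW_TEMPLATE = """
<div class="row">
  <div><span data-reactid="{a}">Standard Deviation</span></div>
  <div><span data-reactid="{b}">13.44</span></div>
  <div><span data-reactid="{c}">0.18</span></div>
</div>
"""


def row_values(html, label):
    """Collect the text of every element that follows the label's parent."""
    soup = BeautifulSoup(html, "html.parser")
    label_span = next(
        s for s in soup.find_all("span") if s.get_text(strip=True) == label
    )
    # With no arguments, find_next_siblings() yields tags only (no text nodes).
    siblings = label_span.parent.find_next_siblings()
    return [sibling.get_text(strip=True) for sibling in siblings]


# Simulate two page loads whose data-reactid values differ.
first_load = row_values(ROW_TEMPLATE.format(a=121, b=124, c=125), "Standard Deviation")
second_load = row_values(ROW_TEMPLATE.format(a=57, b=60, c=61), "Standard Deviation")
print(first_load, second_load)
```

Because the lookup never reads data-reactid, both simulated loads give the same list of strings, which can then be passed to `float`.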

Hope this helps.

The std value should be 13.44, not 8.84.

I hadn't noticed! I made an edit; this will get you the 13.44 std dev.

Did you find 121? I would have guessed 124.
