Python: Scraping Yahoo Finance with BeautifulSoup

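I am trying to scrape information from the "Key Statistics" page for a Yahoo Finance ticker symbol (since the pandas library does not support this).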

Example for AAPL:

from bs4 import BeautifulSoup
import requests

url = 'http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')

enterpriseValue = soup.findAll('$ENTERPRISE_VALUE', attrs={'class': 'yfnc_tablehead1'}) #HTML tag for where enterprise value is located

print(enterpriseValue)

Edit: Thanks, Andy!

Question: This prints an empty list. How can I change my findAll call so that it returns 598.56B?

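Well, the reason the list returned by find_all is empty is that the data is generated by a separate call rather than by the GET request sent to that URL. If you open the Network tab in Chrome or Firefox and filter by XHR, you can find the URL you should send the GET request to by inspecting the request and response of each network action.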

In this case, it is:

https://query2.finance.yahoo.com/v10/finance/quoteSummary/AAPL?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com

So how do we recreate this? Simple:

import requests

# Request the JSON endpoint found in the Network tab instead of the HTML page
r = requests.get('https://query2.finance.yahoo.com/v10/finance/quoteSummary/AAPL?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com')
data = r.json()  # parse the JSON body into Python objects
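The hard-coded URL in the request can also be assembled from its parts, which makes it easier to swap the ticker or the list of modules. The following is only a sketch: `build_quote_summary_url` and `BASE_URL` are hypothetical names introduced here, and the crumb value is copied from the URL in this answer. It looks session-specific, so treat it as a placeholder rather than a value that will keep working.

```python
from urllib.parse import urlencode

# Endpoint prefix taken from the URL captured in the Network tab.
BASE_URL = "https://query2.finance.yahoo.com/v10/finance/quoteSummary"


def build_quote_summary_url(ticker, modules, crumb=None):
    """Assemble a quoteSummary URL (hypothetical helper, not part of any library)."""
    # Keep the parameters in the same order as the captured URL.
    params = [("formatted", "true")]
    if crumb is not None:
        params.append(("crumb", crumb))
    params += [
        ("lang", "en-US"),
        ("region", "US"),
        ("modules", ",".join(modules)),  # urlencode escapes "," as %2C
        ("corsDomain", "finance.yahoo.com"),
    ]
    return f"{BASE_URL}/{ticker}?{urlencode(params)}"


url = build_quote_summary_url(
    "AAPL",
    ["defaultKeyStatistics", "financialData", "calendarEvents"],
    crumb="8ldhetOu7RJ",  # placeholder copied from the answer
)
print(url)
```

The resulting string can be passed to requests.get in place of the literal URL.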

This returns the JSON response as a dict. From there, navigate the dict until you find the data you are looking for:

financial_data = data['quoteSummary']['result'][0]['defaultKeyStatistics']
enterprise_value_dict = financial_data['enterpriseValue']
print(enterprise_value_dict)
# {'fmt': '598.56B', 'raw': 598563094528, 'longFmt': '598,563,094,528'}
print(enterprise_value_dict['fmt'])
# 598.56B
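Chained indexing like the lookup above raises KeyError, IndexError, or TypeError as soon as one level is missing, which can happen when a ticker has no value for a field. A minimal sketch of a more forgiving lookup follows. The `dig` helper is a hypothetical name introduced here, and `sample` is a hand-written dict trimmed to the one field shown in this answer, not a real response.

```python
def dig(obj, *path, default=None):
    """Follow a path of dict keys and list indexes; return default if a step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Hand-written sample shaped like the response in this answer (one field only).
sample = {
    "quoteSummary": {
        "result": [
            {
                "defaultKeyStatistics": {
                    "enterpriseValue": {
                        "fmt": "598.56B",
                        "raw": 598563094528,
                        "longFmt": "598,563,094,528",
                    }
                }
            }
        ]
    }
}

fmt = dig(sample, "quoteSummary", "result", 0,
          "defaultKeyStatistics", "enterpriseValue", "fmt")
# This path does not exist in the sample, so the default is returned.
missing = dig(sample, "quoteSummary", "result", 0,
              "financialData", "totalCash", "fmt")

print(fmt)      # 598.56B
print(missing)  # None
```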

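This is gold! I am new to web scraping in general. Are there any resources that could help me avoid similar problems in the near future?

Check out [...], and if you really want to dive deep, consider [...]. It is a great skill to have.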