Python Requests-HTML: can't extract specific data from a table
I want to extract data from this site, but I am not getting any results. I found that each row starts like this:
<tr ng-if="portSpecific.data.distributionHistory.domicile !== 'GB'" data-ng-repeat="fundDistribution in distributionHistoryList | limitTo:10 " data-ng-include="'${app-content-context}partials/includes/detail/distribution-rows.html' | configReplace | vuiCacheBuster" class="" style=""> <td class="vuiFixedCol fundDistributionType">Income Distribution</td>
<td class="alignRgt mostRecent"><span data-ng-bind-html="fundDistribution.mostRecent.currencySymbol">$</span>0.250768
</td>
<!----><td class="exDividendDate" data-ng-if="fund.data.assetClass !== 'Money Market'">24 Sep 2020</td><!---->
<td class="recordDate">25 Sep 2020</td>
<td class="payableDate">07 Oct 2020</td></tr>
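The website is loaded dynamically: the table rows are filled in by JavaScript after the page loads, and requests only fetches the raw HTML without running any scripts. That is why the table comes back empty. However, we can get the data by sending a GET request directly to the API that supplies it.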
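Thank you for the clear answer! Sorry, I have two questions. First, where did you find the Vanguard SP500 JSON API? Second, can you explain how this part works: json.loads(re.search(r"({.*})", str(soup)).group(1))? Thanks in advance.
@joey schuitemaker 1. Open DevTools in your browser (in Chrome: right-click -> Inspect -> Network). There you can see all the requests the page makes. 2. The data we need is inside curly braces, so to find it we use re.search with the pattern r"({.*})".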
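To illustrate point 2, here is a minimal sketch of the regex step run on a made-up JSONP string, so it works without a network connection. The `sample` payload is an assumption for illustration: only its shape (a callback name wrapping a JSON object) mirrors the API response, while the key names and values are the ones that appear in the answer's code and output.

```python
import json
import re

# Hypothetical JSONP response: a callback name wrapping a JSON object.
sample = (
    'angular.callbacks._4({"distributionHistory": {"fundDistributionList": '
    '[{"type": "Income Distribution", "payableDate": "07 Oct 2020"}]}})'
)

# "{.*}" is greedy, so the match runs from the first "{" to the last "}",
# which drops the callback name and the surrounding parentheses.
match = re.search(r"({.*})", sample)
payload = json.loads(match.group(1))

first = payload["distributionHistory"]["fundDistributionList"][0]
print(first["type"], first["payableDate"])
```

Note that `.` does not match line breaks by default, so a response spread over several lines would need the `re.DOTALL` flag for this pattern to capture the whole object.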
import requests
from bs4 import BeautifulSoup

url = 'https://www.vanguardinvestor.co.uk/investments/vanguard-s-and-p-500-ucits-etf-usd-distributing/distributions'
data = requests.get(url)
soup = BeautifulSoup(data.text, 'html.parser')
data = []
for tr in soup.find_all('tr'):
    values = [td.text for td in tr.find_all('td')]
    print(values)
print(data)
import re
import json
import requests
from bs4 import BeautifulSoup

# JSONP endpoint that the page itself calls (visible in the DevTools Network tab)
URL = "https://api.vanguard.com/rs/gre/gra/1.7.0/datasets/urd-product-port-specific.jsonp?vars=portId:9503,issueType:F&callback=angular.callbacks._4"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# Left-align every column to a fixed width
fmt_string = "{:<25} {:<20} {:<20} {:<20} {:<20}"
print(
    fmt_string.format(
        "Distribution type",
        "Most recent",
        "Ex-dividend date",
        "Record date",
        "Payable data",
    )
)
print("-" * 105)

# The response wraps a JSON object in a callback; keep only the {...} part
json_data = json.loads(re.search(r"({.*})", str(soup)).group(1))
for data in json_data["distributionHistory"]["fundDistributionList"]:
    distribution = data["type"]
    most_recent = data["mostRecent"]["value"]
    dividend_data = data["exDividendDate"]
    record_data = data["recordDate"]
    payable_data = data["payableDate"]
    print(
        fmt_string.format(
            distribution, most_recent, dividend_data, record_data, payable_data
        )
    )
Distribution type         Most recent          Ex-dividend date     Record date          Payable data
---------------------------------------------------------------------------------------------------------
Income Distribution       0.250768             24 Sep 2020          25 Sep 2020          07 Oct 2020
Income Distribution       0.195290             11 Jun 2020          12 Jun 2020          24 Jun 2020
Income Distribution       0.289243             26 Mar 2020          27 Mar 2020          08 Apr 2020
Income Distribution       0.202612             12 Dec 2019          13 Dec 2019          27 Dec 2019
... and so on
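If the rows are needed for further processing rather than just printing, they could be collected into a list of dictionaries with real date objects. This is only a sketch and not part of the answer above: `to_rows` is a hypothetical helper, the sample records reuse the first two rows of the output, and storing `value` as a string is an assumption about the API (the printed trailing zero in 0.195290 suggests it, but the response was not inspected here).

```python
from datetime import datetime

DATE_FORMAT = "%d %b %Y"  # e.g. "24 Sep 2020"; %b assumes English month names


def to_rows(distributions):
    """Flatten fundDistributionList-style records into plain dictionaries."""
    rows = []
    for item in distributions:
        rows.append(
            {
                "type": item["type"],
                "most_recent": float(item["mostRecent"]["value"]),
                "ex_dividend": datetime.strptime(item["exDividendDate"], DATE_FORMAT).date(),
                "record": datetime.strptime(item["recordDate"], DATE_FORMAT).date(),
                "payable": datetime.strptime(item["payableDate"], DATE_FORMAT).date(),
            }
        )
    return rows


# Sample records shaped like the ones the loop above reads (values from the output).
sample_records = [
    {
        "type": "Income Distribution",
        "mostRecent": {"value": "0.250768"},
        "exDividendDate": "24 Sep 2020",
        "recordDate": "25 Sep 2020",
        "payableDate": "07 Oct 2020",
    },
    {
        "type": "Income Distribution",
        "mostRecent": {"value": "0.195290"},
        "exDividendDate": "11 Jun 2020",
        "recordDate": "12 Jun 2020",
        "payableDate": "24 Jun 2020",
    },
]

rows = to_rows(sample_records)
for row in rows:
    print(row["type"], row["most_recent"], row["payable"].isoformat())
```

With the live response, the same helper would be called as to_rows(json_data["distributionHistory"]["fundDistributionList"]), assuming every record carries all five fields.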