How to scrape table data from an unloaded tab of a website using BeautifulSoup in Python
Tags: python, web-scraping, beautifulsoup, python-requests
I am trying to get the data for the indices from this site. I tried to scrape the rollover data from the "Index" tab, but when I scrape the table, its content looks like this:
<table cellspacing="0" class="derivatives_section table table-striped responsive dt-responsive nowrap derivatives_rollover_tbl" id="rollover_index_table" width="100%">
<thead>
<tr>
<th>Index</th>
<th>Future<br/> Price</th>
<th>% Price<br/> Chg.</th>
<th>% OI<br/> Chg.</th>
<th>No. of Shares<br/> Rolled</th>
<th>% Rollover</th>
<th id="ro_idx_1">% Chg Rollover <br/> Vs. 1 Month Avg.</th>
<th>% Rollover <br/>Cost </th>
<th id="ro_idx_2">% Chg Rollover Cost <br/> Vs. 1 Month Avg.</th>
</tr>
</thead>
<tbody>
<tr>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
</tr>
<tr>
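How can I scrape the data from the website's Index tab?
The data comes from an API call that returns JSON. You can create a DataFrame from the data as follows: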
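import requests
import pandas as pd

# JSON endpoint that the page's JavaScript calls to fill the table
url = 'https://www.indiainfoline.com/api/papi-call-api.php?url=/Derivative/Derivative.svc/FNO-Rollover/FUTSTK/?responsetype=json'
r = requests.get(url).json()
# The rows are nested under response -> data -> FNORollOverList -> FNORollOverdata
df = pd.DataFrame(r['response']['data']['FNORollOverList']['FNORollOverdata'])
print(df)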
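As a hedged sketch of the same idea with some defensive checks: the helper below walks the nested keys used in the snippet above and returns an empty list when a key is missing, instead of raising a KeyError. The names `rows_from_payload` and `fetch_rows` are made up for this illustration, and the field names in the sample payload (`Symbol`, `RollOverPer`) are placeholders, not the API's real column names.

```python
# Key path taken from the answer's snippet; everything else here is illustrative.
KEY_PATH = ("response", "data", "FNORollOverList", "FNORollOverdata")


def rows_from_payload(payload):
    """Return the list of row dicts, or [] if the expected keys are missing."""
    node = payload
    for key in KEY_PATH:
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    # Assumption: a single record might arrive as a dict rather than a list.
    if isinstance(node, dict):
        return [node]
    return list(node) if isinstance(node, list) else []


def fetch_rows(url, timeout=30):
    """Fetch the JSON endpoint and extract its rows (needs network access)."""
    import requests  # imported here so the pure helper above has no dependencies

    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # fail loudly on HTTP errors
    return rows_from_payload(resp.json())


# Hypothetical payload: field names are placeholders, not the real API schema.
sample = {"response": {"data": {"FNORollOverList": {"FNORollOverdata": [
    {"Symbol": "AAA", "RollOverPer": "71.2"},
    {"Symbol": "BBB", "RollOverPer": "64.9"},
]}}}}

rows = rows_from_payload(sample)
print(len(rows), rows[0]["Symbol"])
print(rows_from_payload({"response": {}}))
```

The list returned by `fetch_rows(url)` can be passed to `pd.DataFrame(...)` in the same way as in the answer.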
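Just to explain what @QHarr did: the content of this website is generated dynamically. That means the table is filled in by JavaScript from this JSON file. As you can see, when you fetch the page and parse it with bs4, the data has not been loaded yet, which is why you cannot retrieve it: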
<div class="bs-component deri_roll_main">
<div class="row">
<div class="col-sm-6 col-xs-12">
<ul class="nav nav-tabs mb0">
<li id="stk_tab" class="active"><a href="#derivatives_stock" data-toggle="tab">Stock</a></li>
<li id="idx_tab"><a href="#derivatives_index" data-toggle="tab">Index</a></li>
</ul>
</div>
<div class="clearfix hidden visible-xs gray_bdr_b"></div>
<div class="col-sm-6 col-xs-12 txt_left_m text-right">
<div class="fill_exp_date w100p"><span>Expiry Date -</span> </div>
</div>
</div>
<div id="myTabContent" class="tab-content">
<div class="tab-pane fade active in" id="derivatives_stock">
<!-- <table id="derivatives_rollover_tbl" class="derivatives_rollover_tbl display nowrap" style="width:100%">-->
<div class="tablepanel">
<table class="derivatives_section table table-striped responsive dt-responsive nowrap derivatives_rollover_tbl" cellspacing="0" width="100%" id="rollover_stock_table">
<thead>
<tr>
<th>Script</th>
<th >Future<br> Price</th>
<th>% Price<br> Chg</th>
<th>% OI<br> Chg</th>
<th>No. of Shares<br> Rolled</th>
<th>% Rollover</th>
<th id="ro_stk_1">% Chg Rollover <br> VS 1 Month.Avg</th>
<th>RO<br>Cost </th>
<th id="ro_stk_2">% Chg Rollover <br> VS 1 Month.Avg</th>
</tr>
</thead>
<tbody>
<tr>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
<td><div class="text-line loading"></div></td>
</tr>
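One way to confirm that a fetched page is still in this unrendered state is to check whether every table cell holds only a `loading` placeholder. The sketch below is an illustration, not part of the original answer: it uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra dependencies (with bs4, a selector such as `soup.select("td div.loading")` would serve the same purpose). The class and function names are made up, and the values in `filled_row` are placeholder data.

```python
from html.parser import HTMLParser


class PlaceholderCounter(HTMLParser):
    """Count <td> cells and how many of them contain a 'loading' <div>."""

    def __init__(self):
        super().__init__()
        self.cells = 0
        self.placeholders = 0
        self._in_td = False
        self._counted = False  # has the current cell already been counted?

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells += 1
            self._in_td = True
            self._counted = False
        elif tag == "div" and self._in_td and not self._counted:
            classes = (dict(attrs).get("class") or "").split()
            if "loading" in classes:
                self.placeholders += 1
                self._counted = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False


def is_unrendered(html):
    """True when every table cell in `html` is still a loading placeholder."""
    counter = PlaceholderCounter()
    counter.feed(html)
    return counter.cells > 0 and counter.placeholders == counter.cells


# Same shape as the rows in the markup above.
static_row = ('<tr><td><div class="text-line loading"></div></td>'
              '<td><div class="text-line loading"></div></td></tr>')
# Placeholder values, only to show what a rendered row might look like.
filled_row = "<tr><td>NIFTY</td><td>1.5</td></tr>"

print(is_unrendered(static_row))
print(is_unrendered(filled_row))
```

If the check returns True for the HTML that `requests` downloaded, the table is being filled in client-side and the data has to come from the API or from a rendered page.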
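One way to solve this is to get the data directly from the API call, which is the best approach in this scenario. That is not always possible, though. The second way is to use another tool that supports JavaScript and will render the data for you, such as Selenium or Scrapy with Splash.
Thank you for your answer.
You're very welcome.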
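For the second approach mentioned above, a minimal Selenium sketch might look like the following. It is an illustration under several assumptions: `fetch_rendered_html` and `page_url` are hypothetical names (the page address is not given in the thread), Chrome with a matching driver is assumed to be installed, and the selectors rely on the `idx_tab` and `rollover_index_table` ids from the markup quoted in this thread. The wait condition assumes the `loading` placeholders disappear once the rows arrive.

```python
def fetch_rendered_html(page_url, timeout=20):
    """Open the page in a browser, switch to the Index tab, return the HTML."""
    # Imported lazily: Selenium and a matching browser driver must be installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(page_url)
        wait = WebDriverWait(driver, timeout)
        # Click the Index tab link (id taken from the quoted markup).
        wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "#idx_tab a"))
        ).click()
        # Wait until no loading placeholder is visible in the index table.
        wait.until(
            EC.invisibility_of_element_located(
                (By.CSS_SELECTOR, "#rollover_index_table div.loading")
            )
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout
```

The returned HTML can then be parsed as usual, for example with `BeautifulSoup(html, "html.parser").find("table", id="rollover_index_table")`. Starting a browser is much slower than calling the JSON endpoint, which is why the direct API call remains the better choice when it is available.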