Python: How do I scrape a table that contains nested tables?
I am trying to scrape a table like the one shown in the attached image, which is also the output I want. I tried scraping it with Selenium and Python's Beautiful Soup library, but the Excel output is all jumbled, especially the nested table part. I want the output to look like the image above. Below is the HTML code for this table:
<table class="table collapse show" id="HTBXactiveShelfReg">
<thead>
<tr>
<th scope="col">File Number</th>
<th scope="col">Date of Effect</th>
<th scope="col">Date of Expiration</th>
<th scope="col">I.B.6 Restricted</th>
<th scope="col">Offering Value</th>
<th scope="col">Offering Value Breakdown</th>
<th scope="col">Offering Type</th>
<th scope="col">Warrant Exercise Prices</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&filenum=333-237808&owner=exclude&count=100" target="_blank" style="color: #3380FF">333-237808</a>
</td>
<td>05/04/2020</td>
<td>04/23/2023</td>
<td>No</td>
<td>$150,000,000.00</td>
<td>
<table class="table" id="primary_breakdown_424">
<thead>
<tr>
<th scope="col">Source</th>
<th scope="col">Date</th>
<th scope="col">Value Used</th>
<th scope="col">Value Remaining</th>
<th scope="col">Underwriter</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335020000657/htbx_424b.htm" target="_blank" style="color: #3380FF">424B5</a></td>
<td>07/27/2020</td>
<td>$100,000,000.00</td>
<td>$50,000,000.00</td>
<td>B. Riley</td>
</tr>
</tbody>
</table>
</td>
<td>AtTheMarket</td>
<td>
None
</td>
</tr>
<tr>
<td>
<a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&filenum=333-221201&owner=exclude&count=100" target="_blank" style="color: #3380FF">333-221201</a>
</td>
<td>11/13/2017</td>
<td>10/30/2020</td>
<td>Yes</td>
<td>$50,000,000.00</td>
<td>
<table class="table" id="primary_breakdown_424">
<thead>
<tr>
<th scope="col">Source</th>
<th scope="col">Date</th>
<th scope="col">Value Used</th>
<th scope="col">Value Remaining</th>
<th scope="col">Underwriter</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335018000058/htbx_424b5.htm" target="_blank" style="color: #3380FF">424B5</a></td>
<td>01/19/2018</td>
<td>$3,658,000.00</td>
<td>$46,342,000.00</td>
<td>H.C. Wainwright</td>
</tr>
<tr>
<td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335018000223/htbx_424b5.htm" target="_blank" style="color: #3380FF">424B5</a></td>
<td>03/16/2018</td>
<td>$1,300,000.00</td>
<td>$45,042,000.00</td>
<td>H.C. Wainwright</td>
</tr>
<tr>
<td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335018001287/htbx_424b.htm" target="_blank" style="color: #3380FF">424B2</a></td>
<td>11/21/2018</td>
<td>$12,000,000.00</td>
<td>$33,042,000.00</td>
<td>A.G.P.</td>
</tr>
<tr>
<td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335019000327/htbx_424b.htm" target="_blank" style="color: #3380FF">424B5</a></td>
<td>04/04/2019</td>
<td>$18,000,000.00</td>
<td>$15,042,000.00</td>
<td>B. Riley</td>
</tr>
</tbody>
</table>
</td>
<td>AtTheMarket</td>
<td>
<table class="table" id="primarywarrants">
<thead>
<tr>
<th scope="col">Source</th>
<th scope="col">Date</th>
<th scope="col">Price</th>
<th scope="col"># Warrants Offered</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335019000327/htbx_424b.htm" target="_blank" style="color: #3380FF">424B5</a></td>
<td>04/04/2019</td>
<td>$0.48</td>
<td>32,610</td>
</tr>
<tr>
<td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335019000327/htbx_424b.htm" target="_blank" style="color: #3380FF">424B5</a></td>
<td>04/04/2019</td>
<td>$1.00</td>
<td>6,825,000</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
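I suggest using pandas to scrape tables (or any other DataFrame-like content) from the web. This problem has already been solved in an existing question that you may want to look at; this one is a duplicate of it.
Cheers

Would you mind sharing the code you have tried so far?
@PrakharJhudele I have already shared it. Please check.
Next time you ask a question, it is best to include specific language tags so that more people can help you.
I still cannot get the desired output in Excel as shown in the image above.
Could you share a link to the table? I can give it a try.
I think there are two ways forward. One is to use a MultiIndex, which leads to a fairly complicated table structure. The other is to flatten the table, that is, get rid of "Offering Value Breakdown" and turn its sub-columns into fully independent columns.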
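To make the two options from the last comment concrete, here is a minimal pandas sketch. The values are typed in by hand from the HTML above, only a few columns are included, and the variable names `nested` and `flat` are illustrative assumptions rather than anything from the original post.

```python
import pandas as pd

# Hand-typed sample from the table in the question (subset of columns and rows).
columns = pd.MultiIndex.from_tuples([
    ("File Number", ""),
    ("Offering Value", ""),
    ("Offering Value Breakdown", "Source"),
    ("Offering Value Breakdown", "Date"),
    ("Offering Value Breakdown", "Value Used"),
])
rows = [
    ["333-237808", "$150,000,000.00", "424B5", "07/27/2020", "$100,000,000.00"],
    ["333-221201", "$50,000,000.00", "424B5", "01/19/2018", "$3,658,000.00"],
    ["333-221201", "$50,000,000.00", "424B5", "03/16/2018", "$1,300,000.00"],
]

# Option 1: keep the parent/child relationship as a two-level column index.
nested = pd.DataFrame(rows, columns=columns)

# Option 2: flatten by dropping the parent label and keeping the sub-column name.
flat = nested.copy()
flat.columns = [child or parent for parent, child in nested.columns]

print(nested)
print(flat)
```

Dropping the parent label only works while the sub-column names are unique. In the full table both nested tables have `Source` and `Date` columns, so the flat variant would probably need a prefix such as the parent column name.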
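import csv

from selenium.webdriver.common.by import By

# `driver` is an already-initialised Selenium WebDriver with the page loaded.
output = []
table = driver.find_element(By.ID, 'HTBXactiveShelfReg')
# Header cells; this also matches the <th> cells of the nested tables.
output.append([th.text for th in table.find_elements(By.TAG_NAME, 'th')])
# Every <tr> under the table, including the rows of the nested tables.
for row in table.find_elements(By.TAG_NAME, 'tr'):
    output.append([td.text for td in row.find_elements(By.TAG_NAME, 'td')])
# Append to htb.csv with every field quoted.
with open('htb.csv', 'a', newline='') as outfile:
    csv.writer(outfile, quoting=csv.QUOTE_ALL).writerows(output)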
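
One likely cause of the jumbled output is that searching the outer table by tag name matches `th`, `tr`, and `td` elements at every depth, so the cells of the nested tables are collected together with the outer ones. Below is a sketch of the flattening approach with Beautiful Soup that only walks direct children at each level. The helper names (`own_rows`, `texts`, `nested_records`, `flatten`) and the `parent / child` column naming are assumptions made for illustration, the sample HTML is a trimmed copy of the table above, and the code assumes every table has explicit `thead` and `tbody` elements as that HTML does.

```python
from itertools import zip_longest

from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (fewer columns and rows).
HTML = """
<table id="HTBXactiveShelfReg">
<thead><tr>
<th>File Number</th><th>Offering Value</th><th>Offering Value Breakdown</th>
<th>Offering Type</th><th>Warrant Exercise Prices</th>
</tr></thead>
<tbody>
<tr>
<td><a href="#">333-237808</a></td><td>$150,000,000.00</td>
<td><table><thead><tr><th>Source</th><th>Date</th><th>Value Used</th></tr></thead>
<tbody><tr><td>424B5</td><td>07/27/2020</td><td>$100,000,000.00</td></tr></tbody></table></td>
<td>AtTheMarket</td><td>None</td>
</tr>
<tr>
<td><a href="#">333-221201</a></td><td>$50,000,000.00</td>
<td><table><thead><tr><th>Source</th><th>Date</th><th>Value Used</th></tr></thead>
<tbody>
<tr><td>424B5</td><td>01/19/2018</td><td>$3,658,000.00</td></tr>
<tr><td>424B5</td><td>03/16/2018</td><td>$1,300,000.00</td></tr>
</tbody></table></td>
<td>AtTheMarket</td>
<td><table><thead><tr><th>Source</th><th>Date</th><th>Price</th></tr></thead>
<tbody><tr><td>424B5</td><td>04/04/2019</td><td>$0.48</td></tr></tbody></table></td>
</tr>
</tbody>
</table>
"""


def own_rows(table, section):
    """Rows of `table` itself; recursive=False skips rows of nested tables."""
    part = table.find(section, recursive=False)
    return part.find_all("tr", recursive=False) if part else []


def texts(row, tag):
    """Text of the cells that are direct children of `row`."""
    return [cell.get_text(strip=True) for cell in row.find_all(tag, recursive=False)]


def nested_records(nested, prefix):
    """Turn a nested table into dicts keyed by 'prefix / sub-header'."""
    names = [f"{prefix} / {h}" for h in texts(own_rows(nested, "thead")[0], "th")]
    return [dict(zip(names, texts(r, "td"))) for r in own_rows(nested, "tbody")]


def flatten(table):
    """One flat dict per nested row, repeating the outer cells on each."""
    headers = texts(own_rows(table, "thead")[0], "th")
    records = []
    for row in own_rows(table, "tbody"):
        outer, groups = {}, []
        for name, cell in zip(headers, row.find_all("td", recursive=False)):
            nested = cell.find("table")
            if nested is None:
                outer[name] = cell.get_text(strip=True)
            else:
                groups.append(nested_records(nested, name))
        # Line up the i-th row of every nested table; pad the shorter ones.
        # Fall back to a single empty combination when there is nothing nested.
        combos = list(zip_longest(*groups, fillvalue={})) or [()]
        for parts in combos:
            record = dict(outer)
            for part in parts:
                record.update(part)
            records.append(record)
    return records


soup = BeautifulSoup(HTML, "html.parser")
records = flatten(soup.find("table", id="HTBXactiveShelfReg"))
for record in records:
    print(record)
```

With Selenium, the same function could be fed from `BeautifulSoup(driver.page_source, "html.parser")`. The resulting list of dicts can then be written with `csv.DictWriter` or loaded into a pandas `DataFrame` for export. Lining up nested rows by position is only one possible layout choice; keeping each nested table as a separate sheet keyed by file number would be another.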