Python: How do I scrape a table that contains nested tables?

Tags: python, pandas, selenium, beautifulsoup, html-table

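I am trying to scrape the table shown in the attached image.

Desired output:

I tried scraping it with Selenium and Python's Beautiful Soup library, but the Excel output comes out jumbled, especially the nested-table part. I would like the output to look like the image above. Below is the HTML for this table.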

<table class="table collapse show" id="HTBXactiveShelfReg">
  <thead>
    <tr>
      <th scope="col">File Number</th>
      <th scope="col">Date of Effect</th>
      <th scope="col">Date of Expiration</th>
      <th scope="col">I.B.6 Restricted</th>
      <th scope="col">Offering Value</th>
      <th scope="col">Offering Value Breakdown</th>
      <th scope="col">Offering Type</th>
      <th scope="col">Warrant Exercise Prices</th>
    </tr>
  </thead>
  <tbody>

    
    <tr>
      <td>
        <a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;filenum=333-237808&amp;owner=exclude&amp;count=100" target="_blank" style="color: #3380FF">333-237808</a>
      </td>
      <td>05/04/2020</td>
      <td>04/23/2023</td>
      <td>No</td>
      <td>$150,000,000.00</td>
      <td>
        
        <table class="table" id="primary_breakdown_424">
          <thead>
            <tr>
              <th scope="col">Source</th>
              <th scope="col">Date</th>
              <th scope="col">Value Used</th>
              <th scope="col">Value Remaining</th>
              <th scope="col">Underwriter</th>
            </tr>
          </thead>
          <tbody>
          
            <tr>
              <td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335020000657/htbx_424b.htm" target="_blank" style="color: #3380FF">424B5</a></td>
              <td>07/27/2020</td>
              <td>$100,000,000.00</td>
              <td>$50,000,000.00</td>
              <td>B.&nbsp;Riley</td>
            </tr>
          
          </tbody>
        </table>
        
      </td>
      <td>AtTheMarket</td>
      <td>
        
        None
        
      </td>
    </tr>
    
    <tr>
      <td>
        <a href="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;filenum=333-221201&amp;owner=exclude&amp;count=100" target="_blank" style="color: #3380FF">333-221201</a>
      </td>
      <td>11/13/2017</td>
      <td>10/30/2020</td>
      <td>Yes</td>
      <td>$50,000,000.00</td>
      <td>
        
        <table class="table" id="primary_breakdown_424">
          <thead>
            <tr>
              <th scope="col">Source</th>
              <th scope="col">Date</th>
              <th scope="col">Value Used</th>
              <th scope="col">Value Remaining</th>
              <th scope="col">Underwriter</th>
            </tr>
          </thead>
          <tbody>
          
            <tr>
              <td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335018000058/htbx_424b5.htm" target="_blank" style="color: #3380FF">424B5</a></td>
              <td>01/19/2018</td>
              <td>$3,658,000.00</td>
              <td>$46,342,000.00</td>
              <td>H.C. Wainwright</td>
            </tr>
          
            <tr>
              <td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335018000223/htbx_424b5.htm" target="_blank" style="color: #3380FF">424B5</a></td>
              <td>03/16/2018</td>
              <td>$1,300,000.00</td>
              <td>$45,042,000.00</td>
              <td>H.C. Wainwright</td>
            </tr>
          
            <tr>
              <td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335018001287/htbx_424b.htm" target="_blank" style="color: #3380FF">424B2</a></td>
              <td>11/21/2018</td>
              <td>$12,000,000.00</td>
              <td>$33,042,000.00</td>
              <td>A.G.P.</td>
            </tr>
          
            <tr>
              <td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335019000327/htbx_424b.htm" target="_blank" style="color: #3380FF">424B5</a></td>
              <td>04/04/2019</td>
              <td>$18,000,000.00</td>
              <td>$15,042,000.00</td>
              <td>B. Riley</td>
            </tr>
          
          </tbody>
        </table>
        
      </td>
      <td>AtTheMarket</td>
      <td>
        
        <table class="table" id="primarywarrants">
          <thead>
            <tr>
              <th scope="col">Source</th>
              <th scope="col">Date</th>
              <th scope="col">Price</th>
              <th scope="col"># Warrants Offered</th>
            </tr>
          </thead>
          <tbody>
          
            <tr>
              <td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335019000327/htbx_424b.htm" target="_blank" style="color: #3380FF">424B5</a></td>
              <td>04/04/2019</td>
              <td>$0.48</td>
              <td>32,610</td>
            </tr>
          
            <tr>
              <td><a href="https://www.sec.gov/Archives/edgar/data/1476963/000155335019000327/htbx_424b.htm" target="_blank" style="color: #3380FF">424B5</a></td>
              <td>04/04/2019</td>
              <td>$1.00</td>
              <td>6,825,000</td>
            </tr>
          
          </tbody>
        </table>
        
      </td>
    </tr>
    

  </tbody>
</table>

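I suggest using Pandas to scrape tables (or anything else that looks like a DataFrame) from the web. This problem has already been solved elsewhere; you may want to take a look.

Here is the link to the duplicate:


Cheers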
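As a rough illustration of the Pandas suggestion, the sketch below runs pandas.read_html on a small flat table. The sample HTML and the id "shelf" are made up for this example (modelled on a few outer columns of the table in the question), and read_html needs an HTML parser such as lxml, or bs4 together with html5lib, to be installed. A table with nested tables, like the one in the question, generally needs extra handling on top of this.

```python
from io import StringIO

import pandas as pd

# Hypothetical flat sample (no nested tables), modelled on the question's outer columns.
HTML = """
<table id="shelf">
  <thead>
    <tr><th>File Number</th><th>Date of Effect</th><th>Offering Type</th></tr>
  </thead>
  <tbody>
    <tr><td>333-237808</td><td>05/04/2020</td><td>AtTheMarket</td></tr>
    <tr><td>333-221201</td><td>11/13/2017</td><td>AtTheMarket</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(HTML))
shelf = tables[0]
print(shelf)
```

With Selenium, the same call could be given StringIO(driver.page_source) once the page has loaded.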

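Would you mind sharing the code you have tried so far?

@PrakharJhudele I have already shared it. Please check.

Next time you ask a question, it is best to include some specific language tags so that more people can help you.

I still can't get the desired output in Excel, as shown in the image above.

Can you share the link to the table? I can give it a try. I think there are two ways forward. One is to use a MultiIndex, which results in a fairly complex table structure. The other is to flatten the table: get rid of "Offering Value Breakdown" and turn its sub-columns into fully independent columns.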
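The flattening idea from the last comment could look roughly like the sketch below. It is not code from the thread: the helper names (header_names, body_rows, flatten), the " / " column-prefix convention and the trimmed sample HTML are all assumptions made for illustration. It uses BeautifulSoup with recursive=False so that rows and cells of a nested table are never mistaken for rows and cells of the outer table, and it repeats the outer cells once per nested row.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Trimmed, hypothetical sample modelled on the table in the question.
HTML = """
<table id="HTBXactiveShelfReg">
  <thead><tr>
    <th>File Number</th><th>Offering Value</th>
    <th>Offering Value Breakdown</th><th>Offering Type</th>
  </tr></thead>
  <tbody>
    <tr>
      <td>333-237808</td><td>$150,000,000.00</td>
      <td><table>
        <thead><tr><th>Source</th><th>Date</th></tr></thead>
        <tbody><tr><td>424B5</td><td>07/27/2020</td></tr></tbody>
      </table></td>
      <td>AtTheMarket</td>
    </tr>
    <tr>
      <td>333-221201</td><td>$50,000,000.00</td>
      <td><table>
        <thead><tr><th>Source</th><th>Date</th></tr></thead>
        <tbody>
          <tr><td>424B5</td><td>01/19/2018</td></tr>
          <tr><td>424B2</td><td>11/21/2018</td></tr>
        </tbody>
      </table></td>
      <td>AtTheMarket</td>
    </tr>
  </tbody>
</table>
"""


def header_names(table):
    """Column names from the table's own <thead>, ignoring nested tables."""
    head_row = table.find("thead", recursive=False).find("tr", recursive=False)
    return [th.get_text(strip=True) for th in head_row.find_all("th", recursive=False)]


def body_rows(table):
    """Rows that belong directly to this table, not to a nested one."""
    body = table.find("tbody", recursive=False) or table
    return body.find_all("tr", recursive=False)


def flatten(table):
    """Repeat the outer cells once per row of each nested table."""
    records = []
    columns = header_names(table)
    for row in body_rows(table):
        base, nested_tables = {}, []
        for name, cell in zip(columns, row.find_all("td", recursive=False)):
            nested = cell.find("table")
            if nested is None:
                base[name] = cell.get_text(" ", strip=True)
            else:
                nested_tables.append((name, nested))
        before = len(records)
        for name, nested in nested_tables:
            sub_columns = header_names(nested)
            for sub_row in body_rows(nested):
                values = [td.get_text(" ", strip=True)
                          for td in sub_row.find_all("td", recursive=False)]
                prefixed = {f"{name} / {sub}": v for sub, v in zip(sub_columns, values)}
                records.append({**base, **prefixed})
        if len(records) == before:  # no nested rows: keep the outer row as-is
            records.append(base)
    return records


soup = BeautifulSoup(HTML, "html.parser")
df = pd.DataFrame(flatten(soup.find("table", id="HTBXactiveShelfReg")))
print(df)
```

With Selenium, driver.page_source can be passed to BeautifulSoup in place of HTML, and df.to_csv("htb.csv", index=False) writes the flattened result. When a row holds more than one nested table (the question's table also nests one under "Warrant Exercise Prices"), this sketch stacks their rows under separate prefixed columns instead of combining them, which is only one of several reasonable layouts.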
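import csv

from selenium.webdriver.common.by import By

# Scraping attempt: dump every cell of the table with Selenium and write it to CSV.
# `driver` is assumed to be an already-initialised WebDriver with the page loaded.
output = []

table = driver.find_element(By.ID, 'HTBXactiveShelfReg')
# Note: this also collects the <th> cells of the nested tables.
output.append([th.text for th in table.find_elements(By.TAG_NAME, 'th')])
# Note: this matches every <tr>, nested ones included, so nested rows show up
# inside their parent cell's text and again as separate rows.
for row in table.find_elements(By.TAG_NAME, 'tr'):
    output.append([td.text for td in row.find_elements(By.TAG_NAME, 'td')])

# Append to the CSV; csv.writer quotes every field and escapes embedded quotes.
with open('htb.csv', 'a', newline='') as outfile:
    csv.writer(outfile, quoting=csv.QUOTE_ALL).writerows(output)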