Python: getting a table's contents based on HTML tags

I have the following table:

<table id="sample">
    <tbody>
        <tr class="toprow">
            <td style="width:25%"></td>
            <td style="width:25%">Number of Jurisdictions</td>
            <td style="width:25%">Per cent of total</td>
        </tr>
        <tr>
            <td class="leftcol">Europe</td>
            <td class="data">44</td>
            <td class="data">29%</td>
        </tr>
    </tbody>
</table>
I was able to get the headers:

['', 'Number of Jurisdictions', 'Per cent of total']

Now I want to get the contents of the cells, but I don't know how to loop over the <td> tags, because their class may be either "leftcol" or "data".

If I understand you correctly, I would simplify it a bit:

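import pandas as pd

# Assumes `soup` is the BeautifulSoup object already built from the page.
gdp = soup.select("table#sample")[0]
rows = []
cols = []
# Header cells live in the row with class "toprow".
for g in gdp.select('tr.toprow'):
    for c in g.select('td'):
        cols.append(c.text)

# Every other row is data; select('td') matches cells whatever their class.
for g in gdp.select('tr:not(.toprow)'):
    row = []
    for item in g.select('td'):
        row.append(item.text)
    rows.append(row)
pd.DataFrame(rows, columns=cols)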
Alternatively, you can simplify it further with list comprehensions (at the cost of some readability, I think):

cols = [c.text for g in gdp.select('tr.toprow') for c in g.select('td')]
rows = [[item.text for item in g.select('td')] for g in gdp.select('tr:not(.toprow)')]
pd.DataFrame(rows, columns=cols)
Output:

                        Number of Jurisdictions     Per cent of total
0   Europe              44                          29%
1   Africa              23                          15%
2   Middle East         13                           9%
3   Asia and Oceania    33                          22%
4   Americas            37                          25%
5   Totals             150                          100%
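
For anyone who wants to try this without the original page, here is a minimal self-contained sketch. It assumes only the markup quoted in the question (so just the Europe row), uses the built-in "html.parser" backend, and skips pandas; the variable names `html` and `table` are mine, not the asker's. The point it illustrates is that selecting by tag name (`td`) matches the cells regardless of whether their class is "leftcol" or "data".

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (only the Europe row is shown there).
html = """
<table id="sample">
    <tbody>
        <tr class="toprow">
            <td style="width:25%"></td>
            <td style="width:25%">Number of Jurisdictions</td>
            <td style="width:25%">Per cent of total</td>
        </tr>
        <tr>
            <td class="leftcol">Europe</td>
            <td class="data">44</td>
            <td class="data">29%</td>
        </tr>
    </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("table#sample")

# Header cells come from the row marked with class "toprow".
cols = [td.get_text(strip=True) for td in table.select("tr.toprow td")]

# Selecting by tag name alone matches every <td>, whatever its class is.
rows = [
    [td.get_text(strip=True) for td in tr.select("td")]
    for tr in table.select("tr:not(.toprow)")
]

print(cols)
print(rows)
```

If you do need to treat the label cell differently from the numeric cells, something like `tr.select("td.leftcol")` and `tr.select("td.data")` should let you pick them apart, but for building a plain table the class can simply be ignored.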