Scraping a table from Wikipedia with Python

Tags: python, numpy, beautifulsoup, html-table, wikipedia-api

I am trying to scrape a Wikipedia table with Python and BeautifulSoup. When I use a for loop to get the table's column headers, I get the following error:

NameError                                 Traceback (most recent call last)
<ipython-input-18-948408e65d8d> in <module>
      1 # Header attributes of the table
      2 header=[th.text.rstrip() 
----> 3         for th in rows[0].find_all('th')]
      4 print(header)
      5 print('------------')

NameError: name 'rows' is not defined
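The code has three parts: getting the Wikipedia page title, finding the right table to scrape, and reading the header attributes of the table.

You can use pandas, which makes this very simple in your case: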

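import pandas as pd

# read_html returns a list with one DataFrame per <table> found on the page
tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_municipalities_of_Norway")
# Index 1 selects the second table on the page
right_table = tables[1]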
Output:

|     |   Number[1](ISO 3166-2:NO) | Name                         | Adm. center          | County               |   Population(2017)[2] |   Area(km²)[3] |   CountyMap |   Arms | Language form[4]         | Mayor[5]                    | Party   |
|----:|---------------------------:|:-----------------------------|:---------------------|:---------------------|----------------------:|---------------:|------------:|-------:|:-------------------------|:----------------------------|:--------|
|   0 |                        301 | Oslo                         | Oslo                 | Oslo                 |                673469 |         454.03 |         nan |    nan | Neutral                  | Marianne Borgen             | SV      |
|   1 |                       1101 | Eigersund                    | Egersund             | Rogaland             |                 14898 |         431.66 |         nan |    nan | Bokmål                   | Leif Erik Egaas             | H       |
|   2 |                       1103 | Stavanger                    | Stavanger            | Rogaland             |                141186 |         262.52 |         nan |    nan | Bokmål                   | Kari Nessa Nordtun          | Ap      |
|   3 |                       1106 | Haugesund                    | Haugesund            | Rogaland             |    
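
Selecting the table by position (tables[1]) depends on the page layout staying the same. One possible alternative is the match argument of pd.read_html, which keeps only the tables whose text matches a given string or regex. The sketch below runs on a small hypothetical HTML snippet rather than the live page; the snippet and the choice of "Population" as the match string are assumptions made for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-table page standing in for the Wikipedia article.
html = """
<table><tr><th>Note</th></tr><tr><td>navigation box</td></tr></table>
<table class="sortable wikitable">
  <tr><th>Number</th><th>Name</th><th>Population</th></tr>
  <tr><td>301</td><td>Oslo</td><td>673469</td></tr>
  <tr><td>1101</td><td>Eigersund</td><td>14898</td></tr>
</table>
"""

# Keep only the tables that contain the text "Population".
tables = pd.read_html(StringIO(html), match="Population")
right_table = tables[0]

print(list(right_table.columns))
print(right_table.shape)
```

On the real page the same call would take the URL in place of StringIO(html), with a match string that appears only in the table you want.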

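Where do the rows come from?
They come from right_table=soup.find('table',{"class":'sortable wikitable'}). This looks up the table with the HTML class "sortable wikitable" on the Wikipedia page.
Should rows=[] be set before it is used with find_all?
Yes. You can only append values to a list after it has been initialized.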
right_table=soup.find('table',{"class":'sortable wikitable'})
header=[th.text.rstrip() 
        for th in rows[0].find_all('th')]
print(header)
print('------------')
print(len(header))
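
The snippet above never assigns rows, which is what raises the NameError. Initializing it as an empty list would not be enough either, because rows[0] on an empty list raises an IndexError. A minimal sketch of one way to fix it, assuming rows is meant to hold the tr elements of right_table, is shown below. It parses a small hypothetical HTML string instead of the downloaded page so that it runs on its own.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded Wikipedia page.
html = """
<table class="sortable wikitable">
  <tr><th>Number</th><th>Name</th><th>County</th></tr>
  <tr><td>301</td><td>Oslo</td><td>Oslo</td></tr>
  <tr><td>1101</td><td>Eigersund</td><td>Rogaland</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the table whose class attribute is exactly "sortable wikitable".
right_table = soup.find("table", {"class": "sortable wikitable"})

# Define rows before using it: one entry per <tr> in the table.
rows = right_table.find_all("tr")

# Header attributes of the table, taken from the first row.
header = [th.text.rstrip() for th in rows[0].find_all("th")]
print(header)
print("------------")
print(len(header))
```

With the real page, soup would be built from the downloaded HTML, and the rest of the code stays the same.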