Python: converting a table from an HTML web page into a DataFrame

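I am trying to scrape a table from a web page and convert it into a DataFrame for analysis. I used the BeautifulSoup package to fetch the URL and parse the table, but I can't seem to get the data into a DataFrame. My code is as follows: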

from bs4 import BeautifulSoup as bs
from urllib import request

# `request` was imported directly from urllib, so call it without the `urllib.` prefix
source = request.urlopen("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").read()
soup = bs(source, "html.parser")

table = soup.table

table_rows = table.find_all("tr")

for tr in table_rows:
    td = tr.find_all("td")
    row = [i.text for i in td]
    print(row)
This lets me see each row, but I don't know how to turn it into a DataFrame. Any ideas?

Please try this:

from bs4 import BeautifulSoup as bs
from urllib.request import urlopen
import pandas as pd

source = urlopen("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").read()
soup = bs(source, "html.parser")

table = soup.table

table_rows = table.find_all("tr")

postal_codes = []

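for tr in table_rows:
    td = tr.find_all("td")
    # strip() removes the trailing newline and surrounding whitespace from each cell
    row = [i.text.strip() for i in td]
    postal_codes.append(row)

# The header row contains only <th> cells, so it produced an empty list; drop it
postal_codes.pop(0)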

df = pd.DataFrame(postal_codes, columns=['PostalCode', 'Borough', 'Neighborhood'])

print(df)
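
As a variation on the loop above, here is a minimal sketch that reads the column names from the `<th>` cells instead of hard-coding them and calling `pop(0)`. The `SAMPLE_HTML` string and the `table_to_dataframe` helper are made up for illustration (they are not part of the answer), and the markup only assumes the Wikipedia table has a similar header-row layout.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup standing in for the Wikipedia table
SAMPLE_HTML = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

def table_to_dataframe(html):
    """Convert the first <table> in `html` into a DataFrame (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.table.find_all("tr")
    # Column names come from the <th> cells of the first row
    columns = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    data = []
    for tr in rows[1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip rows without <td> cells
            data.append(cells)
    return pd.DataFrame(data, columns=columns)

df = table_to_dataframe(SAMPLE_HTML)
print(df)
```

If the real page nests the header differently (for example inside `<thead>`), the first-row assumption would need adjusting.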

You can use pandas' read_html:

import pandas as pd

# read_html reads every table on the page and returns them as a list of DataFrames;
# pick the one that meets your needs
table_list = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")

print(table_list[0])

  Postal Code           Borough               Neighborhood
0         M1A      Not assigned                        NaN
1         M2A      Not assigned                        NaN
2         M3A        North York                  Parkwoods
3         M4A        North York           Victoria Village
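
If the page holds several tables, picking one by position can be fragile. A rough sketch of selecting by content instead, using the `match` argument of `read_html`: the inline `SAMPLE_HTML` is invented for illustration so the snippet runs without network access, and it assumes an HTML parser that pandas can use (such as lxml) is installed.

```python
from io import StringIO
import pandas as pd

# Hypothetical page with two tables; only the second holds postal codes
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>a</td><td>1</td></tr>
</table>
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td></td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

# match keeps only the tables whose text matches the given string or regex
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Postal Code")
postal_df = tables[0]
print(postal_df)
```

The same `match="Postal Code"` argument should work when passing the Wikipedia URL, provided the header text on the live page still reads that way.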