Python 3.x: Unable to scrape an HTML table with BeautifulSoup and load it into a pandas DataFrame

python-3.x pandas web-scraping

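My goal is to access the table on the following web page and convert it into a pandas DataFrame with the columns "Country or territory", "Currency", and "ISO-4217".

I can access the column names correctly, but I am having trouble figuring out how to append each row to the DataFrame. Do you have any suggestions on how I could do this? For example, on the web page the first row of the table is the letter "A". However, I need the first row of the DataFrame to be Afghanistan, Afghan afghani, and AFN.

Here is what I have so far: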

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.countries-ofthe-world.com/world-currencies.html"
# Send a browser-like User-Agent header with the request
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, "html.parser")
# Locate the currency table and collect all of its rows
table = soup.find("table", {"class": "codes"})
rows = table.find_all("tr")
# The first row holds the column names
columns = [v.text for v in rows[0].find_all("th")]
print(columns)  # ['Country or territory', 'Currency', 'ISO-4217']
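One possible way to turn the remaining rows into a DataFrame is sketched below. It is not taken from the thread. To stay runnable offline it parses a small hypothetical `SAMPLE_HTML` string instead of the live page, and it assumes that each letter heading sits in a row that does not have three `td` cells (for example a single cell spanning the table). The helper name `table_to_dataframe` is also an illustration only.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample mimicking the layout described in the question:
# a header row, letter heading rows, and three-cell data rows.
SAMPLE_HTML = """
<table class="codes">
  <tr><th>Country or territory</th><th>Currency</th><th>ISO-4217</th></tr>
  <tr><td colspan="3">A</td></tr>
  <tr><td>Afghanistan</td><td>Afghan afghani</td><td>AFN</td></tr>
  <tr><td>Albania</td><td>Albanian lek</td><td>ALL</td></tr>
  <tr><td colspan="3">B</td></tr>
  <tr><td>Bahamas</td><td>Bahamian dollar</td><td>BSD</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Build a DataFrame from the 'codes' table, skipping letter heading rows."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "codes"})
    rows = table.find_all("tr")
    # Column names come from the <th> cells of the first row.
    columns = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Assumption: only real data rows have one <td> per column.
        if len(cells) == len(columns):
            records.append(cells)
    # Build the frame once from a list of rows instead of appending row by row.
    return pd.DataFrame(records, columns=columns)


df = table_to_dataframe(SAMPLE_HTML)
print(df)
```

Collecting the rows in a plain list and constructing the DataFrame once at the end avoids repeated appends to the frame. If the real page marks its letter rows differently, the `len(cells) == len(columns)` check would need to be adapted.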

Thank you all for your time.

Tony

With your fix in place, the page can be parsed easily with pandas:

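from urllib.request import Request, urlopen
import pandas as pd

url = "https://www.countries-ofthe-world.com/world-currencies.html"
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(req).read()

# read_html returns a list of DataFrames, one per table found on the page
df = pd.read_html(webpage)[0]
print(df.head())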

         Country or territory        Currency ISO-4217
0                           A               A        A
1                 Afghanistan  Afghan afghani      AFN
2  Akrotiri and Dhekelia (UK)   European euro      EUR
3     Aland Islands (Finland)   European euro      EUR
4                     Albania    Albanian lek      ALL

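The result still contains the alphabet heading rows, but you can remove them with something like

df = df[df['Currency'] != df['ISO-4217']]

You might want to check response.status_code. I get a 403 Forbidden from that site, so there is nothing useful in response.text.

Thanks! I will look into it.

I see now. I fixed the bad request. Please see the updated question.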
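The status check from the comment and the row filter from the answer can be combined in one place, as in the sketch below. The comment mentions `response.status_code`, which is the attribute name used by the requests library. The code in this thread uses `urllib`, where a 403 surfaces as an `HTTPError` raised by `urlopen`, so the sketch catches that instead. The helper names `fetch_html`, `drop_letter_rows`, and `load_currency_table` are hypothetical, the UTF-8 decoding is an assumption about the page, and the `reset_index` call is an addition that the answer does not include.

```python
import io
from urllib.error import HTTPError
from urllib.request import Request, urlopen

import pandas as pd


def fetch_html(url):
    """Return the page body as text, or raise with the HTTP status on failure."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(req) as response:
            # Assumption: the page is UTF-8 encoded.
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # urlopen raises HTTPError for responses such as 403 Forbidden.
        raise RuntimeError(f"Request failed with HTTP status {err.code}") from err


def drop_letter_rows(df):
    """Drop alphabet heading rows, where 'Currency' equals 'ISO-4217'."""
    keep = df["Currency"] != df["ISO-4217"]
    # Renumber the remaining rows from 0 (not part of the original answer).
    return df[keep].reset_index(drop=True)


def load_currency_table(url):
    """Fetch the page, parse its first table, and remove the heading rows."""
    html = fetch_html(url)
    # Wrap the text in StringIO so read_html receives a file-like object.
    df = pd.read_html(io.StringIO(html))[0]
    return drop_letter_rows(df)


# Offline check of the filter, using rows copied from the output shown above.
raw = pd.DataFrame(
    {
        "Country or territory": ["A", "Afghanistan", "Albania"],
        "Currency": ["A", "Afghan afghani", "Albanian lek"],
        "ISO-4217": ["A", "AFN", "ALL"],
    }
)
clean = drop_letter_rows(raw)
print(clean)
```

Calling `load_currency_table(url)` with the URL from the question would exercise the network path. It is left out of the runnable part here because the site may answer with 403, as the comment reports.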