Python: scraping a table from a static website
Tags: python, python-3.x, web-scraping, beautifulsoup, python-requests
I need to scrape the table of top-level domains from the page at the URL in the code below. My code:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.iana.org/domains/root/db'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='tld-table')
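How can I load it into a pandas DataFrame with the same structure as on the website (Domain, Type, TLD Manager)?
pandas already provides a function for reading HTML tables, so BeautifulSoup is not needed: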
import pandas as pd
url = "https://www.iana.org/domains/root/db"
# This returns a list of DataFrames with all tables in the page.
df = pd.read_html(url)[0]
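Taking index [0] relies on the wanted table being the first one on the page. As a hedged variation, pd.read_html also accepts an attrs argument, so the table can be selected by the same id the question's code looks up (tld-table). The sketch below runs offline against SAMPLE_HTML, a made-up, shortened stand-in for the real page; the real markup may differ.

```python
from io import StringIO

import pandas as pd

# Hypothetical, shortened stand-in for the real page so the sketch runs offline.
SAMPLE_HTML = """
<table id="other"><tr><th>x</th></tr><tr><td>1</td></tr></table>
<table id="tld-table">
  <thead><tr><th>Domain</th><th>Type</th><th>TLD Manager</th></tr></thead>
  <tbody>
    <tr><td>.aaa</td><td>generic</td><td>American Automobile Association, Inc.</td></tr>
    <tr><td>.aarp</td><td>generic</td><td>AARP</td></tr>
  </tbody>
</table>
"""

# attrs restricts the match to <table id="tld-table">, so the result does not
# depend on where the table sits in the page.
tables = pd.read_html(StringIO(SAMPLE_HTML), attrs={"id": "tld-table"})
tld_df = tables[0]

print(tld_df.columns.tolist())
print(len(tld_df))
```

Against the live site, the same call would take the URL in place of StringIO(SAMPLE_HTML), assuming the table on the page still carries that id.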
You can use pandas pd.read_html:
import pandas as pd
URL = "https://www.iana.org/domains/root/db"
df = pd.read_html(URL)[0]
print(df.head())
    Domain     Type                            TLD Manager
0     .aaa  generic  American Automobile Association, Inc.
1    .aarp  generic                                   AARP
2  .abarth  generic         Fiat Chrysler Automobiles N.V.
3     .abb  generic                                ABB Ltd
4  .abbott  generic              Abbott Laboratories, Inc.
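For anyone who prefers to stay with the BeautifulSoup code from the question, the element returned by soup.find(id='tld-table') can also be turned into a DataFrame by hand. This is only a minimal sketch: it assumes the table has a thead with th cells and a tbody with td cells, and SAMPLE_HTML is a made-up stand-in for the downloaded page content.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for page.content from the question.
SAMPLE_HTML = """
<table id="tld-table">
  <thead><tr><th>Domain</th><th>Type</th><th>TLD Manager</th></tr></thead>
  <tbody>
    <tr><td>.aaa</td><td>generic</td><td>American Automobile Association, Inc.</td></tr>
    <tr><td>.aarp</td><td>generic</td><td>AARP</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = soup.find(id="tld-table")

# Column names come from the header cells (assumes a <thead> exists).
columns = [th.get_text(strip=True) for th in results.find("thead").find_all("th")]

# Each body row becomes one list of cell texts (assumes a <tbody> exists).
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in results.find("tbody").find_all("tr")
]

df = pd.DataFrame(rows, columns=columns)
print(df)
```

pd.read_html does the same job in one call, so the manual route is mainly useful when individual cells need custom handling, such as pulling link targets out of the Domain column.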