Scraping a table from a website with Python?


python pandas selenium dataframe


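I want to get every Treasury yield from the treasury.gov website.

How would I go about getting this information? I assume I have to use BeautifulSoup, Selenium, or something similar (preferably BS4). Eventually I want to put this data into a DataFrame.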

Here is one way to get the data in the table using requests and BeautifulSoup:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yieldAll'

# Download the page and fail early on an HTTP error.
r = requests.get(url)
r.raise_for_status()
html = r.text

# Name the parser explicitly; bs4 warns when it has to guess one.
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {"class": "t-chart"})
rows = table.find_all('tr')
data = []
# Skip the header row, then collect the non-empty cell text of each row.
for row in rows[1:]:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])

result = pd.DataFrame(data, columns=['Date', '1 Mo', '2 Mo', '3 Mo', '6 Mo', '1 Yr', '2 Yr', '3 Yr', '5 Yr', '7 Yr', '10 Yr', '20 Yr', '30 Yr'])

print(result)
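
Every cell scraped this way is a string, so the DataFrame usually needs a type-conversion pass before the yields can be used as numbers. A minimal sketch of that step follows. The sample rows are made up, and both the month/day/two-digit-year date format and the 'N/A' placeholder for missing values are assumptions that have not been checked against the page:

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped `data` list;
# the values are made up for illustration only.
sample = [
    ['01/02/20', '1.50', 'N/A', '1.60'],
    ['01/03/20', '1.55', '1.52', 'N/A'],
]
frame = pd.DataFrame(sample, columns=['Date', '1 Mo', '2 Mo', '3 Mo'])

# Assumed date format: month/day/two-digit year.
frame['Date'] = pd.to_datetime(frame['Date'], format='%m/%d/%y')

# Convert each yield column to float; unparseable text such as 'N/A' becomes NaN.
for col in frame.columns.drop('Date'):
    frame[col] = pd.to_numeric(frame[col], errors='coerce')

print(frame.dtypes)
```

With `errors='coerce'`, a missing value does not raise an exception. It becomes NaN, which pandas skips in most aggregations.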

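The column '2 Mo' was missing. Just add it to the array to avoid the error:

ValueError: 12 columns passed, passed data had 13 columns

I tried to edit the answer, but could not because the StackOverflow edit queue was full.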
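
A hard-coded column list breaks whenever the page adds or removes a column, which is what the ValueError in the comment above reports. One way to sidestep this is to read the column names from the table's own header row. This is a minimal sketch on an inline HTML snippet, not the answer's method. The snippet and its values are made up, and the sketch assumes the header cells are `th` elements in the first row, which has not been checked against the real page:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the values are made up.
html = """
<table class="t-chart">
  <tr><th>Date</th><th>1 Mo</th><th>2 Mo</th><th>3 Mo</th></tr>
  <tr><td>01/02/20</td><td>1.50</td><td>1.52</td><td>1.60</td></tr>
  <tr><td>01/03/20</td><td>1.55</td><td>1.53</td><td>1.58</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class': 't-chart'})
rows = table.find_all('tr')

# Assumption: the first row holds the column names in <th> cells.
columns = [th.get_text(strip=True) for th in rows[0].find_all('th')]

# Keep every <td> cell, including empty ones, so cells stay aligned with their columns.
data = [[td.get_text(strip=True) for td in row.find_all('td')] for row in rows[1:]]

result = pd.DataFrame(data, columns=columns)
print(result)
```

Because the names come from the page itself, the column count always matches the data as long as each data row has one cell per header cell.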