
Python Beautiful Soup: scrape a table with table breaks

python dataframe web-scraping


I'm trying to scrape a table into a DataFrame. My attempt only returns the table's title, not the data in the row for each region.

This is what I have so far:

from bs4 import BeautifulSoup as bs4
import requests

url = 'https://www.eia.gov/todayinenergy/prices.php'
r = requests.get(url)
soup = bs4(r.text, "html.parser")

table_regions = soup.find('table', {'class': "t4"})
regions = table_regions.find_all('tr')

for row in regions:
    print(row)

The ideal result I'm hoping for is:

region         | price   
---------------|-------
new england    | 2.59
new york city  | 2.52
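Since td.string yields text rather than numbers, one way to reach the two-column shape above is to build the DataFrame from the scraped strings and then convert the price column. This is a minimal sketch: the capitalized region names in scraped_rows are a hypothetical stand-in for whatever the page actually returns, and the prices are the two values from the desired output.

```python
import pandas as pd

# Hypothetical scraped cell text: BeautifulSoup hands back strings, not numbers.
scraped_rows = [["New England", "2.59"], ["New York City", "2.52"]]

df = pd.DataFrame(scraped_rows, columns=["region", "price"])

# Match the lowercase region names shown in the desired output.
df["region"] = df["region"].str.lower()

# Convert the price text to floats; unparseable cells become NaN instead of raising.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

print(df)
```

Using errors="coerce" keeps a stray header or blank cell from aborting the whole conversion; those cells show up as NaN and can be dropped afterwards.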

Thanks for your help.

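If you inspect the HTML response (soup), you will see that the table tag you get on this line

table_regions = soup.find('table', {'class': "t4"})

closes before the rows that contain the information you need (the rows whose td elements have the class names up, dn, d1 and s1). So how about working with the page's tr and td tags directly, like this: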

from bs4 import BeautifulSoup as bs4
import requests
import pandas as pd

url = 'https://www.eia.gov/todayinenergy/prices.php'
r = requests.get(url)
soup = bs4(r.text, "html.parser")

# Collect every <tr> on the page, not just those inside table.t4,
# because that table closes before the price rows.
a = soup.find_all('tr')
rows = []

# These indices depend on the page layout at the time of writing.
for tr in a[42:50]:
    # One list of cell strings per row.
    rows.append([td.string for td in tr.find_all('td')])

df = pd.DataFrame(rows, columns=['Region', 'Price_1', 'Percent_change_1', 'Price_2', 'Percent_change_2', 'Spark Spread'])
print(df)
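The early-closing table described above can be reproduced offline to see why soup.find('table', ...) only yields the title row. This sketch uses small hypothetical markup, not the real EIA page: with "html.parser", rows that appear after </table> are not descendants of the table, so only a page-wide find_all('tr') reaches them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the t4 table closes after its title row,
# and the data rows follow it outside the <table> element.
html = """
<table class="t4"><tr><th>Wholesale prices</th></tr></table>
<tr><td class="s1">New England</td><td class="d1">2.59</td></tr>
<tr><td class="s1">New York City</td><td class="d1">2.52</td></tr>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "t4"})

rows_in_table = table.find_all("tr")  # only the title row
rows_in_page = soup.find_all("tr")    # every <tr>, including the orphaned data rows

print(len(rows_in_table), len(rows_in_page))
```

Other parsers (lxml, html5lib) may repair the markup differently, so the row counts can change if you swap the parser.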

Note that I only used the a[42:50] part of the result, because a contains every tr on the page. You can use the rest as well if you need it.
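Hard-coded slice indices break whenever the page gains or loses a row. One alternative heuristic, sketched here under assumptions, is to keep only rows with the expected number of td cells. The sample HTML and EXPECTED_CELLS = 2 are hypothetical; the real page would need the six cells implied by the answer's column list. It also uses get_text(strip=True), because td.string returns None when a cell has more than one child node.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample: a layout row with one cell, then data rows with two cells.
html = """
<table><tr><td>Navigation</td></tr></table>
<tr><td>New England</td><td>2.59</td></tr>
<tr><td>New York City</td><td>2.52</td></tr>
"""
EXPECTED_CELLS = 2  # assumption: data rows are recognizable by their cell count

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.find_all("tr"):
    cells = tr.find_all("td")
    # Keep rows shaped like data rows instead of relying on fixed indices.
    if len(cells) == EXPECTED_CELLS:
        rows.append([td.get_text(strip=True) for td in cells])

df = pd.DataFrame(rows, columns=["region", "price"])
print(df)
```

This is only a filter on shape; if unrelated rows on the page happen to have the same cell count, you would need an extra check, such as the td class names mentioned in the answer.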

Thanks a lot for the guidance. I made a small adjustment to the row index to capture all the records:

a[40:50]

：）