Parsing: converting a list into a DataFrame after scraping the web
I'm new to Python. I want to recreate the table from the CME website below, but I can't convert the list I've created into a DataFrame. Any help is much appreciated, thanks in advance!
import urllib2
from bs4 import BeautifulSoup

url = "http://www.cmegroup.com/trading/energy/crude-oil/light-sweet-crude_product_calendar_futures.html"
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}

req = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(req)
soup = BeautifulSoup(response)

header = soup.findAll('th', limit=8)
column_header = []
for j in header:
    column_header.append(j.getText())

data_rows = soup.findAll('tr')[2:]
dates = []
for i in range(len(data_rows)):
    for td in data_rows[i].findAll('td'):
        dates.append(td.getText())
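Since the question itself is how to turn those scraped lists into a DataFrame, here is a minimal sketch of one way to do it, using hypothetical stand-in values for `column_header` and `dates` (the real lists would come from the scrape above):

```python
import pandas as pd

# Hypothetical stand-ins for the scraped column_header and flat dates list
column_header = ['Contract Month', 'Product Code', 'Settlement']
dates = ['Feb 2017', 'CLG17', '20 Jan 2017',
         'Mar 2017', 'CLH17', '21 Feb 2017']

# The flat cell list repeats in groups of len(column_header),
# so chunk it into rows before building the DataFrame
n = len(column_header)
rows = [dates[i:i + n] for i in range(0, len(dates), n)]
df = pd.DataFrame(rows, columns=column_header)
print(df)
```

This chunking only works if every row really has one cell per header, which is why narrowing the scrape to the table body (as in the answer below) is the safer route.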
Narrow the scope with `soup.thead` / `soup.tbody` instead of using `limit`, then append each row:
Thanks a lot! (Y) I actually did find a way: instead of bs4, I parsed the page with read_html, passing request headers, since access was otherwise forbidden.
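That read_html route can be sketched offline like this; the table markup below is a toy stand-in for the CME page, and the commented lines show the header-passing trick for the live site:

```python
from io import StringIO

import pandas as pd

# Toy stand-in for the CME page's table markup
html = """
<table>
  <thead><tr><th>Contract Month</th><th>Product Code</th></tr></thead>
  <tbody><tr><td>Feb 2017</td><td>CLG17</td></tr></tbody>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found
df = pd.read_html(StringIO(html))[0]
print(df)

# For the live page, fetch it with a browser-like User-Agent first
# (the default agent is refused), then parse the response text:
# import requests
# r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
# df = pd.read_html(StringIO(r.text))[0]
```

`read_html` needs an HTML parser (lxml, html5lib, or bs4) installed, which this document already uses.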
from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.cmegroup.com/trading/energy/crude-oil/light-sweet-crude_product_calendar_futures.html")
soup = BeautifulSoup(r.content, "lxml")

headers = [th.text for th in soup.thead.find_all('th')]  # use thead to narrow the scope
print(headers)

for tr in soup.tbody.find_all('tr'):
    row = [i.get_text(strip=True) for i in tr(['th', 'td'])]
    print(row)
['Contract Month', 'Product Code', 'First TradeLast Trade', 'Settlement', 'First HoldingLast Holding', 'First PositionLast Position', 'First NoticeLast Notice', 'First DeliveryLast Delivery']
['Feb 2017', 'CLG17', '21 Nov 201120 Jan 2017', '20 Jan 2017', '--', '23 Jan 201723 Jan 2017', '24 Jan 201724 Jan 2017', '01 Feb 201728 Feb 2017']
['Mar 2017', 'CLH17', '21 Nov 201121 Feb 2017', '21 Feb 2017', '--', '22 Feb 201722 Feb 2017', '23 Feb 201723 Feb 2017', '01 Mar 201731 Mar 2017']
['Apr 2017', 'CLJ17', '21 Nov 201121 Mar 2017', '21 Mar 2017', '--', '22 Mar 201722 Mar 2017', '23 Mar 201723 Mar 2017', '01 Apr 201730 Apr 2017']
['May 2017', 'CLK17', '21 Nov 201120 Apr 2017', '20 Apr 2017', '--', '21 Apr 201721 Apr 2017', '24 Apr 201724 Apr 2017', '01 May 201731 May 2017']
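To finish the original goal, the rows printed above can be collected into a list and handed to the DataFrame constructor in one go (shown here with a hypothetical subset of that output):

```python
import pandas as pd

# Hypothetical subset of the headers and rows printed above
headers = ['Contract Month', 'Product Code', 'Settlement']
rows = [['Feb 2017', 'CLG17', '20 Jan 2017'],
        ['Mar 2017', 'CLH17', '21 Feb 2017']]

# Accumulate the rows first, then construct the DataFrame once
df = pd.DataFrame(rows, columns=headers)
print(df)
```

Appending to a list and building the frame once is cheaper than growing a DataFrame row by row.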