Python: Wrong extraction of HTML using BeautifulSoup

python web-scraping

This is my first attempt at web-scraping code:

import qgrid
import webbrowser
import requests
from bs4 import BeautifulSoup

# Send the request to fetch the HTML page.
page = requests.get('http://www.meteo.gr/cf.cfm?city_id=14')
# Create a BeautifulSoup object from the HTML.
soup = BeautifulSoup(page.content, 'html.parser')

# Pinpoint the (outer) section I want to focus on.
four_days = soup.find(id="prognoseis")

# Select specific elements, using four_days as the base.
periods = [p.get_text() for p in four_days.select(".perhour-rowmargin .innerTableCell-fulltime")]

# Create a DataFrame via pandas to print it table-like.
import pandas as pd
weather = pd.DataFrame({"period ": periods})
print(weather)

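I went through a good tutorial and I am starting to get the hang of it. In the four_days object I hold the part of the HTML contained in "prognoseis", which is where the information I want lives. Then, for the periods object, I select the elements that contain the information I need, with the second part of the selector specifying exactly the text I want to extract.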

The code runs, but it gives me an empty result.

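You are adding dashes between the class names, but no such dashes exist. The elements you are selecting have two classes, perhour and rowmargin, but you are selecting the nonexistent class perhour-rowmargin. The same applies to the td elements; they have the separate classes fulltime and innerTableCell. Just select one or the other; the following returns the desired cells: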

four_days.select(".perhour .fulltime")
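To make the difference concrete, here is a minimal offline sketch. The SAMPLE_HTML markup is hypothetical: it only imitates the class layout described above (including the assumption that the rows are tr elements), and the real page may be structured differently. It compares the dashed selector, the single-class selector, and a compound selector that requires both classes on the same element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the class layout described above;
# the real page's HTML may differ.
SAMPLE_HTML = """
<div id="prognoseis">
  <table>
    <tr class="perhour rowmargin">
      <td class="innerTableCell fulltime">03:00</td>
      <td class="innerTableCell temperature">12</td>
    </tr>
    <tr class="perhour rowmargin">
      <td class="innerTableCell fulltime">06:00</td>
      <td class="innerTableCell temperature">14</td>
    </tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
section = soup.find(id="prognoseis")

# A dash produces one single class name, which no element carries.
dashed = section.select(".perhour-rowmargin .innerTableCell-fulltime")

# Naming one class per element is enough to match.
single = section.select(".perhour .fulltime")

# Chaining classes without a space requires all of them on the same element.
compound = section.select(".perhour.rowmargin .innerTableCell.fulltime")

print(len(dashed), len(single), len(compound))
```

The compound form is useful if other elements on the page happen to share only one of the two classes.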

You may also want to remove the extra newlines around each cell's data; add strip=True to the get_text() call:

[p.get_text(strip=True) for p in four_days.select(".perhour .fulltime")]
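As a rough illustration of what strip=True changes, the sketch below wraps the corrected selector in a small helper and runs it on inline markup. The helper name extract_periods and the sample HTML are assumptions made for this example, not part of the original code or the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with whitespace around the cell text,
# similar to what a real page often contains.
SAMPLE_HTML = """
<div id="prognoseis">
  <table>
    <tr class="perhour rowmargin">
      <td class="innerTableCell fulltime">
        03:00
      </td>
    </tr>
  </table>
</div>
"""


def extract_periods(html):
    """Return the stripped text of every .fulltime cell inside #prognoseis.

    Hypothetical helper: returns an empty list when the section is missing
    instead of raising on None.
    """
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find(id="prognoseis")
    if section is None:
        return []
    return [cell.get_text(strip=True) for cell in section.select(".perhour .fulltime")]


# Compare the raw and the stripped text of the same cell.
cell = BeautifulSoup(SAMPLE_HTML, "html.parser").select_one(".perhour .fulltime")
raw_text = cell.get_text()
clean_text = cell.get_text(strip=True)

print(repr(raw_text))
print(repr(clean_text))
print(extract_periods(SAMPLE_HTML))
```

The resulting list can be passed to pd.DataFrame in the same way as periods in the question.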

The element has two classes, perhour and rowmargin; there is no single class named perhour rowmargin.

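lol, I didn't know that, thanks. @MartijnPieters solved the problem. As Martijn said, this is the answer and I accept it. I just didn't know how comments work. Thanks, you are a lifesaver.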