Python: scraping <div> and <span> elements from an HTML page

python html web-scraping

I am trying to create a simple weather forecast in Python, working in Eclipse. So far I have written the following:

from bs4 import BeautifulSoup
import requests


def weather_forecast():
    url = 'https://www.yr.no/nb/v%C3%A6rvarsel/daglig-tabell/1-92416/Norge/Vestland/Bergen/Bergen'
    r = requests.get(url)  # Get request for contents of the page
    print(r.content)  # Outputs HTML code for the page
    soup = BeautifulSoup(r.content, 'html5lib')  # Parse the data with BeautifulSoup(HTML-string, html-parser)
    min_max = soup.select('min-max.temperature')  # Select all spans with a "min-max-temperature" attribute
    print(min_max.prettify())
    table = soup.find('div', attrs={'daily-weather-list-item__temperature'})
    print(table.prettify())

The data comes from an HTML page that contains elements like the following:

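I found the path to the first temperature among the page's HTML elements, but when I run the code and print the result to check that it worked, nothing is printed. My goal is to print a table of dates with their corresponding temperatures. It seems like a simple task, but I don't know how to name the attributes correctly, or how to scrape all of them from the HTML page in a single iteration.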

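You can use a dictionary comprehension. Loop over all the forecasts, which have the class daily-weather-list-item, then extract the date from the datetime attribute of each time tag and use it as the key. Associate each key with the max/min temperature information.
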
import requests
from bs4 import BeautifulSoup


def weather_forecast():
    url = 'https://www.yr.no/nb/v%C3%A6rvarsel/daglig-tabell/1-92416/Norge/Vestland/Bergen/Bergen'
    r = requests.get(url)  # GET request for the contents of the page
    soup = BeautifulSoup(r.content, 'html5lib')
    # Key: the datetime attribute of each forecast's time tag; value: its min/max temperature text
    temps = {i.select_one('time')['datetime']: i.select_one('.min-max-temperature').get_text(strip=True)
             for i in soup.select('.daily-weather-list-item')}
    return temps


print(weather_forecast())
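
To see why the selectors in the question come back empty, it may help to run both versions against a small offline fragment. The sketch below is only an illustration: SAMPLE_HTML is a hand-written, hypothetical fragment (the dates, temperatures and nesting are assumptions, not copied from yr.no), and forecast_table is a made-up helper name. Only the class names and the time/datetime structure come from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, NOT the real yr.no markup. Only the class names and
# the <time datetime="..."> structure are taken from the answer's code.
SAMPLE_HTML = """
<ol>
  <li class="daily-weather-list-item">
    <time datetime="2020-05-01">Friday</time>
    <div class="daily-weather-list-item__temperature">
      <span class="min-max-temperature">10°/4°</span>
    </div>
  </li>
  <li class="daily-weather-list-item">
    <time datetime="2020-05-02">Saturday</time>
    <div class="daily-weather-list-item__temperature">
      <span class="min-max-temperature">12°/6°</span>
    </div>
  </li>
</ol>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 'min-max.temperature' is read as: a <min-max> tag with class "temperature".
# No such tag exists, so nothing is selected.
wrong = soup.select("min-max.temperature")

# A leading dot selects by class name, which is what the question intended.
right = soup.select(".min-max-temperature")

# select() returns a list-like ResultSet, so prettify() has to be called on
# each element, not on the result as a whole.
for tag in right:
    print(tag.prettify())

# For find(), the class_ keyword is the usual way to filter by class.
cell = soup.find("div", class_="daily-weather-list-item__temperature")


def forecast_table(parsed):
    """Map each forecast's date to its min/max temperature text."""
    return {
        item.select_one("time")["datetime"]:
            item.select_one(".min-max-temperature").get_text(strip=True)
        for item in parsed.select(".daily-weather-list-item")
    }


temps = forecast_table(soup)

# Print a simple two-column table of date and temperature.
for date, temp in temps.items():
    print(f"{date}  {temp}")
```

On the live page the same comprehension should work as long as the class names still match; if the site changes its markup, the selectors need to be adjusted accordingly.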

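If the content is added to the page by script tags, bs4 will often fail to find it, because it only sees the HTML the server returned. If that is the case, you will have to use something like Selenium.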
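
One rough way to tell whether a browser-driving tool is needed is to check whether the elements you want are already present in the HTML the server sends. The sketch below assumes a hypothetical helper, count_static_matches, and two made-up HTML strings standing in for real responses. It is a quick heuristic, not a definitive test.

```python
from bs4 import BeautifulSoup


def count_static_matches(html, css_selector):
    """Count elements matching css_selector in the raw, un-rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(css_selector))


# Made-up examples: one page ships the list in its HTML, the other builds it
# in the browser with JavaScript.
static_page = '<ul><li class="daily-weather-list-item">...</li></ul>'
script_page = '<div id="root"></div><script>/* builds the list in the browser */</script>'

print(count_static_matches(static_page, ".daily-weather-list-item"))  # 1
print(count_static_matches(script_page, ".daily-weather-list-item"))  # 0
```

In practice you would pass requests.get(url).text as the html argument. A count of zero for a selector that clearly matches in the browser's inspector suggests the content is rendered by scripts.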