Python/bs4: trying to print the temperature/city from a local website


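I am trying to fetch and print the current temperature and the city name from a local weather website, but without success. I need it to read and print the city (Londrina), the temperature (23.1°C) and, if possible, the title attribute of the image inside ca-cond-firs ("Temperatura em declínio"); that last value changes as the temperature rises or falls.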

This is the part of the site's HTML that matters:

#<div class="ca-cidade"><a href="/site/internas/conteudo/meteorologia/grafico.shtml?id=23185109">Londrina</a></div>
<ul class="ca-condicoes">
<li class="ca-cond-firs"><img src="/site/imagens/icones_condicoes/temperatura/temp_baixa.png" title="Temperatura em declínio"/><br/>23.1°C</li>
<li class="ca-cond"><img src="/site/imagens/icones_condicoes/vento/L.png"/><br/>10 km/h</li>
<li class="ca-cond"><div class="ur">UR</div><br/>54%</li>
<li class="ca-cond"><img src="/site/imagens/icones_condicoes/chuva.png"/><br/>0.0 mm</li>

Any help?

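I'm not sure what problem you ran into with your code. When I tried it, I found that the site only parsed successfully with the 'html.parser' parser. I also used soup.find_all (findAll is its older alias) to find the elements matching the desired class. Hopefully the following points you towards an answer:

from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'

rawhtml = requests.get(URL).text
soup = BeautifulSoup(rawhtml, 'html.parser')

# attrs is a dict mapping attribute name to value: every <li class="ca-cond-firs">
rows = soup.find_all('li', {'class': 'ca-cond-firs'})
print(rows)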
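
To show how that lookup yields the three values asked for, here is a minimal sketch that runs against the HTML fragment from the question instead of the live site, whose markup may have changed since. The `SAMPLE_HTML` string (with a closing `</ul>` added) and the variable names are illustrative, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Fragment copied from the question; the live page may look different.
SAMPLE_HTML = """
<div class="ca-cidade"><a href="/site/internas/conteudo/meteorologia/grafico.shtml?id=23185109">Londrina</a></div>
<ul class="ca-condicoes">
<li class="ca-cond-firs"><img src="/site/imagens/icones_condicoes/temperatura/temp_baixa.png" title="Temperatura em declínio"/><br/>23.1°C</li>
<li class="ca-cond"><img src="/site/imagens/icones_condicoes/vento/L.png"/><br/>10 km/h</li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# City name: the text of the div with class ca-cidade
city = soup.find("div", {"class": "ca-cidade"}).get_text(strip=True)

# Temperature: the text of the first li with class ca-cond-firs
temp_item = soup.find("li", {"class": "ca-cond-firs"})
temperature = temp_item.get_text(strip=True)

# Trend: the title attribute of the image inside that li
trend = temp_item.find("img")["title"]

print(city, temperature, trend)
```

Against the live page, `soup.find` only returns the first match, so it would report the first city listed rather than Londrina specifically.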

The code below gets the temperature details shown on the right side of the page, as you need:

result_table = soup.find('div', {'class': 'ca-content-wrapper'})
# No other div on the page has the class ca-content-wrapper, so find() can be
# used directly without iterating. Use an if condition to control which
# city's temperature is printed and which is not.
print(result_table.text)
# The output looks like:
#   Apucarana
#
#   21.5°C
#   4 km/h
#   UR60%
#   0.0 mm
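
The "if condition" mentioned above could look roughly like the sketch below. It is only an illustration: `SAMPLE_HTML` is a simplified stand-in for the page, the assumption that each city sits in its own `div.ca-bg` inside the wrapper is not confirmed by the question, and `city_fields` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page; one div.ca-bg per city is an assumption.
SAMPLE_HTML = """
<div class="ca-content-wrapper">
  <div class="ca-bg">
    <div class="ca-cidade"><a href="#">Apucarana</a></div>
    <ul class="ca-condicoes">
      <li class="ca-cond-firs"><img src="temp.png"/><br/>21.5°C</li>
      <li class="ca-cond"><img src="vento.png"/><br/>4 km/h</li>
      <li class="ca-cond"><div class="ur">UR</div><br/>60%</li>
      <li class="ca-cond"><img src="chuva.png"/><br/>0.0 mm</li>
    </ul>
  </div>
  <div class="ca-bg">
    <div class="ca-cidade"><a href="#">Londrina</a></div>
    <ul class="ca-condicoes">
      <li class="ca-cond-firs"><img src="temp.png"/><br/>23.1°C</li>
      <li class="ca-cond"><img src="vento.png"/><br/>10 km/h</li>
      <li class="ca-cond"><div class="ur">UR</div><br/>54%</li>
      <li class="ca-cond"><img src="chuva.png"/><br/>0.0 mm</li>
    </ul>
  </div>
</div>
"""


def city_fields(html, city_name):
    """Return the text fields of the block whose first field is city_name, or None."""
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", {"class": "ca-content-wrapper"})
    for block in wrapper.find_all("div", {"class": "ca-bg"}):
        # stripped_strings yields each piece of text without surrounding whitespace
        fields = list(block.stripped_strings)
        if fields and fields[0] == city_name:
            return fields
    return None


londrina = city_fields(SAMPLE_HTML, "Londrina")
print(londrina)
```

Using `stripped_strings` instead of `.text` avoids the blank lines seen in the output above and keeps each value as a separate list item.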

Here you go. You can customize the wind description based on the icon name.

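#!/usr/bin/env python3
# Written for Python 3. Under Python 2 you may need reload(sys) and
# sys.setdefaultencoding('utf-8') before printing non-ASCII text; Python 3
# has no builtin reload and does not need either call.
import os

from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'


def what_wind(img_src):
    """Map a wind icon path such as '.../vento/NE.png' to a description."""
    # Compare the icon's file name exactly. str.find() is not a truth test:
    # it returns -1 (truthy) when the text is absent.
    icon = os.path.splitext(os.path.basename(img_src))[0]
    directions = {
        "NE": "From North East",
        "O": "From West",
        "N": "From North",
        # you can add other icons here
    }
    # Fall back to the raw icon name for icons not listed above
    return directions.get(icon, icon)


def get_weather_data():
    rawhtml = requests.get(URL).text
    soup = BeautifulSoup(rawhtml, 'html.parser')

    cities = soup.find('div', {"class": "ca-content-wrapper"})

    weather_data = []

    for city in cities.find_all("div", {"class": "ca-bg"}):
        name = city.find("div", {"class": "ca-cidade"}).text
        temp = city.find("li", {"class": "ca-cond-firs"}).text

        # li.ca-cond items in page order: wind, humidity, rain
        conditions = city.find_all("li", {"class": "ca-cond"})

        weather_data.append({
            "city": name,
            "temp": temp,
            "conditions": [{
                "wind": conditions[0].text + " " + what_wind(conditions[0].find("img")["src"]),
                "humidity": conditions[1].text,
                "rain": conditions[2].text,
            }]
        })

    return weather_data


print(get_weather_data())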

That is all the weather data the site provides.
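
A note on matching the icon name in `what_wind`: `str.find` returns an index, or -1 when the text is missing, so its result should not be used directly as a truth test. The sketch below shows the difference using the wind icon path from the question; `wind_code` is a hypothetical helper name introduced only for this illustration.

```python
import os

# Wind icon path taken from the HTML in the question
icon_src = "/site/imagens/icones_condicoes/vento/L.png"

# "NE" does not occur in the path, so find() returns -1 ...
position = icon_src.find("NE")
print(position)

# ... and -1 is truthy, so `if icon_src.find("NE"):` would still run its body.
print(bool(position))


def wind_code(src):
    """Return the icon file name without its extension, e.g. 'L' or 'NE'."""
    return os.path.splitext(os.path.basename(src))[0]


# Comparing the extracted code exactly avoids both the -1 problem and
# accidental matches elsewhere in the path.
print(wind_code(icon_src) == "NE")
print(wind_code(icon_src))
```

The `in` operator (`"NE" in icon_src`) is the usual way to ask whether a substring is present at all; the exact comparison above is stricter because "N" is also a substring of "NE".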


You should try the CSS3 selectors in BS4; I personally find them much easier to use than find and find_all.

from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'

rawhtml = requests.get(URL).text
soup = BeautifulSoup(rawhtml, 'lxml')

# soup.select returns a list of all the elements that match the CSS selector

# text of each <a> tag directly inside div.ca-cidade
cities = [cityTag.text for cityTag in soup.select("div.ca-cidade > a")]

# temperature text inside each li.ca-cond-firs
temps = [tempTag.text for tempTag in soup.select("li.ca-cond-firs")]

# temperature status: the title attribute of each li.ca-cond-firs > img
tempStatus = [tag["title"] for tag in soup.select("li.ca-cond-firs > img")]

# Pairing by position relies on len(cities) == len(temps) == len(tempStatus),
# which is normally true.

for city, temp, status in zip(cities, temps, tempStatus):
    print("City: {}, Temperature: {}, Status: {}.".format(city, temp, status))
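
The three lists above only line up if every city has all three pieces. A possible variation, sketched here under assumptions, selects inside each city block so the values stay together even when one is missing. `SAMPLE_HTML` is a simplified stand-in for the page, and the per-city `div.ca-bg` container is assumed rather than taken from the question.

```python
from bs4 import BeautifulSoup

# Simplified markup; assumes each city sits in its own div.ca-bg block.
# The second city deliberately has no title attribute on its image.
SAMPLE_HTML = """
<div class="ca-bg">
  <div class="ca-cidade"><a href="#">Londrina</a></div>
  <ul class="ca-condicoes">
    <li class="ca-cond-firs"><img src="temp_baixa.png" title="Temperatura em declínio"/><br/>23.1°C</li>
  </ul>
</div>
<div class="ca-bg">
  <div class="ca-cidade"><a href="#">Apucarana</a></div>
  <ul class="ca-condicoes">
    <li class="ca-cond-firs"><img src="temp.png"/><br/>21.5°C</li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

readings = []
for block in soup.select("div.ca-bg"):
    # select_one returns the first match inside this block, or None
    city = block.select_one("div.ca-cidade > a").get_text(strip=True)
    temp = block.select_one("li.ca-cond-firs").get_text(strip=True)
    img = block.select_one("li.ca-cond-firs > img")
    # .get returns None instead of raising KeyError when title is absent
    status = img.get("title") if img is not None else None
    readings.append((city, temp, status))

for city, temp, status in readings:
    print("City: {}, Temperature: {}, Status: {}.".format(city, temp, status))
```

With the list-based version, a single image without a title would raise a `KeyError`; here that city simply gets `None` as its status.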


Comments:

Oh, the encoding of the city names is wrong. You may want to replace it with bytes(city_name, encoding='latin1').decode('utf-8') when printing.

Your code works, but it gets the forecast minimum/maximum temperature for the day. I need the current value, which has a 15-minute delay. If you look at www.simepar.br, it is the right-hand part of the site. The city is Londrina.

Nice... but I got an error on reload(sys): NameError: name 'reload' is not defined

What Python version are you using? Try this: [link]. I put it there so that you don't get errors on non-UTF8 characters. I'm using Python 2.7. Here is an answer about reload: [link]

Awesome! Works like a charm! If I want to isolate one city, can you tell me how to print only that city?

If you know the city you are looking for exists in the cities list, you can try this: print("City: {}, Temperature: {}, Status: {}.".format(cities[cities.index("Londrina")], temps[cities.index("Londrina")], tempStatus[cities.index("Londrina")]))
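
The encoding fix suggested in the comments can be tried without touching the site. The sketch below simulates the problem (UTF-8 bytes read as Latin-1) and then applies the suggested round trip. The city name is a hypothetical example chosen because it contains a non-ASCII character, and whether the live page is actually mis-decoded this way is an assumption.

```python
# Hypothetical city name used only to show the effect.
original = "Foz do Iguaçu"

# Simulate the bug: UTF-8 bytes interpreted as if they were Latin-1.
garbled = original.encode("utf-8").decode("latin1")
print(garbled)

# The fix from the comment: turn the text back into the same bytes,
# then decode those bytes as UTF-8.
repaired = bytes(garbled, encoding="latin1").decode("utf-8")
print(repaired)
```

With requests, another option is to set `response.encoding = 'utf-8'` on the response object before reading `response.text`, so the text is decoded correctly in the first place.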