Web scraping: scraping content with BeautifulSoup


Is there a way to get the number (13) out of this page?

I tried the following code:

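from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = 'https://mgm.gov.tr/?il=Ankara'
# Send a browser-like User-Agent with the request
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_page = urlopen(req).read()
soup = BeautifulSoup(web_page, 'html.parser')
mydivs = soup.find_all("div", {"class": "tahminMax"})[0]
mydivs

and got the following output: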

<div class="tahminMax"><span class="deger" ng-bind="gunlukTahmin[0].enYuksekGun1 | kaliteKontrol"></span><span class="derece">°C</span></div>
°C

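The values on this site are filled in by JavaScript that runs after the page has loaded, so they are not present in the raw HTML. You can use selenium to reach your goal, as shown below: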

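from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.Firefox()

url = 'https://mgm.gov.tr/?il=Ankara'
browser.get(url)  # get() returns None, so there is nothing to assign

# Parse the DOM as rendered by the browser rather than the raw response
source = browser.page_source
soup = BeautifulSoup(source, 'html.parser')

for tag in soup.find_all("div", attrs={"class": "tahminMax"}):
    # In the rendered DOM the value spans carry the class "deger ng-binding"
    for span in tag.find_all('span', attrs={'class': 'deger ng-binding'}):
        print(span.text)
browser.quit()  # end the WebDriver session and close the browser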

BeautifulSoup on its own also runs without errors, but the value 13 is never loaded:

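from bs4 import BeautifulSoup
import requests

# requests only downloads the static HTML; no JavaScript runs,
# so waiting before parsing would not fill in the values
r = requests.get('https://mgm.gov.tr/?il=Ankara')
soup = BeautifulSoup(r.text, 'html.parser')

for tag in soup.find_all("div", attrs={"class": "tahminMax"}):
    for span in tag.find_all('span', attrs={'class': 'deger', 'ng-bind': True}):
        print(span.text)  # the span is still an empty placeholder here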
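
To see why no number comes back, the markup from the question can be inspected offline. The following is only a sketch: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages, and the helper class SpanTextCollector is a hypothetical name introduced for illustration. It shows that the "deger" span holds an ng-bind expression but no text.

```python
from html.parser import HTMLParser

# Markup copied from the output in the question: the value span is an
# empty placeholder that the page's JavaScript is meant to fill in later.
STATIC_HTML = (
    '<div class="tahminMax">'
    '<span class="deger" ng-bind="gunlukTahmin[0].enYuksekGun1 | kaliteKontrol"></span>'
    '<span class="derece">°C</span></div>'
)


class SpanTextCollector(HTMLParser):
    """Hypothetical helper: records class, ng-bind expression and text of each span."""

    def __init__(self):
        super().__init__()
        self.spans = []        # one dict per <span> encountered
        self._current = None   # the span currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            attr_map = dict(attrs)
            self._current = {
                "class": attr_map.get("class"),
                "ng_bind": attr_map.get("ng-bind"),
                "text": "",
            }
            self.spans.append(self._current)

    def handle_data(self, data):
        # Only collect text that appears inside an open span
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


collector = SpanTextCollector()
collector.feed(STATIC_HTML)
for span in collector.spans:
    print(span)
```

The first span reports an empty text together with its ng-bind expression, while only the unit span contains literal text, which matches the lone °C seen in the question.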

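The values are retrieved dynamically from a separate XHR call, which you can find in the network tab of your browser's developer tools. You can extract them as follows: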

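import requests

# Send the Origin header a browser would send with this XHR call
headers = {'Origin': 'https://mgm.gov.tr'}
r = requests.get('https://servis.mgm.gov.tr/web/tahminler/saatlik?istno=17130', headers=headers).json()
# Map each 'tarih' (date) to its 'maksimumRuzgarHizi' (maximum wind speed)
d = {i['tarih']: i['maksimumRuzgarHizi'] for i in r[0]['tahmin']}
print(d)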
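
The dictionary comprehension above can be tried without network access. This is a minimal sketch: the function extract_series is a hypothetical helper, and SAMPLE is invented placeholder data that only mirrors the shape the code above relies on (a list whose first element has a 'tahmin' list of entries keyed by 'tarih'). The real service may return other fields or values.

```python
def extract_series(payload, field):
    """Map each forecast entry's 'tarih' to the requested field.

    Hypothetical helper; assumes the payload shape used by the code above.
    """
    if not payload:
        return {}
    forecasts = payload[0].get("tahmin", [])
    return {entry["tarih"]: entry.get(field) for entry in forecasts}


# Invented placeholder data, NOT a real response from the service
SAMPLE = [
    {
        "tahmin": [
            {"tarih": "2020-01-01T12:00:00.000Z", "maksimumRuzgarHizi": 10},
            {"tarih": "2020-01-01T15:00:00.000Z", "maksimumRuzgarHizi": 14},
        ]
    }
]

print(extract_series(SAMPLE, "maksimumRuzgarHizi"))
```

Passing the field name as a parameter makes it easy to pull a different key from the same entries once the real response has been inspected in the network tab.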