Web scraping: scraping content with BeautifulSoup
Is there a way to get the number (13) in the end? I tried the following code:
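from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = 'https://mgm.gov.tr/?il=Ankara'
# Send a browser-like User-Agent with the request
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_page = urlopen(req).read()
soup = BeautifulSoup(web_page, 'html.parser')
# First <div class="tahminMax"> in the static HTML
mydivs = soup.find_all("div", {"class": "tahminMax"})[0]
print(mydivs)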
and received the following output:
<div class="tahminMax"><span class="deger" ng-bind="gunlukTahmin[0].enYuksekGun1 | kaliteKontrol"></span><span class="derece">°C</span></div>
°C
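The markup above already hints at the cause, and this can be checked offline. A minimal sketch, assuming only the snippet printed above: the `deger` span carries an `ng-bind` expression but no text, so the only text inside the div is the unit.

```python
from bs4 import BeautifulSoup

# The exact markup printed above, as served before any JavaScript runs
html = ('<div class="tahminMax">'
        '<span class="deger" ng-bind="gunlukTahmin[0].enYuksekGun1 | kaliteKontrol"></span>'
        '<span class="derece">°C</span></div>')

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='tahminMax')
value_span = div.find('span', class_='deger')

value_text = value_span.get_text()    # empty: nothing has filled the span yet
binding = value_span.get('ng-bind')   # expression that a script evaluates later in the browser
div_text = div.get_text()             # only the unit is present

print(repr(value_text))
print(binding)
print(div_text)
```

So the parser is doing its job; the number is simply not part of the HTML that was downloaded.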
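The values are filled in by JavaScript that runs after the page has loaded, so they are not in the static HTML. You can use selenium to achieve your goal, as below: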
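from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
url = 'https://mgm.gov.tr/?il=Ankara'
browser.get(url)  # get() returns None, so there is nothing to assign
source = browser.page_source  # HTML after the browser has run the page's JS
soup = BeautifulSoup(source, 'html.parser')
for tag in soup.find_all("div", attrs={"class": "tahminMax"}):
    # Matches spans whose class attribute is exactly "deger ng-binding"
    for span in tag.find_all('span', attrs={'class': 'deger ng-binding'}):
        print(span.text)
browser.quit()  # close the window and end the driver session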
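One caveat with the selenium approach above: `page_source` may be read before the script has filled in the values. The following is a hedged sketch that waits first. The helper names `fetch_rendered_source` and `extract_values`, the 10-second timeout, and the sample rendered markup are assumptions for illustration, not taken from the site.

```python
from bs4 import BeautifulSoup


def fetch_rendered_source(url, timeout=10):
    """Open url in Firefox and return the HTML once a value has appeared."""
    # Imported here so the parsing helper below works without selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # Poll until the first value span has non-empty text
        WebDriverWait(browser, timeout).until(
            lambda d: d.find_element(
                By.CSS_SELECTOR, 'div.tahminMax span.deger').text.strip()
        )
        return browser.page_source
    finally:
        browser.quit()


def extract_values(html):
    """Return the non-empty text of every value span inside div.tahminMax."""
    soup = BeautifulSoup(html, 'html.parser')
    spans = soup.select('div.tahminMax span.deger')
    return [s.get_text(strip=True) for s in spans if s.get_text(strip=True)]


# Hypothetical markup, shaped like the page after the script has run
rendered = ('<div class="tahminMax"><span class="deger ng-binding">13</span>'
            '<span class="derece">°C</span></div>')
values = extract_values(rendered)
print(values)
```

With a browser and driver available, `extract_values(fetch_rendered_source('https://mgm.gov.tr/?il=Ankara'))` would chain the two steps.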
BeautifulSoup on its own also runs, but the value 13 will not appear in the output:
from bs4 import BeautifulSoup
import requests

r = requests.get('https://mgm.gov.tr/?il=Ankara')
# No sleep here: requests never runs the page's JS, so waiting changes nothing
soup = BeautifulSoup(r.text, 'html.parser')
for tag in soup.find_all("div", attrs={"class": "tahminMax"}):
    # The span is found, but its text is still empty in the static HTML
    for span in tag.find_all('span', attrs={'class': 'deger', 'ng-bind': True}):
        print(span.text)
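The values are retrieved dynamically from a separate XHR call, which you can find in the browser's Network tab. You can extract them as follows: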
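import requests

# Send the same Origin header the site's own page would send
headers = {'Origin': 'https://mgm.gov.tr'}
r = requests.get('https://servis.mgm.gov.tr/web/tahminler/saatlik?istno=17130', headers=headers).json()
# 'tarih' is Turkish for date, 'maksimumRuzgarHizi' for maximum wind speed
d = {i['tarih']: i['maksimumRuzgarHizi'] for i in r[0]['tahmin']}
print(d)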
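The dictionary comprehension above can be tried without a network call. This is a sketch with a made-up payload that only mirrors the shape the code relies on (a list whose first item holds a `tahmin` list of records). The timestamps and numbers are invented, and `pick_field` is a hypothetical helper.

```python
import json

# Invented sample data; only the nesting mirrors what the code above indexes into
sample = json.loads('''
[{"tahmin": [
    {"tarih": "2020-01-01T12:00:00.000Z", "maksimumRuzgarHizi": 11},
    {"tarih": "2020-01-01T15:00:00.000Z", "maksimumRuzgarHizi": 14}
]}]
''')


def pick_field(payload, field):
    """Map each record's 'tarih' (date) to the chosen field."""
    return {rec['tarih']: rec[field] for rec in payload[0]['tahmin']}


result = pick_field(sample, 'maksimumRuzgarHizi')
print(result)
```

Passing a different key as `field` would pull another value from the same records, provided the real response contains that key.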