Python 3.x: Why is the output empty after scraping a website?
Can a website prevent a Python script from scraping values from it (via BeautifulSoup)? I use this script:
import gspread
import requests
from bs4 import BeautifulSoup
URL = 'https://www.sreality.cz/hledani/prodej/byty/praha?velikost=1%2Bkk'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0'}
response = requests.get(URL, headers=headers)
# Scraping the eurobydleni.cz website
results = soup.find_all('div', attrs={'class':'text-wrap'})
for job in results:
    nemovitost = job.find('span', attrs={'class':'name ng-binding'})
    nemovitost_final = nemovitost.text.strip()
    print(nemovitost_final)
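But the output is empty. The script starts and then ends almost immediately.
I need to print the content of elements such as "Prodej bytu 1+kk 33 m²".
So the output should be 'Prodej bytu 1+kk', 'Prodej bytu 1+kk', and so on.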
Edit: using the answer from @Andrej Kesely:
I tried your code (inside my own code, which inserts the values into a Google Sheet), but I get an error.
import gspread
import requests
import datetime
import json
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials
from pprint import pprint
from datetime import timedelta
import time
datetime.datetime.now()
scope = [
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive'
]
api_url = 'https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_sub_cb=2&category_type_cb=1&locality_region_id=10&per_page=20'
data = requests.get(api_url).json()
# communication with Excel
data = ServiceAccountCredentials.from_json_keyfile_name("data.json", scope)
client = gspread.authorize(data)
sheet = client.open("skript").worksheet('sreality.cz')
data = sheet.get_all_records()
# write to the LOG sheet
sheet2 = client.open("skript").worksheet('LOG')
data = sheet2.get_all_records()
insertRow = ["sreality.cz", "START: " + str(datetime.datetime.now().strftime('%d-%m-%Y ve %H:%M:%S'))]
sheet2.insert_row(insertRow,2)
for estate in data["_embedded"]["estates"]:
    insertRow = ["{:<30} {:<30} {} {}".format(estate["name"], estate["price"], estate["locality"])]
    sheet.insert_row(insertRow,2)
insertRow = ["sreality.cz", "KONEC: " + str(datetime.datetime.now().strftime('%d-%m-%Y ve %H:%M:%S'))]
sheet2.insert_row(insertRow,2)
time.sleep(60)
I need:
Flat:                     Price:     Address:
Prodej bytu 1+kk 23 m²    2827000    Římská, Praha 2 - Vinohrady
Prodej bytu 1+kk 27 m²    4049000    Ječná, Praha 2 - Nové Město
Prodej bytu 1+kk 33 m²    6005000    Záhřebská, Praha 2 - Vinohrady
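The data is loaded via Ajax from an external URL, so it is not present in the HTML that requests downloads. You can use the following example to load the data: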
import json
import requests
api_url = "https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_sub_cb=2&category_type_cb=1&locality_region_id=10&per_page=20"
data = requests.get(api_url).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for estate in data["_embedded"]["estates"]:
    print("{:<30} {}".format(estate["name"], estate["price"]))
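To get from the two-column print above to the three-column layout asked for in the question, the same loop can also read the "locality" key. The following is only a sketch under assumptions: `sample` is a hand-written stand-in for the JSON returned by the API (built from the rows in the question, with prices assumed to be integers), and `estates_to_rows` and `format_table` are hypothetical helper names, not part of any library.

```python
# Hand-written stand-in for the API response (assumption: only the keys
# "name", "price" and "locality" inside _embedded.estates are relied on).
sample = {
    "_embedded": {
        "estates": [
            {"name": "Prodej bytu 1+kk 23 m²", "price": 2827000,
             "locality": "Římská, Praha 2 - Vinohrady"},
            {"name": "Prodej bytu 1+kk 27 m²", "price": 4049000,
             "locality": "Ječná, Praha 2 - Nové Město"},
            {"name": "Prodej bytu 1+kk 33 m²", "price": 6005000,
             "locality": "Záhřebská, Praha 2 - Vinohrady"},
        ]
    }
}


def estates_to_rows(payload):
    """Return one (name, price, locality) tuple per estate in the payload."""
    return [
        (estate["name"], estate["price"], estate["locality"])
        for estate in payload["_embedded"]["estates"]
    ]


def format_table(rows):
    """Format the rows as left-aligned text columns with a header line."""
    template = "{:<30} {:<10} {}"
    lines = [template.format("Flat:", "Price:", "Address:")]
    for name, price, locality in rows:
        lines.append(template.format(name, price, locality))
    return "\n".join(lines)


print(format_table(estates_to_rows(sample)))
```

With live data, the argument would be the result of `requests.get(api_url).json()` from the example above instead of `sample`.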
In Edit2 I asked for help with splitting the values into columns, so here is my final solution:
insertRow = ['sreality.cz', "{:<30}".format(estate["name"]), "{:<30}".format(estate["locality"]), "{:<30}".format(estate["price"]), str(pocet_bytu)]
sheet.insert_row(insertRow,2)
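The key point of the solution above is that each element of the list becomes its own cell, so one list per estate gives one row with several columns. A minimal sketch of that idea follows. It assumes `pocet_bytu` is a running count of flats (the snippet above does not show where it comes from), it drops the "{:<30}" padding on the assumption that a spreadsheet cell does not need it, and `build_row` and `estates` are hypothetical names with hand-written sample data.

```python
def build_row(source, estate, count):
    """Return a list with one element per spreadsheet column."""
    return [
        source,
        estate["name"],
        estate["locality"],
        str(estate["price"]),
        str(count),
    ]


# Hand-written sample entries shaped like the API data used in this thread.
estates = [
    {"name": "Prodej bytu 1+kk 23 m²", "price": 2827000,
     "locality": "Římská, Praha 2 - Vinohrady"},
    {"name": "Prodej bytu 1+kk 27 m²", "price": 4049000,
     "locality": "Ječná, Praha 2 - Nové Město"},
]

# enumerate() stands in for the running counter (assumed meaning of pocet_bytu).
rows = [
    build_row("sreality.cz", estate, count)
    for count, estate in enumerate(estates, start=1)
]

for row in rows:
    print(row)
```

Each `row` could then be passed to `sheet.insert_row(row, 2)` as in the snippet above. Because every call inserts at row 2, later estates end up above earlier ones in the sheet.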
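Comments:
Thanks for helping me with this code. But I need it to insert the values cell by cell (row by row) into a Google Sheet. I updated my question; I am getting an error. Do you know how to solve it?
@Triliang123 You use the same name, data, for both the Google Sheet and the JSON data. Try changing it to data_json = requests.get(api_url).json(), and then you can do for estate in data_json["_embedded"]["estates"]:
After some modifications it works now. I did not see that, so thank you. But I have one more small problem: how do I split these rows into columns? See Edit2 in my question.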
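The name clash described in the comments can be reproduced without a network connection or a Google account. This is an illustrative sketch only: `payload` and `sheet_records` are made-up stand-ins for the API response and for the list of dictionaries that `get_all_records()` returns.

```python
# Made-up stand-ins for the two kinds of data that shared the name "data".
payload = {"_embedded": {"estates": [{"name": "Prodej bytu 1+kk 33 m²"}]}}
sheet_records = [{"column": "value"}]

# Problem: the same name is bound to the JSON first and to the sheet rows later.
data = payload
data = sheet_records  # the JSON is no longer reachable through "data"

error = None
try:
    data["_embedded"]["estates"]
except TypeError as exc:
    # A list cannot be indexed with a string key.
    error = exc
print("Reusing the name fails with:", error)

# Fix: give the JSON its own name, as suggested in the comment.
data_json = payload
data = sheet_records

names = [estate["name"] for estate in data_json["_embedded"]["estates"]]
print(names)
```

Distinct names such as `credentials`, `sheet_rows` and `data_json` would also make it clearer that the script in the question rebinds `data` several times.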
The example above, which loads the data from the API, prints:
Prodej bytu 1+kk 33 m² 4809347
Prodej bytu 1+kk 32 m² 5493000
Prodej bytu 1+kk 44 m² 6167000
Prodej bytu 1+kk 23 m² 2896000
Prodej bytu 1+kk 26 m² 3320000
Prodej bytu 1+kk 20 m² 2715000
Prodej bytu 1+kk 36 m² 3600000
Prodej bytu 1+kk 44 m² 4770000
Prodej bytu 1+kk 18 m² 3850000
Prodej bytu 1+kk 33 m² 5226000
Prodej bytu 1+kk 15 m² 2950000
Prodej bytu 1+kk 15 m² 2950000
Prodej bytu 1+kk 15 m² 2950000
Prodej bytu 1+kk 36 m² 5248000
Prodej bytu 1+kk 22 m² 3990000
Prodej bytu 1+kk 80 m² 6300000
Prodej bytu 1+kk 46 m² 6394000
Prodej bytu 1+kk 33 m² 3469000
Prodej bytu 1+kk 39 m² 5099000
Prodej bytu 1+kk 32 m² 4250000
Prodej bytu 1+kk 30 m² 4759000