Python: how do I find li elements inside a ul tag with a given class using BeautifulSoup?
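For the literal question asked (li elements inside a ul with a given class), a minimal BeautifulSoup sketch follows; the class name store-list and the HTML snippet are made-up placeholders, not taken from the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <ul> with a class containing store links
html = """
<ul class="store-list">
  <li><a href="/stores/aesop">Aesop</a></li>
  <li><a href="/stores/aldo">Aldo</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <li> nested inside a <ul> whose class is "store-list"
items = soup.select("ul.store-list li")
names = [li.get_text(strip=True) for li in items]
print(names)  # ['Aesop', 'Aldo']
```

Note that this only works on markup actually present in the HTML response. On the page in question the tenant tiles are loaded dynamically, which is why the answer below goes to the JSON API instead of parsing the page.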
The page number sits inside the url itself. Once you know how many pages you need to go through, just iterate over them:
import requests
import math
page = 1
api_url = 'https://www.capitaland.com/apis/sg/en/properties/rafflescitysingaporeshoppingcentre/cl%3Aentity/tenants/cl%3Arelated-page/%2Fcontent%2Fcapitaland%2Fsg%2Fmalls%2Frafflescity%2Fen%2Fstores/cl%3Asortby/jcr%3Atitle/asc/cl%3Aselectors/_rel_brandtenants_details/_rel_deals/_rel_properties_details/_rel_tenants_details/accepts/acceptsCapita3Eats/acceptsCapitacard/acceptsCapitavoucher/acceptschope/acceptseCapitavoucher/addressroadname/assettype/brand/capita3EatsLink/chopelink/city/country/countryCode/cq%3Atags/currency/dealExisted/enddate/endtime/entityType/entityname/firstPublished/jcr%3Atitle/listingTypePages/logoImgPath/malllocationnote/marketingcategory/nearesttrainstation/oldprice/pagePath/pageTitle/price/promotiontype/ribbon/ribboncolor/shortdescription/startdate/starttime/state/subtitle/thumbnail/tileColorScheme/tilesubtext/cl%3Apgcursor/{page}/16.json'.format(page=page)
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
jsonData = requests.get(api_url, headers=headers, verify = False).json()
total_pages = math.ceil(jsonData['totalcount'] / 16)
links = []
for page in range(1, total_pages + 1):
    print('Page: %s of %s' % (page, total_pages))
    # Page 1 was already fetched above; only re-request for later pages
    if page > 1:
        api_url = 'https://www.capitaland.com/apis/sg/en/properties/rafflescitysingaporeshoppingcentre/cl%3Aentity/tenants/cl%3Arelated-page/%2Fcontent%2Fcapitaland%2Fsg%2Fmalls%2Frafflescity%2Fen%2Fstores/cl%3Asortby/jcr%3Atitle/asc/cl%3Aselectors/_rel_brandtenants_details/_rel_deals/_rel_properties_details/_rel_tenants_details/accepts/acceptsCapita3Eats/acceptsCapitacard/acceptsCapitavoucher/acceptschope/acceptseCapitavoucher/addressroadname/assettype/brand/capita3EatsLink/chopelink/city/country/countryCode/cq%3Atags/currency/dealExisted/enddate/endtime/entityType/entityname/firstPublished/jcr%3Atitle/listingTypePages/logoImgPath/malllocationnote/marketingcategory/nearesttrainstation/oldprice/pagePath/pageTitle/price/promotiontype/ribbon/ribboncolor/shortdescription/startdate/starttime/state/subtitle/thumbnail/tileColorScheme/tilesubtext/cl%3Apgcursor/{page}/16.json'.format(page=page)
        jsonData = requests.get(api_url, headers=headers, verify=False).json()
    # Collect the tenant page path from each property on this page
    properties = jsonData['properties']
    for each in properties:
        pagePath = each['pagePath']
        links.append(pagePath)
print(links)
Output:
240 links
['https://www.capitaland.com/sg/malls/rafflescity/en/stores/chewy-junior', 'https://www.capitaland.com/sg/malls/rafflescity/en/stores/a-one-signature', 'https://www.capitaland.com/sg/malls/rafflescity/en/stores/aesop', 'https://www.capitaland.com/sg/malls/rafflescity/en/stores/aldo',... 'https://www.capitaland.com/sg/malls/rafflescity/en/stores/xing-ji-big-prawn-noodle-opening-soon', 'https://www.capitaland.com/sg/malls/rafflescity/en/stores/xw-western-grill', 'https://www.capitaland.com/sg/malls/rafflescity/en/stores/ya-kun-kaya-toast', 'https://www.capitaland.com/sg/malls/rafflescity/en/stores/ysl-beauty', 'https://www.capitaland.com/sg/malls/rafflescity/en/stores/_house_yamamoto']
It comes from https://www.capitaland.com/apis/........r/1/16.json
You can find the full url of the API call in your browser's Network tab. The original starting link was: https://www.capitaland.com/sg/malls/rafflescity/en/stores.html?category=foodandbeverage
I can't find the URLs of all the merchants on the page. Is there a way to pull every merchant link at once? Since the page only loads 16 merchants at a time, does that mean I have to repeat the request many times to get all the merchant links?

Possibly. Check the limit parameter in the API call. Alternatively, if the page asks you to load more or has pagination, do that and watch the Network tab for another API call.

By "page" do you mean the API call?
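The pagination arithmetic the answer relies on can be checked in isolation: with 16 tenants per API page (the /16.json suffix in the url) and the totalcount value of 240 reported by the first response, the number of pages is the ceiling of the division:

```python
import math

page_size = 16     # items per API page, from the .../16.json suffix
total_count = 240  # 'totalcount' reported by the first API response

total_pages = math.ceil(total_count / page_size)
print(total_pages)  # 15
```

This is why the loop in the answer runs over range(1, total_pages + 1): every page, including a possible short final one, is requested exactly once.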