Python: trying to scrape profile URLs from a Yelp search results page
I'm trying to scrape profile URLs from a Yelp search results page using BeautifulSoup. This is the code I currently have:
import requests
from bs4 import BeautifulSoup

url = "https://www.yelp.com/search?find_desc=tree+-+removal+-+&find_loc=Baltimore+MD&start=40"
response = requests.get(url)
data = response.text
soup = BeautifulSoup(data, 'lxml')
for a in soup.find_all('a', href=True):
    with open(r'C:\Users\my.name\Desktop\Yelp-URLs.csv', "a") as f:
        print(a, file=f)
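This gives me every href link on the page, not just the profile URLs. Also, I get the whole tag, class string included (a class lemon…), when I only need the business profile URLs.
Please help.

You can use select to restrict the match to hrefs that start with /biz/: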
for a in soup.select('a[href^="/biz/"]'):
    with open(r'/Users/my.name/Desktop/Yelp-URLs.csv', "a") as f:
        print(a.attrs['href'], file=f)
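The selector above returns relative paths, and the same business may be linked more than once on a page. As a rough sketch of one way to tidy that up, the snippet below turns each match into an absolute URL and keeps only the first occurrence. The function name extract_profile_urls, the BASE_URL constant, and the SAMPLE_HTML string are made up for illustration; the sample stands in for response.text, and real Yelp markup will differ. Dropping the query string is also an assumption about what counts as "the same profile".

```python
from urllib.parse import urljoin, urlsplit

from bs4 import BeautifulSoup

BASE_URL = "https://www.yelp.com"

# Hypothetical stand-in for response.text; real Yelp markup will differ.
SAMPLE_HTML = """
<a class="lemon" href="/biz/acme-tree-service-baltimore?osq=tree">Acme</a>
<a class="lemon" href="/biz/acme-tree-service-baltimore?osq=tree">Photo</a>
<a href="/biz/oak-and-elm-baltimore">Oak and Elm</a>
<a href="/search?find_desc=tree&amp;start=50">Next</a>
"""


def extract_profile_urls(html, base_url=BASE_URL):
    """Return unique absolute /biz/ URLs in page order, query strings dropped."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    urls = []
    for a in soup.select('a[href^="/biz/"]'):
        # Assumption: the query string is tracking data, so only the path is kept.
        path = urlsplit(a["href"]).path
        full = urljoin(base_url, path)
        if full not in seen:
            seen.add(full)
            urls.append(full)
    return urls


profile_urls = extract_profile_urls(SAMPLE_HTML)
print(profile_urls)
```

To use this with the live page, pass response.text instead of SAMPLE_HTML and write each entry of profile_urls to the CSV file.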
Would it be better to specify which results to scrape?