How to retrieve text with BeautifulSoup in Python

python selenium

I want to use BeautifulSoup to get the text repairsonwheelsrim-hub.com out of the HTML below. Please tell me how to do this. Currently I am using:

webadress = profilePageSource.select("span#offscreen a[href]")[0].get_text()


<div class="biz-website">
<span class="offscreen">Business website</span>
<a target="_blank" href="/biz_redir?url=http%3A%2F%2Frepairsonwheelsrim-hub.com&src_bizid=8tY2YtXPk1rGO7sl43LH8A&cachebuster=1438073532&s=6b75d47d32b28eb8e50506859857b75e949d698cdbc47e9892cc2a3b43e480c2">repairsonwheelsrim-hub.com</a>
</div>


This is what you want:

from bs4 import BeautifulSoup
text='<div class="biz-website"> <span class="offscreen">Business website</span> <a target="_blank" href="/biz_redir?url=http%3A%2F%2Frepairsonwheelsrim-hub.com&src_bizid=8tY2YtXPk1rGO7sl43LH8A&cachebuster=1438073532&s=6b75d47d32b28eb8e50506859857b75e949d698cdbc47e9892cc2a3b43e480c2">repairsonwheelsrim-hub.com</a> </div>'
soup = BeautifulSoup(text, 'html.parser')
# soup.a is the first <a> tag in the document; .text is its visible text
print(soup.a.text)

To loop over the text of every link, do the following:

from bs4 import BeautifulSoup
text='<div class="biz-website"> <span class="offscreen">Business website</span> <a target="_blank" href="/biz_redir?url=http%3A%2F%2Frepairsonwheelsrim-hub.com&src_bizid=8tY2YtXPk1rGO7sl43LH8A&cachebuster=1438073532&s=6b75d47d32b28eb8e50506859857b75e949d698cdbc47e9892cc2a3b43e480c2">repairsonwheelsrim-hub.com</a> </div>'    
soup = BeautifulSoup(text, 'html.parser')
# find_all returns every <a> tag; print the text of each one
for t in soup.find_all("a"):
    print(t.text)

Output:

repairsonwheelsrim-hub.com
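
As a side note on the selector in the question: `span#offscreen` matches an element with id "offscreen", but the markup uses class="offscreen", and the `<a>` is a sibling of the span rather than a descendant. The selector therefore matches nothing and `[0]` raises an IndexError. Below is a minimal sketch of a CSS-selector alternative, assuming the link always sits inside a div with class biz-website as in the question's snippet. The href is shortened here purely for illustration.

```python
from bs4 import BeautifulSoup

# Markup modelled on the question's snippet (href shortened for illustration)
html = (
    '<div class="biz-website">'
    '<span class="offscreen">Business website</span>'
    '<a target="_blank" href="/biz_redir?url=http%3A%2F%2Frepairsonwheelsrim-hub.com">'
    'repairsonwheelsrim-hub.com</a>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")

# Select the first <a> that has an href and lives inside div.biz-website
link = soup.select_one("div.biz-website a[href]")

# select_one returns None when nothing matches, so guard before reading text
web_address = link.get_text(strip=True) if link is not None else None
print(web_address)
```

Scoping the selector to the container div avoids picking up unrelated links elsewhere on the page.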

from bs4 import BeautifulSoup
import requests

# Download the business page and keep the raw response body
response = requests.get("http://www.yelp.com/biz/scotts-pizza-tours-new-york")
text = response.content

soup = BeautifulSoup(text, 'html.parser')
# Visit every <a> tag that has a target attribute
for t in soup.find_all(lambda tag: tag.name == 'a' and 'target' in tag.attrs):
    # Keep only links that open in a new tab
    if t["target"] == "_blank":
        print(t.get_text())

scottspizzatours.com
scottspizzatours.com
scottspizzatours.com/pri…
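
The link text can be shortened for display (note the "…" in the last line of output), whereas the href in the question's markup carries the address percent-encoded in a `url` query parameter. Here is a sketch of recovering it with the standard library, assuming the redirect link keeps that `url` parameter. `redirect_target` is a hypothetical helper name, and the href is a shortened copy of the one in the question.

```python
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

# Shortened copy of the anchor from the question
html = (
    '<a target="_blank" href="/biz_redir?url=http%3A%2F%2Frepairsonwheelsrim-hub.com'
    '&src_bizid=8tY2YtXPk1rGO7sl43LH8A">repairsonwheelsrim-hub.com</a>'
)


def redirect_target(href):
    """Return the decoded value of the 'url' query parameter, or None."""
    query = parse_qs(urlparse(href).query)  # parse_qs also percent-decodes values
    values = query.get("url")
    return values[0] if values else None


soup = BeautifulSoup(html, "html.parser")

# Only consider links that open in a new tab and actually have an href
full_urls = [
    redirect_target(a["href"])
    for a in soup.find_all("a", target="_blank", href=True)
]
print(full_urls)
```

Reading the href rather than the link text sidesteps the display truncation, at the cost of depending on the redirect URL's format.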

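But how do I retrieve all the URLs while looping? The field is an HTML field that changes on every page.

@NicoleW. The code above loops over all the a tags and prints them. If you want to search new text, you only need to change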

soup = BeautifulSoup(**your html text**, 'html.parser')

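each time the HTML text changes.

Please look at this URL: www.yelp.com/biz/repairs-on-wheels-brooklyn. I want to collect the website URL for Repairs on Wheels; please tell me how to do that with BS.

From the business listings I can already visit every listing and move on to the next page. This is the first listing, and the only URL I want to collect from it is scottspizzatours.com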
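
One way to collect only the business website from each page while looping is sketched below. It assumes every business page wraps the link in the same div with class biz-website shown in the question. `business_website` is a hypothetical helper, and the `pages` list holds stand-in HTML strings (the second one is invented to show the no-match case) in place of real page sources.

```python
from bs4 import BeautifulSoup


def business_website(html):
    """Return the link text inside div.biz-website, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one("div.biz-website a")
    return link.get_text(strip=True) if link is not None else None


# Stand-in page sources; in practice each string would be one listing's HTML
pages = [
    '<div class="biz-website"><span class="offscreen">Business website</span>'
    '<a href="/biz_redir?url=x">repairsonwheelsrim-hub.com</a></div>',
    '<div class="other"><a href="/somewhere">not the website</a></div>',
]

# Re-parse each page and pull out just the one field of interest
websites = [business_website(page) for page in pages]
print(websites)
```

Because the helper re-parses whatever HTML it is given, the same call works on each new page source as the loop advances, and pages without the div yield None rather than an unrelated link.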