How do I get the next-page link with BeautifulSoup in Python?
I have this link:
http://www.brothersoft.com/windows/categories.html
I am trying to get the links to the items inside the div.
For example:
I tried the following code:
import urllib
from bs4 import BeautifulSoup

url = 'http://www.brothersoft.com/windows/categories.html'
pageHtml = urllib.urlopen(url).read()
soup = BeautifulSoup(pageHtml)
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brLeft'})]
for i in sAll:
    print "http://www.brothersoft.com"+i['href']
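But I only get this output:
http://www.brothersoft.com/windows/mp3_audio/
How can I get the desired output?

The URL http://www.brothersoft.com/windows/mp3_audio/midi_tools/ is not inside the <div class="brLeft"> tag that your code searches, so the output http://www.brothersoft.com/windows/mp3_audio/ is correct.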
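If you want the desired URL, change
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brLeft'})]
to
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brRight'})]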
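To make the difference between the two class names concrete, here is a minimal offline sketch. It is written for Python 3 rather than the Python 2 used in the question, and the HTML fragment is an assumption modelled on the class names above, not a copy of the real page; the `audio_players` link and the helper name `links_in` are purely illustrative. It also uses `find_all('a')` instead of `find('a')`, on the assumption that one `brRight` div may hold several subcategory links.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names used in the question;
# the real page layout may differ.
SAMPLE_HTML = """
<div class="brLeft"><a href="/windows/mp3_audio/">MP3 &amp; Audio</a></div>
<div class="brRight">
  <a href="/windows/mp3_audio/midi_tools/">MIDI Tools</a>
  <a href="/windows/mp3_audio/audio_players/">Audio Players</a>
</div>
"""


def links_in(soup, css_class):
    """Return the href of every <a> inside every <div class=css_class>."""
    hrefs = []
    for div in soup.find_all("div", class_=css_class):
        for a in div.find_all("a", href=True):
            hrefs.append(a["href"])
    return hrefs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
category_links = links_in(soup, "brLeft")      # top-level category only
subcategory_links = links_in(soup, "brRight")  # the links the question wants
print(category_links)
print(subcategory_links)
```

With `find('a')`, as in the original list comprehension, only the first link of each matching div would be returned.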
Update:
An example of getting the information inside "midi_tools":
import urllib
from bs4 import BeautifulSoup

url = 'http://www.brothersoft.com/windows/categories.html'
pageHtml = urllib.urlopen(url).read()
soup = BeautifulSoup(pageHtml)
sAll = [div.find('a') for div in soup.findAll('div', attrs={'class':'brRight'})]
for i in sAll:
    suburl = "http://www.brothersoft.com"+i['href'] # a subcategory URL such as 'midi_tools'
    content = urllib.urlopen(suburl).read()
    anosoup = BeautifulSoup(content)
    ablock = anosoup.find('table',{'id':'courseTab'})
    for atr in ablock.findAll('tr',{'class':'border_bot '}):
        print atr.find('dt').a.string # name
        print "http://www.brothersoft.com" + atr.find('a',{'class':'tabDownload'})['href'] # link
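The code above builds absolute URLs by prepending the host name, which only works while every `href` starts with `/`. A small sketch of a more forgiving alternative is `urljoin` from the standard library, shown here for Python 3 (in Python 2 the same function lives in the `urlparse` module). The helper name `absolute` and the second and third sample hrefs are illustrative assumptions, not values taken from the site.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# The page the hrefs were scraped from (taken from the question).
BASE_URL = "http://www.brothersoft.com/windows/categories.html"


def absolute(href, base=BASE_URL):
    """Resolve an href found on the page against the page's own URL."""
    return urljoin(base, href)


# Root-relative href, as seen in the question's output.
print(absolute("/windows/mp3_audio/midi_tools/"))
# Hypothetical document-relative href.
print(absolute("mp3_audio/"))
# Hypothetical href that is already absolute; it is returned unchanged.
print(absolute("http://example.com/x"))
```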
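It works perfectly. What is the problem? What should the output be?
How do I get the application names and links inside midi_tools?
@wan mohd payed, it is similar to what you have already done: fetch the content of the midi_tools page, work out which tags hold the information, and then use BeautifulSoup to extract it.
@Davd.Zheng Do I need to use "join" or something?
@wan mohd payed, sorry, I don't understand what you mean by "join". What would it be for?
Sorry, I don't actually know how to code. But how can I get the download links in midi_tools and print some string information about the software listed there?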
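As a rough illustration of the "find the tags, then extract" step described in the comments, the sketch below parses a listing fragment offline, in Python 3. The HTML is a made-up stand-in that only mirrors the selectors used in the answer (`table#courseTab`, `tr.border_bot`, `dt > a`, `a.tabDownload`); the application name, the hrefs, and the function name `parse_listing` are hypothetical. Unlike the loop in the answer, it skips rows where an expected tag is missing instead of raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mirrors the selectors in the answer's code;
# the real listing page may be structured differently.
SAMPLE_LISTING = """
<table id="courseTab">
  <tr class="border_bot">
    <td><dl><dt><a href="/example-midi-app.html">Example MIDI App</a></dt></dl></td>
    <td><a class="tabDownload" href="/example-midi-app-download.html">Download</a></td>
  </tr>
  <tr class="border_bot"><td>row without links</td></tr>
</table>
"""


def parse_listing(html):
    """Return (name, download_href) pairs for each complete row in the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="courseTab")
    if table is None:  # page without a listing table
        return []
    results = []
    for row in table.find_all("tr", class_="border_bot"):
        dt = row.find("dt")
        name_link = dt.find("a") if dt is not None else None
        download = row.find("a", class_="tabDownload")
        if name_link is None or download is None:
            continue  # skip rows that lack a name or a download link
        results.append((name_link.get_text(strip=True), download["href"]))
    return results


apps = parse_listing(SAMPLE_LISTING)
for name, href in apps:
    print(name, href)
```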