Python: how to get the full link from BeautifulSoup instead of only the internal link
I'm new to Python. I'm building a crawler for the company I work for. While crawling its site, I found an internal link that is not in full-URL format, only a path. How do I get the whole link instead of just the path? In case I'm not being clear, here is the code I wrote:
import urllib2
from bs4 import BeautifulSoup

web_page_string = []

def get_first_page(seed):
    response = urllib2.urlopen(seed)
    web_page = response.read()
    soup = BeautifulSoup(web_page)
    for link in soup.find_all('a'):
        print (link.get('href'))
    print soup

print get_first_page('http://www.fashionroom.com.br')
print web_page_string
Based on the answers, I tried adding an if to the script. If anyone spots a potential problem that I'll run into later, please let me know:
import urllib2
from bs4 import BeautifulSoup

web_page_string = []

def get_first_page(seed):
    response = urllib2.urlopen(seed)
    web_page = response.read()
    soup = BeautifulSoup(web_page)
    final_page_string = soup.get_text()
    for link in soup.find_all('a'):
        if (link.get('href'))[0:4] == 'http':
            print (link.get('href'))
        else:
            print seed+'/'+(link.get('href'))
    print final_page_string

print get_first_page('http://www.fashionroom.com.br')
print web_page_string
What do you mean by the whole link? Something like
print seed+'/'+link.get('href')
? In the case above I want http://www.fashionroom.com.br/indexnew.html; instead, I only got indexnew.html.
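Rather than concatenating seed+'/'+href by hand, the standard-library urljoin handles this: it resolves a relative href against the page URL and leaves absolute hrefs untouched. A minimal sketch (Python 3 import shown; on Python 2 the same function lives in the urlparse module, and example.com is just a placeholder URL):

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

base = 'http://www.fashionroom.com.br/'

# A relative href is resolved against the base URL...
print(urljoin(base, 'indexnew.html'))
# → http://www.fashionroom.com.br/indexnew.html

# ...while an absolute href passes through unchanged, so no
# 'http' prefix check is needed.
print(urljoin(base, 'http://example.com/page'))
# → http://example.com/page
```

Inside the loop you would simply print urljoin(seed, link.get('href')), which also avoids doubled slashes when the href already starts with '/'.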