Python: how to get the full link from BeautifulSoup, not just the internal link

Tags: python, web-scraping, beautifulsoup, web-crawler

I'm new to Python. I'm building a crawler for the company I work for. While crawling its website, I run into internal links that are not in the full URL format I'm used to. How can I get the entire link instead of just the directory? If I'm not being clear, please run the code I wrote:

import urllib2
from bs4 import BeautifulSoup

web_page_string = []

def get_first_page(seed):
    # Download the seed page and parse it with BeautifulSoup
    response = urllib2.urlopen(seed)
    web_page = response.read()
    soup = BeautifulSoup(web_page)
    # Print the href of every <a> tag found on the page
    for link in soup.find_all('a'):
        print (link.get('href'))
    print soup


print get_first_page('http://www.fashionroom.com.br')
print web_page_string

Thanks for the answers, everyone. I tried adding an if to the script. If anyone sees a potential problem with it that I'll run into in the future, please let me know:

import urllib2
from bs4 import BeautifulSoup

web_page_string = []

def get_first_page(seed):
    # Download the seed page and parse it with BeautifulSoup
    response = urllib2.urlopen(seed)
    web_page = response.read()
    soup = BeautifulSoup(web_page)
    final_page_string = soup.get_text()
    for link in soup.find_all('a'):
        # If the href already starts with 'http', print it as-is;
        # otherwise treat it as relative and prepend the seed URL
        if (link.get('href'))[0:4]=='http':
            print (link.get('href'))
        else:
            print seed+'/'+(link.get('href'))
    print final_page_string


print get_first_page('http://www.fashionroom.com.br')
print web_page_string
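One thing to watch out for with the if above: link.get('href') can be None for an <a> tag without an href (which makes the [0:4] slice raise a TypeError), and seed+'/'+href produces a double slash when the path already starts with '/'. Below is a minimal sketch of an alternative, assuming the same Python 2 / urllib2 setup as the question; it uses urljoin from the standard-library urlparse module to resolve relative hrefs against the page URL and leaves absolute ones unchanged:

import urllib2
from urlparse import urljoin
from bs4 import BeautifulSoup

def get_absolute_links(seed):
    # Download the seed page and parse it with BeautifulSoup
    web_page = urllib2.urlopen(seed).read()
    soup = BeautifulSoup(web_page)
    links = []
    for a in soup.find_all('a'):
        href = a.get('href')
        if not href:                       # skip <a> tags without an href
            continue
        links.append(urljoin(seed, href))  # build the full URL from the seed
    return links

for url in get_absolute_links('http://www.fashionroom.com.br'):
    print url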

What do you mean by the entire link?
print seed+'/'+link.get('href')
? I want http://www.fashionroom.com.br/indexnew.html in the case above. Instead, I just got indexnew.html
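For that specific case, urljoin produces exactly the URL asked for (a quick check, reusing the import from the sketch above):

from urlparse import urljoin  # urllib.parse.urljoin on Python 3

print urljoin('http://www.fashionroom.com.br', 'indexnew.html')
# prints: http://www.fashionroom.com.br/indexnew.html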