Python: how to get the full link from BeautifulSoup, not just the internal link

Tags: python, web-scraping, beautifulsoup, web-crawler

I'm new to Python. I'm building a crawler for the company I work for. While crawling its website, I run into internal links that are not in the full URL format I'm used to. How can I get the entire link instead of just the directory? If I'm not being clear, please run the code I wrote:

import urllib2
from bs4 import BeautifulSoup

web_page_string = []

def get_first_page(seed):
    # Download the seed page and parse it with BeautifulSoup
    response = urllib2.urlopen(seed)
    web_page = response.read()
    soup = BeautifulSoup(web_page)
    # Print the href of every <a> tag found on the page
    for link in soup.find_all('a'):
        print (link.get('href'))
    print soup


print get_first_page('http://www.fashionroom.com.br')
print web_page_string

Thanks for the answers, everyone. I tried adding an if to the script. If anyone sees a potential problem with it that I'll run into in the future, please let me know:

import urllib2
from bs4 import BeautifulSoup

web_page_string = []

def get_first_page(seed):
    # Download the seed page and parse it with BeautifulSoup
    response = urllib2.urlopen(seed)
    web_page = response.read()
    soup = BeautifulSoup(web_page)
    final_page_string = soup.get_text()
    for link in soup.find_all('a'):
        # If the href already starts with 'http', print it as-is;
        # otherwise treat it as relative and prepend the seed URL
        if (link.get('href'))[0:4]=='http':
            print (link.get('href'))
        else:
            print seed+'/'+(link.get('href'))
    print final_page_string


print get_first_page('http://www.fashionroom.com.br')
print web_page_string
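One thing to watch out for with the if above: link.get('href') can be None for an <a> tag without an href (which makes the [0:4] slice raise a TypeError), and seed+'/'+href produces a double slash when the path already starts with '/'. Below is a minimal sketch of an alternative, assuming the same Python 2 / urllib2 setup as the question; it uses urljoin from the standard-library urlparse module to resolve relative hrefs against the page URL and leaves absolute ones unchanged:

import urllib2
from urlparse import urljoin
from bs4 import BeautifulSoup

def get_absolute_links(seed):
    # Download the seed page and parse it with BeautifulSoup
    web_page = urllib2.urlopen(seed).read()
    soup = BeautifulSoup(web_page)
    links = []
    for a in soup.find_all('a'):
        href = a.get('href')
        if not href:                       # skip <a> tags without an href
            continue
        links.append(urljoin(seed, href))  # build the full URL from the seed
    return links

for url in get_absolute_links('http://www.fashionroom.com.br'):
    print url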

What do you mean by the entire link?
print seed+'/'+link.get('href')
? I want http://www.fashionroom.com.br/indexnew.html in the case above. Instead, I just got indexnew.html
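For that specific case, urljoin produces exactly the URL asked for (a quick check, reusing the import from the sketch above):

from urlparse import urljoin  # urllib.parse.urljoin on Python 3

print urljoin('http://www.fashionroom.com.br', 'indexnew.html')
# prints: http://www.fashionroom.com.br/indexnew.html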