Python: passing a URL vs. an HTTPResponse object

I have a list of URLs from which I want to fetch an attribute. I'm new to Python, so please bear with me. Windows 7, 64-bit, Python 3.2.

The code below works. pblist is a list of dicts, each of which includes the key 'short_url'.

for j in pblist[0:10]:
    base_url = j['short_url']
    if hasattr(BeautifulSoup(urllib.request.urlopen(base_url)), 'head') and \
        hasattr(BeautifulSoup(urllib.request.urlopen(base_url)).head, 'title'):
            print("Has head, title attributes.")
            try:
                j['title'] = BeautifulSoup(urllib.request.urlopen(base_url)).head.title.string.encode('utf-8')
            except AttributeError:
                print("Encountered attribute error on page, ", base_url)
                j['title'] = "Attribute error."
                pass
The code below does not work. For example, it reports that the BeautifulSoup object has no head and title attributes.

for j in pblist[0:10]:
        base_url = j['short_url']
        page = urllib.request.urlopen(base_url)
        if hasattr(BeautifulSoup(page), 'head') and \
            hasattr(BeautifulSoup(page).head, 'title'):
                print("Has head, title attributes.")
                try:
                    j['title'] = BeautifulSoup(urllib.request.urlopen(base_url)).head.title.string.encode('utf-8')
                except AttributeError:
                    print("Encountered attribute error on page, ", base_url)
                    j['title'] = "Attribute error."
                    pass

Why? What is the difference between passing the URL to urllib.request.urlopen inside the BeautifulSoup call and passing the HTTPResponse object that urllib.request.urlopen returned?

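The response returned by urlopen() is a file-like object, which means that by default it behaves somewhat like an iterator: once you have read through it, you cannot get any more data out of it (unless you explicitly reset it).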

So in the second version, the first call to BeautifulSoup(page) reads all the data from page, and the subsequent calls have nothing left to read.
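The exhaustion can be seen without any network access. The following is a minimal sketch in which io.BytesIO stands in for the HTTPResponse (an assumption for illustration; the HTML bytes are made up). Both are file-like objects that hand out their data through read().

```python
import io

# io.BytesIO is a stand-in for the HTTPResponse returned by urlopen()
# (assumption for illustration); the markup below is made-up sample data.
page = io.BytesIO(b"<html><head><title>Example</title></head></html>")

first = page.read()   # consumes the whole stream
second = page.read()  # nothing is left, so this is empty bytes

print(first)
print(second)  # b''
```

Anything that parses page a second time sees only that empty result, which is why the second BeautifulSoup(page) finds no head or title.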

What you can do instead is:

page = urllib.request.urlopen(base_url)
page_content = page.read()
# ...
BeautifulSoup(page_content)
# ...
BeautifulSoup(page_content)
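As a rough offline illustration of the same pattern, the sketch below reads the body once and then reuses the resulting bytes. The data: URL is a hypothetical stand-in for j['short_url'], and urlopen() only accepts data: URLs on Python 3.4 or later, so this would not run as-is on the Python 3.2 mentioned in the question. The byte-substring checks are placeholders for whatever parsing is done with the content.

```python
import urllib.request

# Hypothetical stand-in for j['short_url']; a data: URL keeps the sketch offline
# (requires Python 3.4+).
base_url = "data:text/html,<html><head><title>Example</title></head></html>"

page = urllib.request.urlopen(base_url)
page_content = page.read()  # read the body exactly once

# page_content is an ordinary bytes object, so it can be reused freely.
has_head = b"<head>" in page_content
has_title = b"<title>" in page_content

leftover = page.read()  # the response itself is exhausted by now

print(has_head, has_title, leftover)
```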
But even that is somewhat inefficient. Instead, why not create a single BeautifulSoup object and pass it around?

page = urllib.request.urlopen(base_url)
soup = BeautifulSoup(page)
# ...
# do something with soup
# ...
# do something with soup

Your code, modified to use a single soup object:

for j in pblist[0:10]:
        base_url = j['short_url']
        page = urllib.request.urlopen(base_url)
        soup = BeautifulSoup(page)
        if hasattr(soup, 'head') and \
            hasattr(soup.head, 'title'):
                print("Has head, title attributes.")
                try:
                    j['title'] = soup.head.title.string.encode('utf-8')
                except AttributeError:
                    print("Encountered attribute error on page, ", base_url)
                    j['title'] = "Attribute error."
                    pass