
Python: copy URLs whose pages contain specific terms to a file

Tags: python, python-2.7, web-crawler, urllib2

I am trying to collect all the URLs in this range whose pages contain the terms "Recipes adapted from" or "Recipe from". The script below copies the matching links to the file until around entry 7496, then throws an HTTPError 404. What am I doing wrong? I have also tried BeautifulSoup and requests, but still can't get it to work.

import urllib2
with open('recipes.txt', 'w+') as f:
    for i in range(14477):
        url = "http://www.tastingtable.com/entry_detail/{}".format(i)
        page_content = urllib2.urlopen(url).read()
        if "Recipe adapted from" in page_content:
            print url
            f.write(url + '\n')
        elif "Recipe from" in page_content:
            print url
            f.write(url + '\n')
        else:
            pass

Some of the URLs you are trying to fetch simply do not exist. Skip them by catching and ignoring the exception:

import urllib2
with open('recipes.txt', 'w+') as f:
    for i in range(14477):
        url = "http://www.tastingtable.com/entry_detail/{}".format(i)
        try:
            page_content = urllib2.urlopen(url).read()
        except urllib2.HTTPError as error:
            if 400 < error.code < 500:
                continue  # not found, unauthorized, etc.
            raise   # other errors we want to know about
        if "Recipe adapted from" in page_content or "Recipe from" in page_content:
            print url
            f.write(url + '\n')
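For completeness, the same "skip client errors, re-raise everything else" idea can also be expressed with the requests library, which the question mentions trying. This is only a minimal sketch assuming requests is installed; the URL pattern and search terms are taken from the question, and it is not the original answer's code.

import requests

with open('recipes.txt', 'w+') as f:
    for i in range(14477):
        url = "http://www.tastingtable.com/entry_detail/{}".format(i)
        response = requests.get(url)
        if 400 <= response.status_code < 500:
            continue  # page missing, forbidden, etc. -- skip it
        response.raise_for_status()  # re-raise server errors (5xx) so they stay visible
        page_content = response.text
        if "Recipe adapted from" in page_content or "Recipe from" in page_content:
            print(url)
            f.write(url + '\n')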