Python urllib.robotparser.RobotFileParser() gives a different result on each run - HTTP status?

Tags: python, robots.txt

urllib.robotparser.RobotFileParser() gives me a different result every time I run it, even though the site's robots.txt says "Disallow: /search.htm*":
# robots.txt for https://www.alza.cz/
User-Agent: *
Disallow: /Order1.htm
Disallow: /Order2.htm
Disallow: /Order3.htm
Disallow: /Order4.htm
Disallow: /Order5.htm
Disallow: /download/
Disallow: /muj-ucet/
Disallow: /Secure/
Disallow: /LostPassword.htm
Disallow: /search.htm*
Sitemap: https://www.alza.cz/_sitemap-categories.xml
Sitemap: https://www.alza.cz/_sitemap-categories-producers.xml
Sitemap: https://www.alza.cz/_sitemap-live-product.xml
Sitemap: https://www.alza.cz/_sitemap-dead-product.xml
Sitemap: https://www.alza.cz/_sitemap-before_listing.xml
Sitemap: https://www.alza.cz/_sitemap-seo-sorted-categories.xml
Sitemap: https://www.alza.cz/_sitemap-bazaar-categories.xml
Sitemap: https://www.alza.cz/_sitemap-sale-categories.xml
Sitemap: https://www.alza.cz/_sitemap-parametrically-generated-pages.xml
Sitemap: https://www.alza.cz/_sitemap-parametrically-generated-pages-producer.xml
Sitemap: https://www.alza.cz/_sitemap-articles.xml
Sitemap: https://www.alza.cz/_sitemap-producers.xml
Sitemap: https://www.alza.cz/_sitemap-econtent.xml
Sitemap: https://www.alza.cz/_sitemap-dead-econtent.xml
Sitemap: https://www.alza.cz/_sitemap-branch-categories.xml
Sitemap: https://www.alza.cz/_sitemap-installments.xml
Sitemap: https://www.alza.cz/_sitemap-detail-page-slots-of-accessories.xml
Sitemap: https://www.alza.cz/_sitemap-reviews.xml
Sitemap: https://www.alza.cz/_sitemap-detail-page-bazaar.xml
Sitemap: https://www.alza.cz/_sitemap-productgroups.xml
Sitemap: https://www.alza.cz/_sitemap-accessories.xml
However, the first time I ran the following check I got FALSE (which is correct), but now I get TRUE (which is incorrect) on every run:
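The original test snippet was not included in the question; a minimal local reproduction (hypothetical, feeding the robots.txt above straight to the parser so no network is involved) behaves like this:

```python
from urllib.robotparser import RobotFileParser

# Feed a subset of the robots.txt shown above directly to the parser,
# bypassing read() and the network entirely.
robots_txt = """\
User-Agent: *
Disallow: /Order1.htm
Disallow: /search.htm*
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Plain prefix rules match as expected:
print(rp.can_fetch("*", "https://www.alza.cz/Order1.htm"))  # False

# Note: the stdlib parser does not support wildcards, so the '*' in
# "Disallow: /search.htm*" is treated as a literal character and this
# check is allowed:
print(rp.can_fetch("*", "https://www.alza.cz/search.htm"))  # True
```

Because parse() loads the rules without touching the network, a stable result here suggests the flip-flopping in the question comes from read() and the HTTP response, not from the matching logic.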
I found this code in the urllib source; it suggests the server is responding with an HTTP status code between 400 and 499, which would be very strange, and unfortunately I cannot check it myself:
    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        try:
            f = urllib.request.urlopen(self.url)
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif err.code >= 400 and err.code < 500:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())

    # Until the robots.txt file has been read or found not
    # to exist, we must assume that no url is allowable.
    # This prevents false positives when a user erroneously
    # calls can_fetch() before calling read().
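That branch would explain the symptom: if the server ever answers with a 4xx status other than 401/403 (for example a 429 from rate limiting - an assumption, not confirmed by the post), read() sets allow_all, no rules are parsed, and every subsequent can_fetch() call returns True. The effect can be simulated without any network access:

```python
from urllib.robotparser import RobotFileParser

# Normal path: rules are parsed, the Disallow line is honoured.
rp = RobotFileParser()
rp.parse(["User-Agent: *", "Disallow: /Order1.htm"])
print(rp.can_fetch("*", "https://www.alza.cz/Order1.htm"))  # False

# Simulated 4xx path: read() would set allow_all and never feed any
# rules to the parser, so the very same check now passes.
rp_err = RobotFileParser()
rp_err.allow_all = True
print(rp_err.can_fetch("*", "https://www.alza.cz/Order1.htm"))  # True
```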
Any ideas what might be going on?
EDIT: I updated the source code and there is no bad status; it gives 200. I don't understand why this URL is being given a pass.
    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        try:
            f = urllib.request.urlopen(self.url)
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif err.code >= 400 and err.code < 500:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())

    # Until the robots.txt file has been read or found not
    # to exist, we must assume that no url is allowable.
    # This prevents false positives when a user erroneously
    # calls can_fetch() before calling read().
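To trace which branch fires on a given run, the status logic can be mirrored in a small helper (a sketch; `classify_status` is not part of the stdlib) and fed with the code observed from urlopen or from the HTTPError:

```python
def classify_status(code):
    """Mirror the branch in RobotFileParser.read(): 401/403 lock
    everything out, any other 4xx waves everything through, and any
    other status means the body is actually fetched and parsed."""
    if code in (401, 403):
        return "disallow_all"
    elif 400 <= code < 500:
        return "allow_all"
    return "parse body"

# E.g. an intermittent 429 (rate limiting) would flip the parser into
# allow_all mode, making can_fetch() return True for everything:
print(classify_status(200))  # "parse body"
print(classify_status(429))  # "allow_all"
print(classify_status(403))  # "disallow_all"
```

Logging err.code inside the except branch of a copy of read() would show whether an intermittent 4xx (rather than the 200 seen in the edit) is what switches the parser into allow_all mode between runs.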