Python: verify whether the URLs in a file exist

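So I have some code that I use to search a mailbox for specific URLs. When it finishes, it creates a file named links.txt.

I want to run a script against that file to get output for all of the current URLs in that list. My script only lets me check one URL at a time.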

import urllib2

for url in ["www.google.com"]:

    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

Use requests:

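import requests

filename = "links.txt"  # the file produced by the mailbox search

good_links = []
with open(filename) as f:
    for link in f:  # iterate over the open file object f
        try:
            r = requests.get(link.strip())
        except Exception:
            continue  # skip links that cannot be fetched
        good_links.append(r.url)  # resolves redirects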

You might also consider extracting the requests call into a helper function:

def make_request(method, url, **kwargs):
    # Try up to 10 times before giving up.
    for i in range(10):
        try:
            r = requests.request(method, url, **kwargs)
            return r
        except requests.RequestException as e:
            # ConnectionError and HTTPError are subclasses of RequestException
            print(e)
    raise Exception("requests did not succeed")

Given that you are already iterating over a list of URLs, making this change is simple:

import urllib2

for url in open("urllist.txt"):   # change 1

    try:
        connection = urllib2.urlopen(url.rstrip())   # change 2
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

Iterating over a file returns the lines of the file (including the line ending). We call rstrip() on the URL to strip the line ending.
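To make the line-ending point concrete, here is a small sketch in Python 3. The helper name clean_lines and the sample URLs are made up for illustration; the list stands in for the lines you would get by iterating over the file.

```python
def clean_lines(lines):
    """Strip trailing whitespace (including line endings) and drop blank lines."""
    return [line.rstrip() for line in lines if line.strip()]


# Hypothetical sample of what iterating over a file might yield.
raw = ["http://example.com/\n", "http://example.org/page\r\n", "\n"]
print(clean_lines(raw))  # ['http://example.com/', 'http://example.org/page']
```

Dropping blank lines is an extra step beyond rstrip(); it avoids trying to open an empty URL if the file ends with an empty line.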

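You can make other improvements as well. For example, some will suggest that you use a with statement to make sure the file is closed. That is good practice, but it is probably not necessary in this script.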
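If you do want the with version, a rough sketch might look like the following. It is written for Python 3, where urllib2 was split into urllib.request and urllib.error, so it is not a drop-in replacement for the Python 2 script. The names read_urls and status_of, the 10-second timeout, and the opener parameter are my own assumptions; opener is only there so the function can be tried without touching the network.

```python
import urllib.error
import urllib.request


def read_urls(path):
    """Return the non-empty lines of a file with line endings removed."""
    with open(path) as handle:  # the file is closed when this block exits
        return [line.rstrip() for line in handle if line.strip()]


def status_of(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if it cannot be fetched."""
    try:
        with opener(url, timeout=10) as connection:
            return connection.getcode()
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404.
        return e.code
    except (OSError, ValueError):
        # OSError covers URLError and timeouts; ValueError covers a URL
        # with no scheme, for example "www.google.com".
        return None
```

You would then loop with something like: for url in read_urls("urllist.txt"): print(url, status_of(url)). HTTPError has to be caught before OSError because it is a subclass of it.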

So all you need to know is how to read lines from a text file?

#!/usr/bin/python
import urllib2
for url in open("ZeuS_links"):   # change 1
    try:
        connection = urllib2.urlopen(url.rstrip())   # change 2
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

Thanks for your help, the script is working now. The only thing I'm still wondering about is how to output just the actual URL for anything that is not 200 or 401. I'll keep playing with it and post updates as I go. Thanks again for your help.

You already have url as the URL you are trying to retrieve; just print it.

Sometimes the small things are right in front of you. Thanks a lot. Awesome.
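For the last question in the comments (show the URL only when the status is not 200 or 401), one possible sketch in Python 3 is below. The function name urls_to_report and the (url, status) pair format are assumptions for illustration, not something from the original script; the status codes in the sample are invented.

```python
def urls_to_report(results, expected=(200, 401)):
    """Return the URLs whose status code is not one of the expected codes.

    results is an iterable of (url, status) pairs; a status of None is
    treated as unexpected, so unreachable URLs are reported as well.
    """
    return [url for url, status in results if status not in expected]


# Hypothetical results collected while checking each URL.
sample = [
    ("http://example.com/", 200),
    ("http://example.com/private", 401),
    ("http://example.com/missing", 404),
]
for url in urls_to_report(sample):
    print(url)  # prints http://example.com/missing
```

In the loop from the answer, the same effect comes from storing the code in a variable and printing url only when that code is not 200 or 401.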