Capturing links and IPs with Python 3

python python-3.x hyperlink

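With help from the forum, I wrote a script that collects the links to all the threads on this page. Those threads contain proxy lists. The script is as follows: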

import urllib.request, re
from bs4 import BeautifulSoup

url = "https://www.inforge.net/xi/forums/liste-proxy.1118/"
soup = BeautifulSoup(urllib.request.urlopen(url), "lxml")

base = "https://www.inforge.net/xi/"

for tag in soup.find_all("a", {"class":"PreviewTooltip"}):
    links = tag.get("href")
    final = [base + links]

final2 = urllib.request.urlopen(final)

for line in final2:
    ip = re.findall("(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3}):(?:[\d]{1,5})", line)
    ip = ip[3:-1]

for addr in ip:
    print(addr)

The output is:

Traceback (most recent call last):
  File "proxygen5.0.py", line 13, in <module>
    sourcecode = urllib.request.urlopen(final)
  File "/usr/lib/python3.5/urllib/request.py", line 162, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 456, in open
    req.timeout = timeout
AttributeError: 'list' object has no attribute 'timeout'

I know the problem is in the

final2 = urllib.request.urlopen(final)

part, but I don't know how to fix it.

How can I print the IPs?

This code should do what you want. It is commented so that you can follow every step:

import urllib.request, re
from bs4 import BeautifulSoup

url = "https://www.inforge.net/xi/forums/liste-proxy.1118/"
soup = BeautifulSoup(urllib.request.urlopen(url), "lxml")

base = "https://www.inforge.net/xi/"

# Iterate over all the <a> tags
for tag in soup.find_all("a", {"class":"PreviewTooltip"}):
    # Get the link from the tag
    link = tag.get("href")
    # Compose the new link
    final = base + link

    print('Request to {}'.format(final))    # To know what we are doing
    # Download the 'final' link content
    result = urllib.request.urlopen(final)

    # For every line in the downloaded content
    for line in result:
        # Find one or more IP(s); lines arrive as `bytes`, so decode them to text first
        ip = re.findall(r"(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3})\.(?:\d{1,3}):(?:\d{1,5})", line.decode("utf-8", errors="replace"))
        # If one or more IP(s) are found
        if ip:
            # Print them on separate line
            print('\n'.join(ip))
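
The extraction step can be tried on its own, without touching the network. The following is a minimal sketch, not part of the original answer: the helper name `extract_proxies` and the sample input are made up for illustration. It decodes a `bytes` line and applies a pattern equivalent to the one above, written as a raw string.

```python
import re

# Equivalent to the pattern in the answer; the raw string keeps the
# backslashes intact for the regex engine.
PROXY_RE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}")


def extract_proxies(line):
    """Return every ip:port substring found in one line (hypothetical helper)."""
    # urlopen() yields bytes, so decode to text before matching
    if isinstance(line, bytes):
        line = line.decode("utf-8", errors="replace")
    return PROXY_RE.findall(line)


# Made-up sample line, only for illustration
sample = b"<li>10.0.0.1:8080</li><li>192.168.1.20:3128</li>"
print("\n".join(extract_proxies(sample)))
```

Note that the pattern only checks the shape of the address. It does not verify that each octet is at most 255 or that the port is in range.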

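The problem is that

final = [base + links]

creates a list containing a single element, and that list is then passed to

final2 = urllib.request.urlopen(final)

where you should pass a string (the URL), not a list.

Yes, I'm dumb, you're right. So how do I get around this? Could you post an answer, please?

Replace

final = [base + links]

with

final = base + links

Note that you only keep the last final parsed from the tags. If you need all of them, you will have to change the rest of the code as well.

I'm a newbie. I've been trying, but I still can't work out what I should change in the "rest of the code", as you put it. LOL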
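
To keep every link rather than only the last one, one option is to collect the URLs in a list while looping over the tags, then request them one at a time. This is a minimal sketch that assumes the `href` values have already been extracted; the helper name `build_urls` and the sample `href` strings are hypothetical.

```python
def build_urls(base, hrefs):
    """Join each relative href onto the base URL (hypothetical helper)."""
    # Each element is a plain string, which is what urlopen() expects
    return [base + href for href in hrefs]


base = "https://www.inforge.net/xi/"
# Made-up href values, only for illustration
hrefs = ["threads/example-one.1/", "threads/example-two.2/"]

for final in build_urls(base, hrefs):
    # Each `final` is a single URL string that could be passed
    # to urllib.request.urlopen(final)
    print(final)
```

The list itself is never handed to `urlopen`. Only its individual string elements are, which avoids the `AttributeError` from the question.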

You are an angel! You helped a newbie like me. I don't know how to thank you... you're the best. One last question and then I'm done: if I want to save all the captured IPs to a file, how do I do that? I tried:

out_file = open("proxy.txt", "w")
out_file.write(ip)
out_file.close()

but it only saves one IP.

You need to open the file in append mode, otherwise you overwrite everything each time. To do that, use "a" instead of "w" when opening the file. If you want to learn, my suggestion is to have a look at this free book ;)

Thanks again. Anyway, I have already gone through all the guides on html.it, but I still get a bit stuck... but I'm learning :D
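
The append-mode advice can be sketched as follows. This is only an illustration under stated assumptions: the helper name `save_proxies`, the temporary file location, and the sample addresses are not part of the thread.

```python
import os
import tempfile


def save_proxies(ips, path, mode="a"):
    """Write one address per line; mode "a" appends, "w" truncates first."""
    # The with-block closes the file even if a write fails
    with open(path, mode) as out_file:
        for addr in ips:
            out_file.write(addr + "\n")


# Made-up addresses and a throwaway directory, only for illustration
path = os.path.join(tempfile.mkdtemp(), "proxy.txt")
save_proxies(["10.0.0.1:8080"], path)       # first batch
save_proxies(["192.168.1.20:3128"], path)   # second batch is appended

with open(path) as in_file:
    print(in_file.read(), end="")
```

With `"a"`, each call adds to the end of the file, so addresses found on later pages do not replace earlier ones. Opening with `"w"` inside the loop would leave only the last write.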