Python: saving fetched data to a file in a specific structured format with urllib


python file python-3.x


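I'd like to know whether there is a way to save the HTML to my file in a specific, structured format. Right now the output of this script is just a jumble of letters and numbers. Is there a way to give it structure? For example: 111.111.111.11:111 222.222.222.22:22 (IP format).

Thanks for your help.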

import urllib.request
import re

ans = True

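while ans:
    print("""
      - Menu Selection -
      1. Automatic
      2. Automatic w/Checker
      3. Manual
      4. Add to list
      5. Exit
      """)
    # Read the choice inside the loop; otherwise the menu prints forever
    ans = input('Select Option : ')

    if ans == "1":
        try:
            with urllib.request.urlopen('http://www.mywebsite.net') as response:
                # read() returns bytes; decode them instead of calling str(),
                # which would give the "b'...'" repr with escaped characters
                html = response.read().decode('utf-8', errors='replace')
            # Only strips a lowercase letter immediately followed by an
            # uppercase letter, so most letters remain in the output
            html = re.sub(r'([a-z][A-Z])', '', html)
            with open('text.txt', 'a') as f:
                f.write(html)
            print('Data(1) saved.')
            ans = True
        except Exception:
            print('Error on first fetch.')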
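One reason the saved text looks like a jumble: `response.read()` returns `bytes`, and calling `str()` on a `bytes` object gives its repr (a `b'` prefix plus escaped newlines) rather than the decoded text. A small illustration, using made-up sample bytes and assuming UTF-8 encoding:

```python
# Hypothetical page bytes, standing in for response.read()
raw = b"<p>1.2.3.4:80</p>\n<p>5.6.7.8:8080</p>"

# str() on bytes produces the repr: b'...' with the newline escaped as \n
as_repr = str(raw)
# decode() produces the actual text, with a real newline
as_text = raw.decode("utf-8")

print(as_repr)
print(as_text)
```

In the real script, `response.headers.get_content_charset('utf-8')` can supply the charset the server declares, falling back to UTF-8 when none is given.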

Going by the question:

If the sample input is

Input: fdsfdsfd123.123.123.123:123fdds125.125.125:125fdsfdfdsfdsfsdf

Output: 123.123.123.123:123 (newline) 125.125.125:125

If html is the input string:

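import re

# Turn every character that is not a digit, '.' or ':' into a newline
filtered_alpha = re.sub(r'[^0-9.:]', '\n', html)
# Drop the empty strings between consecutive newlines
multiple_ips = filter(None, filtered_alpha.split("\n"))
print("\n".join(multiple_ips))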

This gives you the expected output.
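A self-contained Python 3 version of the filter approach above, run on the sample input; `split_on_non_ip_chars` is a hypothetical helper name. Note that it keeps any run of digits, dots and colons, so on a real HTML page (dates, version numbers, CSS sizes such as `10px`) it will also emit lines that are not addresses.

```python
import re

def split_on_non_ip_chars(text):
    # Replace every character that is not a digit, '.' or ':' with a newline,
    # then drop the empty strings left between consecutive newlines.
    filtered = re.sub(r'[^0-9.:]', '\n', text)
    return [chunk for chunk in filtered.split('\n') if chunk]

sample = "fdsfdsfd123.123.123.123:123fdds125.125.125:125fdsfdfdsfdsfsdf"
print("\n".join(split_on_non_ip_chars(sample)))
# 123.123.123.123:123
# 125.125.125:125
```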

If you are specifically looking for IP addresses, you can refer to @MarkByers' post, where he suggests:

ip = re.findall(r'[0-9]+(?:\.[0-9]+){3}', html)
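The pattern above matches only the four-part address. A sketch that extends it with a `:port` suffix, assuming the port is written directly after the address; `IP_PORT_RE` and `extract_ip_ports` are hypothetical names, and the pattern does not validate octet ranges (for example, `999.999.999.999:1` would still match).

```python
import re

# Four dot-separated groups of 1-3 digits, then ':' and a 1-5 digit port
IP_PORT_RE = re.compile(r'[0-9]{1,3}(?:\.[0-9]{1,3}){3}:[0-9]{1,5}')

def extract_ip_ports(text):
    return IP_PORT_RE.findall(text)

sample = "fdsfdsfd123.123.123.123:123fdds125.125.125:125fdsfdfdsfdsfsdf"
print(extract_ip_ports(sample))
```

Unlike the character-filter approach, this skips `125.125.125:125` from the sample, because that value has only three dot-separated parts.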

Use an HTML parser such as BeautifulSoup.
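BeautifulSoup is a third-party package. As a dependency-free stand-in (a swapped-in technique, not BeautifulSoup's API), the standard library's `html.parser.HTMLParser` can collect only the text nodes, so the regex does not run over tags, attributes or scripts. `TextCollector` and `visible_text` are hypothetical names, and the sample page is invented.

```python
import re
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def visible_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    # Join with newlines so text from adjacent elements is not glued together
    return "\n".join(parser.chunks)

page = ("<ul><li>1.2.3.4:80</li><li>5.6.7.8:8080</li></ul>"
        "<script>var v='9.9.9.9:1';</script>")
ips = re.findall(r'[0-9]{1,3}(?:\.[0-9]{1,3}){3}:[0-9]{1,5}', visible_text(page))
print(ips)
```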

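In what order do you want the dots and colons?

Is there a way to keep it intact, in the form of an IP?

@dexray Can you give an example of the input and the output, in detail? It's unclear.

Input = a text file containing: fdsfdsfd123.123.123.123:123fdds125.125.125:125fdsfdsfdsfsdf. I want my output to be: 123.123.123.123:123 (newline) 125.125.125:125

Thank you very much, my friend.

@dexray No problem :)

One more small question: is there a way to separate each IP with a space or a newline?

If all these IPs are in a list, you can use ' '.join(list_name) or '\n'.join(list_name) to put spaces or newlines between them, as I did in the example: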

print("\n".join(multiple_ips))
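To tie this back to the question's script, here is a minimal sketch that appends a list of addresses to the question's `text.txt` file, one per line. The list contents are illustrative values taken from the sample.

```python
ips = ["123.123.123.123:123", "125.125.125:125"]

# Space-separated on one line, for display
print(" ".join(ips))

# One address per line in the file; the trailing newline keeps the next
# append from being glued onto the last address
with open("text.txt", "a", encoding="utf-8") as f:
    f.write("\n".join(ips) + "\n")
```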

In the end this didn't solve my problem. I used a website whose HTML was just IPs and ports. For other websites, nothing I tried returns the data in IP:PORT format.
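On pages where the address and port are not printed together, no regex over the raw text will produce `IP:PORT`; the two values have to be paired from the page structure. A sketch assuming a hypothetical layout in which each table row holds the IP in one `<td>` and the port in the next. The markup and the `RowCollector` name are invented for illustration, real sites will differ, and this parser expects explicit `</td>` and `</tr>` end tags.

```python
from html.parser import HTMLParser

class RowCollector(HTMLParser):
    """Collects the stripped text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

page = """
<table>
  <tr><th>IP</th><th>Port</th></tr>
  <tr><td>1.2.3.4</td><td>80</td></tr>
  <tr><td>5.6.7.8</td><td>8080</td></tr>
</table>
"""

parser = RowCollector()
parser.feed(page)
parser.close()
# The header row has no <td> cells, so it is dropped by the length check
pairs = [f"{row[0]}:{row[1]}" for row in parser.rows if len(row) >= 2]
print("\n".join(pairs))
```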