Python 2.7: How do I extract strings in Python and write them to a text file as multiple lines?
Python beginner here. I am trying to find the most active IP address in a log.txt file and print it to another text file. My first step is to collect all the IP addresses. The second is to sort them by how often they occur. But I am stuck on the first step, which is:
with open('./log_input/log.txt', 'r+') as f:
    # loops the lines in the text file
    for line in f:
        # split line at whitespace
        cols = line.split()
        # get last column
        byte_size = cols[-1]
        # get the first column [0]
        ip_addresses = cols[0]
        # remove brackets
        byte_size = byte_size.strip('[]')
        # write the byte size in the resource file
        resource_file = open('./log_output/resources.txt', 'a')
        resource_file.write(byte_size + '\n')
        resource_file.truncate()
        # write the ip addresses in the host file
        host_file = open('./log_output/hosts.txt', 'a')
        host_file.seek(0)
        host_file.write(ip_addresses + '\n')
        host_file.truncate()
    resource_file.close()
    host_file.close()
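The problem is that every run adds the IP addresses to the new hosts.txt file again instead of overwriting what is already there. I also tried: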
resource_file = open('./log_output/resources.txt', 'w')
host_file = open('./log_output/hosts.txt', 'w')
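and 'w+', and so on. But with 'w' or 'w+' the hosts file ends up containing only one IP address.
Can anyone point me in the right direction?
Sample input file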
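collections.Counter is a handy counting tool. Feed it an iterable of text strings and it builds a dict that maps each string to the number of times it appears. Counting the IP addresses then becomes easy: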
>>> import collections
>>> with open('log.txt') as fp:
...     counter = collections.Counter(line.split(' ', 1)[0].lower() for line in fp)
...
>>> counter
Counter({'isdn6-34.dnai.com': 2, 'ix-ftw-tx1-24.ix.netcom.com': 1, 'www-c2.proxy.aol.com': 1})
>>> counter.most_common(1)
[('isdn6-34.dnai.com', 2)]
>>>
>>>
>>> with open('most_common.txt', 'w') as fp:
...     fp.write(counter.most_common(1)[0][0])
...
17
>>> open('most_common.txt').read()
'isdn6-34.dnai.com'
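The interactive session above can also be written as a small script. This is only a sketch: the function names count_hosts and write_ranking and the "host count" output format are assumptions made for illustration, not part of the original answer, and it assumes the host is the first whitespace-separated field of each line.

```python
import collections


def count_hosts(log_path):
    """Count how often each host (first whitespace-separated field) appears."""
    counter = collections.Counter()
    with open(log_path) as fp:
        for line in fp:
            fields = line.split()
            if fields:  # skip blank lines
                counter[fields[0].lower()] += 1
    return counter


def write_ranking(counter, out_path):
    """Write one 'host count' line per host, most frequent first."""
    with open(out_path, 'w') as fp:
        # most_common() with no argument returns every entry, sorted by count
        for host, count in counter.most_common():
            fp.write('%s %d\n' % (host, count))
```

Calling write_ranking(count_hosts('log.txt'), 'ranking.txt') would then produce the full ranking in one pass, with the most active host on the first line.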
Thanks for the help and the suggestions. This solved my problem:
with open('./log_input/log.txt', 'r+') as f:
    new_ip_addresses = ""
    new_byte_sizes = ""
    new_time_stamp = ""
    resource_file = open('./log_output/resources.txt', 'w')
    host_file = open('./log_output/hosts.txt', 'w')
    hours_file = open('./log_output/hours.txt', 'w')
    # loop over the lines in the text file
    for line in f:
        # print re.findall("\[(.*?)\]", line) # ['Hi all', 'this is', 'an example']
        # split line at spaces
        cols = line.split(' ')
        # get the time stamp times
        # print(cols[4])
        # get the byte size from the last column (it keeps the trailing newline)
        byte_size = cols[-1]
        new_byte_sizes += byte_size
        # get ip/host
        ip_addresses = cols[0]
        new_ip_addresses += ip_addresses + '\n'
    # write the byte sizes in the resource file
    print(new_byte_sizes)
    resource_file.write(new_byte_sizes)
    resource_file.close()
    # write the ip addresses in the host file (once; writing again after close() would fail)
    print(new_ip_addresses)
    host_file.write(new_ip_addresses)
    host_file.close()
    # nothing is collected for the hours file yet; close it so the handle is not leaked
    hours_file.close()
Basically, I solved the problem by accumulating the values in variables inside the for loop and appending a newline to each one:
new_ip_addresses += ip_addresses + '\n'
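Building one long string with += works; a common alternative is to collect the values in a list and join them once at the end. The sketch below assumes the value of interest is the first whitespace-separated field, and the helper names first_columns and write_lines are hypothetical.

```python
def first_columns(lines):
    """Return the first whitespace-separated field of every non-blank line."""
    return [line.split()[0] for line in lines if line.strip()]


def write_lines(path, items):
    """Write each item on its own line, replacing any previous file content."""
    with open(path, 'w') as out:
        out.write('\n'.join(items))
        if items:
            out.write('\n')  # end the file with a newline
```

For example, with open('./log_input/log.txt') as f: write_lines('./log_output/hosts.txt', first_columns(f)) would write one host per line.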
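I would first suggest opening the resource file only once: resource_file = open('./log_output/resources.txt', 'a') should be opened before the for loop starts. The same goes for host_file.
Can you post some sample lines from the input file so that we can test?
"It reprints the IP addresses rather than overwriting them"... I don't know what that means. What do you want to write to that file? All addresses with duplicates, or all addresses without duplicates?
One problem is that you write to and truncate the file but never close it. So the next host_file = open('./log_output/hosts.txt', 'a') opens a stale version of the file, and then, when host_file is reassigned, the data from the previous iteration is flushed to that file. Close the file after use, or put it in a with clause.
www-c2.proxy.aol.com - - [01/Jul/1995:00:03:52 -0400] "GET /history/skylab/skylab-1.html HTTP/1.0" 200 1659
isdn6-34.dnai.com - - [01/Jul/1995:00:03:52 -0400] "GET /images/kscmap-tiny.gif HTTP/1.0" 200 2537
isdn6-34.dnai.com - - [01/Jul/1995:00:03 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 200 3635
ix-ftw-tx1-24.ix.netcom.com - - [01/Jul/1995:00:03:52 -0400] "GET /shutter/countdown/count.gif HTTP/1.0" 200 40310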
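The advice in the comments (open each output file once, before the loop, and let a with clause close it) might look like the sketch below. The function name split_log and its parameters are hypothetical, and it assumes the host is the first field and the byte size the last field of each line.

```python
def split_log(log_path, hosts_path, sizes_path):
    """Copy the first and last field of every log line to two output files."""
    # 'w' truncates each output file once, here, when it is opened;
    # every write inside the loop then adds a new line through the same handle.
    with open(log_path) as log, \
            open(hosts_path, 'w') as hosts, \
            open(sizes_path, 'w') as sizes:
        for line in log:
            cols = line.split()
            if not cols:  # skip blank lines
                continue
            hosts.write(cols[0] + '\n')
            sizes.write(cols[-1] + '\n')
    # all three files are flushed and closed when the with block ends
```

Because the files are opened outside the loop, 'w' no longer wipes the output on every iteration, and running the script twice overwrites the previous result instead of appending to it.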