Python: find the label preceding the longest line in a file

I have a file in the following format:

_line 1
this is a string on a line
_line 2
this is another string
_line 3
short line

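I'm trying to write some Python code that gets the _line x label of the string with the longest length beneath it. Could you help me fix my code? This is what I have so far: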

f = open('test.txt', 'r')
print f

read="null"
top_read_line_length="0"
topreadline="null"
for line in f:
    checkifread=line.find('line')
    if checkifread==1:
        print "Read label found"
        #means we are on a read line
        currentread=line
    else:
        #We are on a sequence line for currentread.
        currentlength=len(line)
        print currentlength
    print top_read_line_length

    if int(top_read_line_length) < int(currentlength):
        print topreadline
        topreadline=currentread#now topreadline label is the "_line" string
        topreadlinelength=int(currentlength)
        print topreadline

        #go to next line

print "Done"
print "Longest line is...."
print topreadline

If all you need is the longest line in the file (as the question title suggests), it is a simple one-liner in modern Python:

>>> max(open('test.txt'), key=len)
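
The one-liner above leaves closing the file to the interpreter. A minimal Python 3 sketch of the same idea with an explicit with-block follows; the helper name longest_line and the temporary sample file are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile


def longest_line(path):
    """Return the longest line of the file at `path`, without its newline."""
    # The with-block closes the file as soon as max() has consumed it.
    with open(path) as f:
        return max(f, key=len).rstrip("\n")


# Build a throwaway copy of the sample file from the question.
sample = (
    "_line 1\n"
    "this is a string on a line\n"
    "_line 2\n"
    "this is another string\n"
    "_line 3\n"
    "short line\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

result = longest_line(tmp.name)
os.remove(tmp.name)
print(result)
```

Note that max() raises ValueError on an empty file, so a caller that may see empty input would need to handle that case.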

Here is an awk program:

BEGIN { best=""; best_length=0; current=""; }
/^_/ { current=$0; }
/^[^_]/ { if(length($0) > best_length) { best=current; best_length=length($0); }}
END { print "Longest line: "best" with length: "best_length }

(I prefer it to the python version below, which answers your question more closely...)

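To get the label of the longest line, build a map from label to line length.

In the sample data set, the labels appear to be the lines starting with "_line", and the corresponding line follows immediately after each one: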

label2linelength = {}
for line in open('test.txt'):
    if line.startswith('_line '):
        label = line
    else:
        label2linelength[label] = len(line)
    lastline = line
print max(label2linelength.items(), key=lambda kv: kv[1])

I would do it like this:

label = None
maxlen = 0
maxstr = ''
maxlabel = None
with open('f.txt') as f:
  for line in f:
    line = line.rstrip()
    if line.startswith('_line'):
      label = line
    elif len(line) > maxlen:
      maxlen = len(line)
      maxstr = line
      maxlabel = label
print maxlabel, maxstr

It is more general than the problem statement, because it allows multiple lines of text per label.
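
To illustrate the multi-line case, here is a hedged Python 3 sketch of the same loop wrapped in a function. The name label_of_longest, the prefix parameter and the sample input are assumptions made for this example.

```python
def label_of_longest(lines, prefix="_line"):
    """Return (label, text) for the longest non-label line in `lines`."""
    label = None
    best_label, best_text = None, ""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(prefix):
            label = line  # remember the most recent label
        elif len(line) > len(best_text):
            # A longer data line was found; record it with its label.
            best_label, best_text = label, line
    return best_label, best_text


# Hypothetical input with two data lines under the first label.
sample = [
    "_line 1\n",
    "short\n",
    "a much longer second line under label 1\n",
    "_line 2\n",
    "medium length line\n",
]
found = label_of_longest(sample)
print(found)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly.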

This is easy to do:

data = open('test.txt').readlines()
max_line_pos = data.index(max(data, key=len))
prev_line = data[max_line_pos-1]
print prev_line

This is fairly short, and it works even when there are multiple lines of text after each label:

content = list(open("test.txt"))
longest = content.index(max(content, key=len))
label = [ x for x in content[0:longest] if x.startswith("_line") ][-1]
print label.replace("_line ","")

Here is yet another way:

import re, mmap

with open("test.txt", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, mmap.MAP_PRIVATE, mmap.PROT_READ)
    print max(re.finditer(r'_line (\d+)\n(.*)', mm),
              key=lambda m: len(m.group(2))).group(1)

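I would elaborate on Raymond's answer; if grouper() were available in the standard library, this answer would again be very close to a one-liner; unfortunately, grouper is only defined in

I think you will prefer this version because it is functional. I haven't tested its performance, but at least I don't open and scan the file twice, and I don't hold the whole content in memory.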

from itertools import izip_longest
def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

max( grouper(2, open("test.txt")), key=lambda x:len(x[1]))[0]
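
The code above is Python 2 (izip_longest). A rough Python 3 equivalent, as a sketch: zip_longest replaces izip_longest, and fillvalue="" is an assumption added here so that a file with an odd number of lines does not make len() fail on None. The sample list stands in for the open file.

```python
from itertools import zip_longest


def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    # n references to the same iterator, so each tuple takes n consecutive items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Stand-in for open("test.txt"): the sample file from the question.
lines = [
    "_line 1\n",
    "this is a string on a line\n",
    "_line 2\n",
    "this is another string\n",
    "_line 3\n",
    "short line\n",
]

# Each pair is (label line, data line); compare pairs by data line length.
label, text = max(grouper(2, lines, fillvalue=""), key=lambda pair: len(pair[1]))
print(label.strip())
```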

Here is mine. It works in some cases where other answers may fail, for example with an input file such as

_line 1
abc
_line 2
defg
_line 3
hij

but it does rely on the file having the format you described:

with open('test.txt') as f:
  spam = f.readlines()

labels = spam[0::2]
lines = spam[1::2]

d = dict(zip(labels, lines))

longest_lines_label = max(d, key=lambda x: len(d[x]))

print "Longest line is...."
print longest_lines_label, d[longest_lines_label]
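
To make the failure case concrete, here is a small Python 3 sketch run on that short-line input. It pairs labels and data lines with zip rather than a dict (a deliberate swap, so that repeated labels would not overwrite each other); the variable names are illustrative.

```python
lines = ["_line 1\n", "abc\n", "_line 2\n", "defg\n", "_line 3\n", "hij\n"]

# Naive approach: the longest line overall is a label, not a data line.
naive = max(lines, key=len)

# Slice-based approach: only the data lines (odd positions) are compared.
labels = lines[0::2]
data = lines[1::2]
label, text = max(zip(labels, data), key=lambda pair: len(pair[1]))

print(repr(naive))
print(repr(label), repr(text))
```

Here every label is longer than every data line, so any approach that takes the longest line of the whole file and then looks at its predecessor ends up comparing labels instead of data.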

If you are sure the data is correct and you don't need any error handling, this should do it:

lines = open('test.txt', 'r').readlines()
print max([(len(lines[i+1]), lines[i])
           for i in xrange(0, len(lines), 2)])[1].strip()

Here is your code, fixed:

f = open('test.txt', 'r')
print f

read = None
top_read_line_length = 0
topreadline = None
currentlength = 0
label_line = True
for line in f:  
    if label_line:
        label_line = False
        print "label line", line
        #means we are on a read line
        currentread = line
    else:
        label_line = True
        #We are on a sequence line for currentread.
        currentlength = len(line)
        print 'cl', currentlength
    print top_read_line_length

    if top_read_line_length < currentlength:
        print 'trl', topreadline
        topreadline = currentread #now topreadline label is the "_line" string
        top_read_line_length = currentlength
        print 'trl', topreadline

        #go to next line

print "Done"
print "Longest line is...."
print topreadline

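I added a label_line boolean to toggle back and forth between label lines and data lines, but the important parts are:

put enough information in your print lines to see what is going on; and
be consistent with your variable names.

The problem was in the last if suite: you were checking top_read_line_length, but setting topreadlinelength (no underscores).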

Another concise variant:

from itertools import imap, izip
from operator import itemgetter
with open("a.py") as f:
    res = max(izip(f, imap(len, f)), key=itemgetter(1))[0]

This treats every other line as a label.
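
imap and izip no longer exist in Python 3, where the built-in map and zip are already lazy. A sketch of the same trick under that assumption follows; the function name label_before_longest and the in-memory sample are made up for this example.

```python
from operator import itemgetter


def label_before_longest(lines):
    """Return the label whose following line is the longest."""
    it = iter(lines)
    # zip first pulls a label from `it`; map then pulls the next line from
    # the same iterator and measures it, giving (label, data_length) pairs.
    pairs = zip(it, map(len, it))
    return max(pairs, key=itemgetter(1))[0]


sample = [
    "_line 1\n",
    "this is a string on a line\n",
    "_line 2\n",
    "this is another string\n",
    "_line 3\n",
    "short line\n",
]
winner = label_before_longest(sample)
print(winner.strip())
```

The explicit iter() call matters for lists: both zip and map must share one iterator for the alternation to work, which a file object provides on its own.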

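Is this your actual code? You have some inconsistent variable names, and the line checkifread=line.find('line') will find any line containing the string 'line', including the second line of the sample input.

@brc: he does check that line starts at position 1.

wim's and ignas's solutions fit your needs, while mine fixes the errors in your code. Don't forget to accept one.

This only finds the length of the longest line, not the "label" of the preceding line that the question asks for.

@wim look at what he wants. He wants the line with the longest length.

@wim: Yes, I misspoke (or mistyped, I guess). But the point is that it only gets the longest line, whereas the OP wants the label before the longest line. For that you need something more like max(zip(*[open(filename)]*2), key=lambda x: len(x[1]))[0], although a cute one-liner doesn't really help Brian understand why his code doesn't work.

The answer to the actual question is shown below. This is the answer to the question as stated in the subject title, which is arguably more valuable to readers other than the original poster.

That is correct. CPython uses reference counting and closes the file as soon as max() finishes. In PyPy, the garbage collector does its work at a time of its choosing or at the end of the script. In this case, the script uses only one file and then ends once it has printed its result. All versions of Python close the file before exiting.

The first version answers the question in the subject title (finding the longest line in a file). It is a generic answer with a bit of magic to it. The second answer addresses the more mundane, homework-style problem stated in the body of the request. That one is unlikely to be of general use, hence the two answers.

I've only been on stackoverflow for a few days and am still figuring things out.

Unfortunately, I don't think this answer works for the case wher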
