Python: How do I read specific lines from a text file?


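I have a huge text file (12 GB). The lines are tab-delimited, and the first column contains an ID. I want to do something for each ID. So my plan is to start at the first line and walk through the first column line by line until I reach the next ID.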

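import linecache

b = 2  # 1-based number of the current line (starting value assumed; not shown in the original post)
start_line = b
num_lines = 377763316

while b < num_lines:
  # Previous line, split into its tab-separated columns
  plasmid1 = linecache.getline("Result.txt", b-1)
  plasmid1 = plasmid1.strip("\n")
  plasmid1 = plasmid1.split("\t")

  # Current line, split the same way
  plasmid2 = linecache.getline("Result.txt", b)
  plasmid2 = plasmid2.strip("\n")
  plasmid2 = plasmid2.split("\t")

  # The ID changes between line b-1 and line b
  if not str(plasmid1[0]) == str(plasmid2[0]):
    end_line = b
    #do something

  b += 1  # advance to the next line (assumed; the increment is not shown in the original post)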



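The code works, but the problem is that linecache seems to reload the txt file every time. Unless I improve the performance, the code will run for years.
If you have a good idea for solving this, or know of another approach, I would appreciate your help.
Thanks,
Philipp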
You should open the file only once and then iterate over its lines:
with open('Result.txt', 'r') as f:
    aline = f.next()  # Python 2 only; file objects have no next() method in Python 3
    currentid = aline.split('\t', 1)[0]
    for nextline in f:
        nextid = nextline.split('\t', 1)[0]
        if nextid != currentid:
            #do stuff (runs whenever the ID changes)
            currentid = nextid

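You get the idea: just use plain Python.
Only one line is read per iteration. The extra 1 argument to split makes it split only at the first tab, which improves performance. You are unlikely to get better performance from any specialized library; only a plain C implementation could beat this approach.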
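To make the effect of the extra 1 argument concrete, here is a small sketch. The sample line and its column values are made up for illustration; only the tab-separated layout with the ID in the first column comes from the question.

```python
# Hypothetical sample line: ID first, then further tab-separated columns.
line = "plasmid_42\t1053\t0.97\tsome long annotation\n"

# Full split: every tab is processed and a list of all columns is built.
all_columns = line.split("\t")

# maxsplit=1: stop after the first tab; the rest of the line stays one string.
first_two = line.split("\t", 1)

record_id = first_two[0]
print(record_id)
print(len(all_columns), len(first_two))
```

Both calls give the same first element, but the second one never has to scan or copy the remaining columns, which adds up over hundreds of millions of lines.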
If you get AttributeError: '_io.TextIOWrapper' object has no attribute 'next', it is probably because you are using Python 3.X (see question). Try this version instead:
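with open('Result.txt', 'r') as f:
    aline = f.readline()
    currentid = aline.split('\t', 1)[0]
    # readline() returns '' at end of file, which ends the loop
    while aline != '':
        aline = f.readline()
        nextid = aline.split('\t', 1)[0]
        if nextid != currentid:
            #do stuff (also runs once at end of file, for the last ID)
            currentid = nextid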
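Since the question tracks a start_line and an end_line for each ID, the same single-pass idea can also report line ranges. The following is only a sketch under that assumption: the helper name id_ranges and the in-memory sample data are made up for illustration, and for the real file you would pass the open file object instead of the StringIO.

```python
import io


def id_ranges(lines):
    """Yield (record_id, start_line, end_line) for each run of equal IDs.

    Line numbers are 1-based and end_line is inclusive.
    """
    current_id = None
    start = None
    number = 0
    for number, line in enumerate(lines, start=1):
        # Only the first column is needed, so split at the first tab only.
        record_id = line.split("\t", 1)[0].rstrip("\n")
        if record_id != current_id:
            if current_id is not None:
                yield current_id, start, number - 1
            current_id, start = record_id, number
    if current_id is not None:
        # The last run has no following ID change, so emit it at end of input.
        yield current_id, start, number


# Stand-in for the real file; with a file use: with open("Result.txt") as f: ...
sample = io.StringIO("a\t1\na\t2\nb\t3\nc\t4\nc\t5\n")
ranges = list(id_ranges(sample))
print(ranges)
```

Because it only iterates over the file object, this works unchanged on Python 2 and Python 3 and reads each line exactly once.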

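I think numpy.loadtxt is the way to go. It is also a good idea to pass the usecols argument to specify which columns you actually need from the file. NumPy is a solid library written with high performance in mind.
After calling loadtxt(), you get a NumPy array back.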
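A minimal sketch of that suggestion, assuming NumPy is installed. The sample data is made up, and reading the ID column as strings (dtype=str) is an assumption, since the question does not say what the IDs look like.

```python
import io

import numpy as np

# Stand-in for the real file; loadtxt also accepts a filename such as "Result.txt".
sample = io.StringIO("a\t1\t0.5\na\t2\t0.7\nb\t3\t0.1\n")

# usecols=(0,) keeps only the first column; dtype=str reads the IDs as text.
ids = np.loadtxt(sample, dtype=str, delimiter="\t", usecols=(0,))
print(ids)
```

Note that loadtxt builds the whole result in memory, so for a 12 GB file it is worth checking that even the selected columns fit before relying on this.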
You can use itertools:
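from itertools import takewhile

class EqualityChecker(object):
   """Predicate that is true while a line's first column equals the given id."""
   def __init__(self, id):
       self.id = id

   def __call__(self, current_line):
       current_id = current_line.split('\t', 1)[0]
       return self.id == current_id


# `ids` (the expected IDs, in file order) and `do_stuff` are placeholders
# that you have to supply yourself.
with open('hugefile.txt', 'r') as f:
   for id in ids:
       checker = EqualityChecker(id)
       # Iterating over f directly works in Python 2 and 3 (xreadlines() is
       # Python 2 only). Note that takewhile() also consumes the first line
       # that fails the check, so that line is not passed to do_stuff().
       for line in takewhile(checker, f):
           do_stuff(line)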

In the outer loop, id can actually be obtained from the first line whose ID does not match the previous value.
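One caveat with takewhile is that it consumes the first line that fails the check, so the first line of each new ID would be lost unless it is handled separately. As an alternative (a swapped-in technique, not the approach of the answer above), itertools.groupby splits consecutive lines by ID without needing the IDs in advance. The names first_column, process_groups, and handle_group, as well as the sample data, are made up for this sketch.

```python
import io
from itertools import groupby


def first_column(line):
    """Return the ID, i.e. the text before the first tab."""
    return line.split("\t", 1)[0]


def process_groups(lines, handle_group):
    """Call handle_group(record_id, rows) once per run of consecutive equal IDs."""
    for record_id, group in groupby(lines, key=first_column):
        # list(group) holds one ID's lines in memory; iterate over `group`
        # directly instead if a single ID can span very many lines.
        handle_group(record_id, list(group))


collected = []
# Stand-in for the real file; pass the open file object in practice.
sample = io.StringIO("a\t1\na\t2\nb\t3\n")
process_groups(sample, lambda rid, rows: collected.append((rid, len(rows))))
print(collected)
```

This relies on equal IDs being adjacent in the file, which is what the question describes.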
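Are the lines tab-delimited? That sounds like columns. Please show all of your code. What is linecache?
@eguaio: linecache is not designed for this. From its source: "Cache lines from Python source files". And yes, judging by the source, linecache re-opens the file every time.
Thanks for your comment! I get the following error: AttributeError: '_io.TextIOWrapper' object has no attribute 'next'. Any ideas?
That is a Python 2 vs. 3 incompatibility.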