Iterating over the words of a file in Python


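I need to iterate over the words of a large file that consists of a single, very long line. I know of ways to iterate over a file line by line, but they do not apply in my case because of its single-line structure.

Are there any alternatives?

Read the line in as usual, then split it into words on whitespace.

For example: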

word_list = loaded_string.split()

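There are more efficient ways of doing this, but syntactically this is probably the shortest:

words = open('myfile').read().split()

If memory is a concern, you will not want to do this, because it loads the entire contents into memory instead of iterating over them.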
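If memory is the concern, one possible alternative is to memory-map the file and let a regular expression walk over it lazily. This is only a sketch: the helper name `iter_words_mmap` is made up here, and it assumes the file is non-empty and that words are runs of non-whitespace bytes in an encoding such as UTF-8.

```python
import mmap
import re


def iter_words_mmap(path, encoding="utf-8"):
    """Yield whitespace-separated words from the file at `path` lazily.

    Hypothetical helper: the OS pages the file in on demand, so the whole
    content is never copied into one Python string.
    """
    with open(path, "rb") as f:
        # mmap raises ValueError on an empty file, so this assumes size > 0.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # rb"\S+" matches each run of non-whitespace bytes.
            for match in re.finditer(rb"\S+", mm):
                yield match.group().decode(encoding)
```

Because it is a generator, the words can be consumed one at a time in a `for` loop without building a list first.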

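It really depends on your definition of a word. But try this:

with open("your-filename-here") as f:
    text = f.read()
for word in text.split():
    # do something with word
    print(word)

This uses whitespace characters as word boundaries.

Of course, remember to open and close the file properly; this is just a quick example.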
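To make the point about the definition of a word concrete, here is a small comparison of two plausible definitions. The sample string and both helper names are invented for illustration.

```python
import re


def words_by_whitespace(text):
    # A "word" is any run of non-whitespace characters; punctuation stays attached.
    return text.split()


def words_by_word_chars(text):
    # A "word" is a run of letters, digits, or underscores; punctuation is dropped.
    return re.findall(r"\w+", text)


sample = "Hello, world! It's 2 words?"
print(words_by_whitespace(sample))  # ['Hello,', 'world!', "It's", '2', 'words?']
print(words_by_word_chars(sample))  # ['Hello', 'world', 'It', 's', '2', 'words']
```

Which of the two is right depends entirely on what the words are used for afterwards.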

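After reading the line, you can do the following:

pattern_len = len(pattern)
i = 0
while True:
    # text holds the line that was read; pattern is the substring to look for
    i = text.find(pattern, i)
    if i == -1:
        break
    print(text[i:i + pattern_len])  # or do whatever
    i += pattern_len

Alex.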

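A long line? I assume the line is too big to reasonably fit in memory, so you need some kind of buffering.

First of all, this is a bad format; if you have any kind of control over the file, make it one word per line.

If not, use something like this: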

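# Wrapped in a generator function so that yield is valid;
# the name read_words is illustrative.
def read_words(input_file):
    line = ''
    while True:
        word, space, line = line.partition(' ')
        if space:
            # A word was found (empty strings from repeated spaces are skipped)
            if word:
                yield word
        else:
            # A word was not found; read a chunk of data from file
            next_chunk = input_file.read(1000)
            if next_chunk:
                # Add the chunk to our line
                line = word + next_chunk
            else:
                # No more data; yield the last word (if any) and return
                last_word = word.rstrip('\n')
                if last_word:
                    yield last_word
                return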

You really should consider using ...

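Donald Miner's suggestion looks good. Simple and short. I used the following in some code I wrote a while ago:

words = []
# Universal newlines are the default in Python 3, so the old "rU" mode is not needed.
with open("filename.txt") as f:
    for line in f:
        for word in line.split():
            words.append(word)

A longer version of Donald Miner's suggestion.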
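The same nested loop can also be written as a generator, so the words are produced one at a time instead of being collected in a list. This is a sketch only; `iter_words_by_line` is a made-up name, and note that it still reads one whole line at a time, so it does not reduce memory use for a file that is a single huge line.

```python
def iter_words_by_line(path):
    """Yield words one at a time instead of collecting them in a list."""
    with open(path) as f:
        for line in f:  # each iteration still loads one complete line
            yield from line.split()
```

It can be used as `for word in iter_words_by_line("filename.txt"): ...`.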

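I already answered a similar question, but I have refined the method used in that answer. Here is the updated version (copied from a more recent answer):

Here is my totally functional approach, which avoids having to read and split lines. It makes use of the itertools module:

Note: for Python 3, replace itertools.imap with map.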
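A function of the kind described here could look roughly like the following in Python 3. This is a sketch rather than a verbatim copy: the body is an assumption based on the description (an itertools pipeline, with `map` standing in for `itertools.imap`), and only the name `readwords` comes from the usage shown in this answer.

```python
import itertools


def readwords(mfile):
    """Return a generator of whitespace-separated words from a text-mode file object."""
    # Call mfile.read(1) over and over: one character per call.
    chars = map(mfile.read, itertools.repeat(1))
    # read(1) returns '' at end of file, which is falsy, so takewhile stops there.
    chars = itertools.takewhile(bool, chars)
    # Group consecutive characters by whether they are whitespace.
    grouped = itertools.groupby(chars, str.isspace)
    # Keep only the non-whitespace groups and join each into a word.
    return ("".join(group) for is_space, group in grouped if not is_space)
```

Reading one character per call keeps memory use minimal, at the cost of many small `read` calls; the file object's own buffering absorbs most of that cost.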

Example usage:

>>> import sys
>>> for w in readwords(sys.stdin):
...     print (w)
... 
I really love this new method of reading words in python
I
really
love
this
new
method
of
reading
words
in
python
           
It's soo very Functional!
It's
soo
very
Functional!
>>>

I guess in your case, this would be the way to use the function:

with open('words.txt', 'r') as f:
    for word in readwords(f):
        print(word)

Read the file in small pieces using a buffer:

my_file.read(200)

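Keep in mind that this option works fine when you want to write one word per line to a file, but it does not work if you need every yielded item to be exactly one word. It fails when a chunk contains dog\ncat: it yields dog\ncat rather than dog followed by cat. When dog\ncat is printed it looks fine, but that is an illusion.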
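The newline problem goes away if each chunk is split on any whitespace rather than on the space character alone. A minimal sketch, assuming a text-mode file object; the name `iter_words_chunked` and the default chunk size are illustrative choices.

```python
def iter_words_chunked(input_file, chunk_size=4096):
    """Yield words split on any whitespace, reading chunk_size characters at a time."""
    leftover = ""  # tail of the previous chunk that may be an unfinished word
    while True:
        chunk = input_file.read(chunk_size)
        if not chunk:
            break
        buffer = leftover + chunk
        words = buffer.split()
        if words and not buffer[-1].isspace():
            # The chunk ended in the middle of a word: hold the last piece back.
            leftover = words.pop()
        else:
            leftover = ""
        yield from words
    if leftover:
        # The file ended without trailing whitespace; emit the final word.
        yield leftover
```

With this version a chunk containing dog\ncat produces dog and then cat as two separate items, even when the chunk boundary falls inside one of the words.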