Memory error in Python 3.4.1

python python-3.x


Here is my code, which gives me a memory error:

with open('E:\\Book\\1900.txt', 'r', encoding='utf-8') as readFile:
    for line in readFile:
        sepFile = readFile.read().lower()
        words_1900 = re.findall('\w+', sepFile)

Output:

Traceback (most recent call last):
  File "C:\Python34\50CommonWords.py", line 13, in <module>
    sepFile = readFile.read().lower()
MemoryError

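I would say that, instead of reading the whole file into memory, you should read it line by line and use collections.Counter to incrementally keep track of the words and their counts across the whole file. Then, at the end, use Counter.most_common() to get the 50 most common elements. Example: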

import re
from collections import Counter

cnt = Counter()
with open('E:\\Book\\1900.txt', 'r', encoding='utf-8') as readFile:
    for line in readFile:
        # Count the words of one line at a time, so only that line is held in memory.
        cnt.update(re.findall(r'\w+', line.lower()))
print("50 most common are")
# The list comprehension keeps only the words and drops the counts.
print([x for x, countx in cnt.most_common(50)])

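If the file contains many distinct words, this approach can also end in a MemoryError, because the Counter keeps one entry per distinct word.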

Also, Counter.most_common returns a list of tuples; in each tuple, the first element is the actual word and the second element is that word's count.
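To make the shape of that return value concrete, here is a minimal sketch on a tiny in-memory word list. The sample sentence and the names `sample_words`, `top_two` and `words_only` are made up for illustration and have nothing to do with the 1900.txt file from the question.

```python
from collections import Counter

# Hypothetical sample text, not taken from the file in the question.
sample_words = "the cat and the dog and the bird".split()
cnt = Counter(sample_words)

# most_common(n) returns a list of (word, count) tuples, highest count first.
top_two = cnt.most_common(2)

# Unpack each tuple and keep only the word, dropping the count.
words_only = [word for word, _ in top_two]

print(top_two)      # [('the', 3), ('and', 2)]
print(words_only)   # ['the', 'and']
```

If you need the counts as well, you can iterate over the tuples directly instead of discarding the second element.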

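Clearly E:\\Book\\1900.txt is a very large file. What are you trying to do?

Why are you iterating over its lines and then calling .read? You are reading the whole file into memory for every line. Why not use the line you already have?

You can. Read outside the loop.

Yes, it is a big file, 362MB. Can you edit my code?

Edit your code to do what? You still haven't told us what you actually want to do with that big file.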