Memory error when running a Python script on a 4GB file

Tags: python, python-2.7

I am trying to count the number of words whose length is between 1 and 5. The file is about 4GB in size, and I get a memory error.

import os

files = os.listdir('C:/Users/rram/Desktop/')
for file_name in files:
    file_path = "C:/Users/rram/Desktop/" + file_name
    f = open(file_path, 'r')
    text = f.readlines()
    update_text = ''
    wordcount = {}
    for line in text:
        arr = line.split("|")
        word = arr[13]
        if 1 <= len(word) < 6:
            if word not in wordcount:
                wordcount[word] = 1
            else:
                wordcount[word] += 1
            update_text += '|'.join(arr)
    print (wordcount)     #print update_text
    print 'closing', file_path, '\t', 'total files' , '\n\n'
    f.close()

As suggested in the comments, you should read the file line by line instead of loading the whole file at once.

For example:

count = 0
with open('words.txt', 'r') as f:
    # Iterating over the file object yields one line at a time
    for line in f:
        for word in line.split():
            if 1 <= len(word) <= 5:
                count += 1
print(count)

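Also note that word = arr[13] assumes every line has at least 14 fields. If a line has fewer, Python raises an IndexError (not a segmentation fault).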
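
Putting both points together, here is a minimal sketch (not the original poster's code) that streams the input line by line and skips rows that are too short instead of failing on them. The function name count_short_words, its parameters, and the use of collections.Counter are assumptions added for illustration.

```python
from collections import Counter


def count_short_words(lines, column=13, min_len=1, max_len=5, sep="|"):
    """Count values in one column whose length is within [min_len, max_len].

    `lines` can be any iterable of strings, such as an open file object,
    so only one line needs to be held in memory at a time.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\r\n").split(sep)
        if len(fields) <= column:
            continue  # row too short: skip it instead of raising IndexError
        word = fields[column]
        if min_len <= len(word) <= max_len:
            counts[word] += 1
    return counts
```

Inside a with open(file_path, 'r') as f: block it could be called as count_short_words(f). Memory use then grows with the number of distinct short words rather than with the file size.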

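Remove the line text = f.readlines(); you can iterate over the file handle directly.
Can you please correct the indentation?
You should iterate over the lines like this: for line in f: ... Don't overload memory by reading the whole file at once.
Sorry, the indentation got shifted when copy-pasting.
@MohamedALANI Can I use f.readline()? Since it loads into memory and then performs the operation, the output could be faster. My file has many records, each with fields separated by |, and I am specifically interested in column number 14.
@FlorentJousse: if you are worried that arr does not have enough elements, use count += len(arr) >= 13 and 1...
Dear @FlorentJousse, I am sure my records have more than 14 fields: there are about 70 columns, and when there is no data only the pipes are present. Thank you very much. Is there any other advantage to the iterator int?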
count = 0
with open('words.txt', 'r') as f:
    for line in f:
        iterator = 0  # index of the current field within the line
        for word in line.split("|"):
            # Only the field at index 13 (the 14th column) is counted
            if 1 <= len(word) <= 5 and iterator == 13:
                count += 1
            iterator += 1
print(count)
arr = line.split("|")
word = arr[13]
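
On the question about the iterator counter: the loop with the counter and the direct index arr[13] select the same field. The following is a hedged sketch of the direct-indexing variant. Passing a maxsplit of 14 to str.split is an assumption added here so that the tail of a roughly 70-column record is not split, and the function names field_14 and count_lengths_1_to_5 are hypothetical.

```python
def field_14(line, sep="|"):
    """Return the 14th field (index 13) of a separated line, or None if absent."""
    # maxsplit=14 stops after the first 14 separators, so everything after
    # the 14th field stays unsplit in the last list element.
    parts = line.rstrip("\r\n").split(sep, 14)
    return parts[13] if len(parts) > 13 else None


def count_lengths_1_to_5(lines):
    """Count lines whose 14th field has between 1 and 5 characters."""
    count = 0
    for line in lines:
        word = field_14(line)
        if word is not None and 1 <= len(word) <= 5:
            count += 1
    return count
```

As with the examples above, count_lengths_1_to_5 accepts an open file object, so the file is still read one line at a time.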