Python load() error: incorrect magic string

python numpy

I have two files. The first creates a numpy array in compressed sparse row format:

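import csv
import json
import string

import numpy as np
# Assumption: the original omits this import; NLTK's PorterStemmer is assumed.
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
import pdb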

def stem_document(document):
    translatedict = ""
    stemmer = PorterStemmer()
    for word in string.punctuation:
        translatedict = translatedict + word
    doc_stemmed = []
    for word in document.split():
        lowerstrippedword = ''.join(c for c in word.lower() if c not in translatedict)
        try: 
            stemmed_word = stemmer.stem(lowerstrippedword)
            doc_stemmed.append(stemmed_word)
        except:
            print lowerstrippedword + " could not be stemmed."
    return ' '.join(doc_stemmed)

def readFileandStem(filestring):
    with open(filestring, 'r') as file:
        reader = csv.reader(file)
        file_extras = []
        vector_data = []        
        error = False
        while (error == False):
            try:
                next = reader.next()
                if len(next) == 3 and next[2] != "":
                    document = next[2]
                    stemmed_document = stem_document(document)
                    vector_data.append(stemmed_document)
                    file_extra = []
                    file_extra.append(next[0])
                    file_extra.append(next[1])
                    file_extras.append(file_extra)
            except:
                error = True
    return [vector_data, file_extras]

filestring = 'Data.csv'
print "Reading File"
data = readFileandStem(filestring)
documents = data[0]
file_extras = data[1]
print "Vectorizing Data"
vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(documents)
tf_idf_transform = TfidfTransformer(use_idf=False).fit(matrix)
tf_idf_matrix = tf_idf_transform.transform(matrix)
with open('matrix/matrix.npy', 'w') as matrix_file:
    np.save(matrix_file, tf_idf_matrix)
file_json_map = {}
file_json_map['extras'] = file_extras
with open('matrix/extras.json', 'w') as extras_file:
    extras_file.write(json.dumps(file_json_map))
print "finished"

The second file should load that same file:

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import json
import pdb

with open('matrix/matrix.npy', 'r') as matrix_file:
    matrix = np.load(matrix_file)

hcluster = linkage(matrix, "complete")

However, I get the following error:

File "Cluster.py", line 7, in <module>
    matrix = np.load(matrix_file)
  File "C:\Users\jarek\Anaconda2\lib\site-packages\numpy\lib\npyio.py", line 406, in load
    pickle_kwargs=pickle_kwargs)
  File "C:\Users\jarek\Anaconda2\lib\site-packages\numpy\lib\format.py", line 620, in read_array
    version = read_magic(fp)
  File "C:\Users\jarek\Anaconda2\lib\site-packages\numpy\lib\format.py", line 216, in read_magic
    raise ValueError(msg % (MAGIC_PREFIX, magic_str[:-2]))
ValueError: the magic string is not correct; expected '\x93NUMPY', got '\x00\x00I\x1c\x00\x00'

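I don't know why the magic string would be incorrect, because from everything I've looked into, all .npy files should start with the same magic string, "\x93NUMPY".

Any ideas?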
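One way to see what is actually on disk is to read the first six bytes of the file in binary mode and compare them with the expected prefix. The following is only a sketch: the helper name `starts_with_npy_magic` and the temporary file names are hypothetical, and it is written for Python 3 rather than the Python 2 setup shown in the traceback.

```python
import os
import tempfile

import numpy as np

# Prefix that every .npy file is expected to start with.
NPY_MAGIC = b"\x93NUMPY"


def starts_with_npy_magic(path):
    """Return True if the file at `path` begins with the .npy magic prefix."""
    # Binary mode, so the bytes are read exactly as stored on disk.
    with open(path, "rb") as f:
        return f.read(len(NPY_MAGIC)) == NPY_MAGIC


tmp_dir = tempfile.mkdtemp()

# A file written by np.save via its path (hypothetical file name).
good_path = os.path.join(tmp_dir, "good.npy")
np.save(good_path, np.zeros(3))

# A file that starts with the bytes reported in the error message.
bad_path = os.path.join(tmp_dir, "bad.npy")
with open(bad_path, "wb") as f:
    f.write(b"\x00\x00I\x1c\x00\x00")

print(starts_with_npy_magic(good_path))
print(starts_with_npy_magic(bad_path))
```

If the check fails on a file that `np.save` just wrote, the way the file was opened for writing is a likely suspect.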

I ran into a similar problem before.

Changing

open('matrix/matrix.npy', 'w')
...
open('matrix/matrix.npy', 'r')

to

open('matrix/matrix.npy', 'wb')
...
open('matrix/matrix.npy', 'rb')

solved my problem.

Don't use

with open(blahblah) as matrix_file

Just try

np.load(blahblah)

No luck with this solution. I tried: "matrix = np.load('matrix/matrix.npy')"
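The binary-mode fix above can be sketched as a full round trip. The .npy format is binary, and on Windows a file opened in text mode has its line-ending bytes translated, which can corrupt the contents. That may also be why loading by path alone did not help: a file already written in text mode would need to be saved again. This sketch makes some assumptions: a small dense array stands in for the TF-IDF matrix, the file goes to a temporary directory, and the code targets Python 3.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the TF-IDF matrix: a small dense array.
matrix = np.arange(12, dtype=np.float64).reshape(3, 4)

# Temporary location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "matrix.npy")

# Write in binary mode ('wb') so the bytes reach the disk unchanged.
with open(path, "wb") as matrix_file:
    np.save(matrix_file, matrix)

# Read back in binary mode ('rb') as well.
with open(path, "rb") as matrix_file:
    loaded = np.load(matrix_file)

# Passing the path instead lets NumPy open the file in binary mode itself.
loaded_from_path = np.load(path)

print(np.array_equal(loaded, matrix))
print(np.array_equal(loaded_from_path, matrix))
```

Both loads should return the same array, provided the file was also written in binary mode.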