Python: KeyError seems to be random without changing my code. Possible memory error?

I am working through a sklearn tutorial that contains the following section:

Next, we create a TfidfVectorizer. Recall from Chapter 3, Feature Extraction and Preprocessing, that TfidfVectorizer combines CountVectorizer and TfidfTransformer. We fit it with the training messages, and transform both the training and test messages:

>>> vectorizer = TfidfVectorizer()
>>> X_train = vectorizer.fit_transform(X_train_raw)
>>> X_test = vectorizer.transform(X_test_raw)
Finally, we create an instance of LogisticRegression and train our model. Like LinearRegression, LogisticRegression implements the fit() and predict() methods. As a sanity check, we printed a few predictions for manual inspection:

>>> classifier = LogisticRegression()
>>> classifier.fit(X_train, y_train)
>>> predictions = classifier.predict(X_test)
>>> for i, prediction in enumerate(predictions[:5]):
>>>     print 'Prediction: %s. Message: %s' % (prediction, X_test_raw[i])
The following is the output of the script:

Prediction: ham. Message: If you don't respond imma assume you're still asleep and imma start calling n shit
Prediction: spam. Message: HOT LIVE FANTASIES call now 08707500020 Just 20p per min NTT Ltd, PO Box 1327 Croydon CR9 5WB 0870 is a national rate call
Prediction: ham. Message: Yup... I havent been there before... You want to go for the yoga? I can call up to book 
Prediction: ham. Message: Hi, can i please get a  <#>  dollar loan from you. I.ll pay you back by mid february. Pls.
Prediction: ham. Message: Where do you need to go to get it?
I followed along with this:

ddir = (sys.argv[1])


df = pd.read_csv(ddir + '/SMSSpamCollection', delimiter='\t', header=None)

#print df.head
#print 'Number of spam messages: ', df[df[0] == 'spam'][0].count()
#print 'Number of ham messages: ', df[df[0] == 'ham'][0].count()

X_train_raw, X_test_raw, y_train, y_test = train_test_split(df[1], df[0])

vectorizer = TfidfVectorizer()

X_train = vectorizer.fit_transform(X_train_raw)
X_test = vectorizer.transform(X_test_raw)


classifier = LogisticRegression()
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

for i, pdn in enumerate(predictions):
    print 'Prediction: %s. Message: %s' % (pdn, X_test_raw[i])
However, for some reason this gave me an error. Thinking my modification was to blame, I rewrote my code to follow the book line by line:

for i, prediction in enumerate(predictions[:5]):
    print 'Prediction: %s. Message: %s' % (prediction, X_test_raw[i])
However, this printed only two predictions before crashing:

Number of spam messages: 747
Number of ham messages: 4825
['ham' 'ham' 'ham' ..., 'ham' 'ham' 'ham']
Prediction: ham. Message: Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
Prediction: ham. Message: Ok lar... Joking wif u oni...
Traceback (most recent call last):
  File "Chapter4[B-FLGTLG][Y-SF]--[DC].py", line 38, in <module>
    print 'Prediction: %s. Message: %s' % (prediction, X_test_raw[i])
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 583, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 1980, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3332)
  File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:3035)
  File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
  File "pandas/hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6610)
  File "pandas/hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6554)
KeyError: 2
Now here is the problem: I ran the exact same script a second time, without changing a single line, and it gave me a different error:

Number of spam messages: 747
Number of ham messages: 4825
['ham' 'ham' 'ham' ..., 'ham' 'ham' 'ham']
Traceback (most recent call last):
  File "Chapter4[B-FLGTLG][Y-SF]--[DC].py", line 38, in <module>
    print 'Prediction: %s. Message: %s' % (prediction, X_test_raw[i])
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 583, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 1980, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3332)
  File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:3035)
  File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
  File "pandas/hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6610)
  File "pandas/hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6554)
KeyError: 0
How is this possible? Do I have some low-level RAM bug that is corrupting Python itself? The data is here: in case anyone wants to follow along.
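A likely explanation that does not involve faulty memory: train_test_split shuffles the rows before splitting, and the resulting X_test_raw is a pandas Series that keeps the original row labels. With an integer index, X_test_raw[i] looks up the label i, not the i-th position, so it raises KeyError whenever row i landed in the training set. Because the shuffle differs on every run, so does the first missing label. The sketch below reproduces this with a small hand-made Series. It is written for Python 3, and the names messages, test_raw and lookup_by_label are hypothetical stand-ins, not part of the original script.

```python
import pandas as pd

# Hypothetical stand-in for the message column (df[1] in the question).
messages = pd.Series(["msg a", "msg b", "msg c", "msg d", "msg e", "msg f"])

# Simulate one shuffled split: the test set receives the rows whose
# original labels are 4, 1 and 5, and it keeps those labels as its index.
test_raw = messages.loc[[4, 1, 5]]


def lookup_by_label(series, key):
    """Mimic X_test_raw[i]: on an integer index, an integer key is a label."""
    try:
        return series[key]
    except KeyError:
        return None


print(list(test_raw.index))          # labels kept from the original frame
print(lookup_by_label(test_raw, 0))  # label 0 is not in the test set
print(lookup_by_label(test_raw, 1))  # label 1 happens to be in the test set
print(test_raw.iloc[0])              # position 0, independent of the labels
```

With a different shuffle, a different set of labels ends up in the test set, which would account for KeyError: 2 on one run and KeyError: 0 on the next.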

Update

I found a solution here:

I changed:

print 'Prediction: %s. Message: %s' % (pdn, X_test_raw[i])

Now it runs fine:

Number of spam messages: 747
Number of ham messages: 4825
['ham' 'spam' 'ham' ..., 'spam' 'ham' 'ham']
Prediction: ham. Message: Well done, blimey, exercise, yeah, i kinda remember wot that is, hmm. 
Prediction: spam. Message: U have won a nokia 6230 plus a free digital camera. This is what u get when u win our FREE auction. To take part send NOKIA to 83383 now. POBOX114/14TCR/W1 16
Prediction: ham. Message: I doubt you could handle 5 times per night in any case...
Prediction: ham. Message: I've told you everything will stop. Just dont let her get dehydrated.
Prediction: ham. Message: AH POOR BABY!HOPE URFEELING BETTERSN LUV! PROBTHAT OVERDOSE OF WORK HEY GO CAREFUL SPK 2 U SN LOTS OF LOVEJEN XXX.
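For readers who hit the same KeyError, here is a minimal sketch of one way to pair each prediction with its message by position rather than by index label. It is not necessarily the change made above. It assumes the test messages are a pandas Series whose index still carries the shuffled original row labels. The sample data and the helper pair_with_messages are hypothetical, and the code targets Python 3.

```python
from itertools import islice

import pandas as pd

# Hypothetical test split: three messages that kept their original labels.
test_raw = pd.Series(["msg e", "msg b", "msg f"], index=[4, 1, 5])
# Hypothetical classifier output, in the same order as test_raw.
predictions = ["ham", "spam", "ham"]


def pair_with_messages(predictions, raw_messages, limit=5):
    """Pair predictions with messages by position, ignoring index labels."""
    # Iterating over a Series yields its values in order, so zip lines the
    # two sequences up positionally without any label lookup.
    return list(islice(zip(predictions, raw_messages), limit))


pairs = pair_with_messages(predictions, test_raw)
for prediction, message in pairs:
    print('Prediction: %s. Message: %s' % (prediction, message))

# Equivalent positional access with .iloc, closer to the original loop.
by_position = [test_raw.iloc[i] for i in range(len(predictions))]
```

Passing a fixed random_state to train_test_split should also make the split identical on every run, which helps when trying to reproduce an error like this one.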