Python + Pandas: Populating a new DataFrame with a list
I have created a list named category_predicted using the following method:
from sklearn.metrics import log_loss
from sklearn.naive_bayes import MultinomialNB

def prediction(optimal_alpha, metric):
    category_predicted = []
    multinomial_naive_bayes_optimal = MultinomialNB(alpha=optimal_alpha)
    # fitting the model
    multinomial_naive_bayes_optimal.fit(x_train_counts, y_train)
    # predict the response
    pred_cat = multinomial_naive_bayes_optimal.predict(x_test_counts)
    category_predicted.append(pred_cat)
    pred = multinomial_naive_bayes_optimal.predict_proba(x_test_counts)
    log_loss_acc = log_loss(y_test, pred)
    print('\nThe accuracy of the Multinomial Naive Bayes classifier for alpha = %f and metric = %s is %f' % (optimal_alpha, metric, log_loss_acc))
    return category_predicted

category_predicted = prediction(optimal_alpha, 'neg_log_loss')
Next, I tried to create a DataFrame with two columns, Y_test and category_predicted, and to fill it with the values of y_test and category_predicted:
df = pd.DataFrame()
df['Y_test'] = y_test
df['category_predicted'] = category_predicted
print(df)
It gives the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-88-a34fff6f93f7> in <module>()
3 #category_predicted = np.array(category_predicted)
4 #categor_trans = category_predicted.transpose()
----> 5 df['category_predicted'] = category_predicted
6 print(df)
7 print(len(category_predicted))
~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
2517 else:
2518 # set column
-> 2519 self._set_item(key, value)
2520
2521 def _setitem_slice(self, key, value):
~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/pandas/core/frame.py in _set_item(self, key, value)
2583
2584 self._ensure_valid_index(value)
-> 2585 value = self._sanitize_column(key, value)
2586 NDFrame._set_item(self, key, value)
2587
~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
2758
2759 # turn me into an ndarray
-> 2760 value = _sanitize_index(value, self.index, copy=False)
2761 if not isinstance(value, (np.ndarray, Index)):
2762 if isinstance(value, list) and len(value) > 0:
~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/pandas/core/series.py in _sanitize_index(data, index, copy)
3119
3120 if len(data) != len(index):
-> 3121 raise ValueError('Length of values does not match length of ' 'index')
3122
3123 if isinstance(data, PeriodIndex):
ValueError: Length of values does not match length of index
Output:
1
289
category_predicted Y_test
0 [11, 19, 19, 33, 12, 1, 22, 30, 11, 19, 22, 11... 31
Y_test
1208 16
1013 19
1016 19
1153 5
1434 12
65 1
943 17
425 23
1104 4
1052 19
342 22
523 11
487 11
458 11
1243 10
771 6
1355 7
692 9
981 32
1159 5
924 17
880 33
273 22
360 23
295 22
1101 4
391 23
1025 19
1047 19
1238 10
... ...
1240 10
168 2
174 2
484 11
194 30
1184 5
967 32
1250 10
185 2
772 6
750 6
633 29
230 30
1309 8
279 22
542 35
119 2
439 23
392 23
1152 5
769 6
1129 21
858 33
615 29
661 9
244 30
1295 27
1100 4
345 22
960 32
[289 rows x 1 columns]
Edit:
df['category_predicted'] = category_predicted
df['Y_test'] = y_test
print(df)
Output: identical to the output shown above.
I want both y_test and category_predicted printed together as two columns.
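So why is the length of category_predicted 1? It should be the number of elements in the list.
I think your problem lies in the following lines: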
pred_cat = multinomial_naive_bayes_optimal.predict(x_test_counts)
category_predicted.append(pred_cat)
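.predict() already returns an array of predictions, one per test sample. You are then appending that whole array to a list as a single element, so category_predicted now looks like this: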
category_predicted = [[1,2,3,3,4,3,4,54]]
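A small sketch of why the length comes out as 1. The array below is made-up stand-in data for what .predict() would return, and only NumPy is assumed:

```python
import numpy as np

# Stand-in for the array returned by .predict(); the values are made up.
pred_cat = np.array([11, 19, 19, 33, 12])

category_predicted = []
category_predicted.append(pred_cat)      # adds the whole array as ONE element

nested_len = len(category_predicted)     # length of the outer list
inner_len = len(category_predicted[0])   # number of actual predictions

# extend() adds one entry per prediction instead of nesting the array
flat = []
flat.extend(pred_cat)
flat_len = len(flat)

print(nested_len, inner_len, flat_len)
```

The outer list has a single element, which is why pandas complains that the length of the values does not match the length of the index.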
I think you just need to change it to:
df['Y_test'] = y_test
df['category_predicted'] = category_predicted[0]
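One thing to watch for is index alignment. The sketch below uses invented data and assumes y_test is a pandas Series that keeps its original, shuffled row labels, as the printed output suggests. Passing plain arrays to the DataFrame constructor pairs the two columns by position, and reusing y_test's index keeps the original labels:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: y_test keeps shuffled row labels, as after a train/test split
y_test = pd.Series([16, 19, 5], index=[1208, 1013, 1153], name="Y_test")
category_predicted = [np.array([16, 19, 4])]  # nested list, as returned by prediction()

pred_cat = category_predicted[0]  # unwrap the single inner array

# Build both columns at once; .values drops the labels so rows pair up by position
df = pd.DataFrame(
    {"Y_test": y_test.values, "category_predicted": pred_cat},
    index=y_test.index,
)
print(df)
```

If the prediction array were assigned to an empty DataFrame first, the frame would get a default 0..n-1 index, and a later assignment of a labelled Series would be matched by label rather than by position.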
Alternatively, return pred_cat directly from the prediction function instead of category_predicted.
Try adding category_predicted before Y_test.
Please check my edit.