Python: TypeError: reduction operation 'argmax' not allowed for this dtype when trying to use idxmax()


python python-3.x pandas


I keep getting this error when using the idxmax() function in pandas:

Traceback (most recent call last):
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/main.py", line 20, in <module>
    best_c_param = classify.print_kfold_scores(X_training_undersampled, y_training_undersampled)
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/Classification.py", line 39, in print_kfold_scores
    best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter']
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/series.py", line 1369, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

Classification.py

from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import KFold, cross_val_score
from sklearn.metrics import confusion_matrix, precision_recall_curve, auc, \
    roc_auc_score, roc_curve, recall_score, classification_report
import pandas as pd
import numpy as np


def print_kfold_scores(X_training, y_training):
    print('\nKFold\n')

    fold = KFold(len(y_training), 5, shuffle=False)

    c_param_range = [0.01, 0.1, 1, 10, 100]

    results = pd.DataFrame(index=range(len(c_param_range), 2), columns=['C_parameter', 'Mean recall score'])
    results['C_parameter'] = c_param_range

    j = 0
    for c_param in c_param_range:
        print('-------------------------------------------')
        print('C parameter: ', c_param)
        print('\n-------------------------------------------')

        recall_accs = []
        for iteration, indices in enumerate(fold, start=1):
            lr = LogisticRegression(C=c_param, penalty='l1')
            lr.fit(X_training.iloc[indices[0], :], y_training.iloc[indices[0], :].values.ravel())

            y_prediction_undersampled = lr.predict(X_training.iloc[indices[1], :].values)
            recall_acc = recall_score(y_training.iloc[indices[1], :].values, y_prediction_undersampled)
            recall_accs.append(recall_acc)
            print('Iteration ', iteration, ': recall score = ', recall_acc)

        results.ix[j, 'Mean recall score'] = np.mean(recall_accs)
        j += 1
        print('\nMean recall score ', np.mean(recall_accs))
        print('\n')

    best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter'] # Error occurs on this line

    print('*****************************************************************')
    print('Best model to choose from cross validation is with C parameter = ', best_c_param)
    print('*****************************************************************')

    return best_c_param

The line causing the problem is:

best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter']

The output of the program is shown below:

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/username/College/year-4/fyp-credit-card-fraud/code/main.py
/Users/username/Library/Python/3.6/lib/python/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Dataset Ratios

Percentage of genuine transactions:  0.5
Percentage of fraudulent transactions 0.5
Total number of transactions in resampled data:  984


Whole Dataset Split

Number of transactions in training dataset:  199364
Number of transactions in testing dataset:  85443
Total number of transactions in dataset:  284807


Undersampled Dataset Split

Number of transactions in training dataset 688
Number of transactions in testing dataset:  296
Total number of transactions in dataset:  984

KFold

-------------------------------------------
C parameter:  0.01

-------------------------------------------
Iteration  1 : recall score =  0.931506849315
Iteration  2 : recall score =  0.917808219178
Iteration  3 : recall score =  1.0
Iteration  4 : recall score =  0.959459459459
Iteration  5 : recall score =  0.954545454545

Mean recall score  0.9526639965


-------------------------------------------
C parameter:  0.1

-------------------------------------------
Iteration  1 : recall score =  0.849315068493
Iteration  2 : recall score =  0.86301369863
Iteration  3 : recall score =  0.915254237288
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.909090909091

Mean recall score  0.89652397189


-------------------------------------------
C parameter:  1

-------------------------------------------
Iteration  1 : recall score =  0.86301369863
Iteration  2 : recall score =  0.86301369863
Iteration  3 : recall score =  0.983050847458
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.924242424242

Mean recall score  0.915853322981


-------------------------------------------
C parameter:  10

-------------------------------------------
Iteration  1 : recall score =  0.849315068493
Iteration  2 : recall score =  0.876712328767
Iteration  3 : recall score =  0.983050847458
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.939393939394

Mean recall score  0.918883626012


-------------------------------------------
C parameter:  100

-------------------------------------------
Iteration  1 : recall score =  0.86301369863
Iteration  2 : recall score =  0.876712328767
Iteration  3 : recall score =  0.983050847458
Iteration  4 : recall score =  0.945945945946
Iteration  5 : recall score =  0.924242424242

Mean recall score  0.918593049009


Traceback (most recent call last):
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/main.py", line 20, in <module>
    best_c_param = classify.print_kfold_scores(X_training_undersampled, y_training_undersampled)
  File "/Users/username/College/year-4/fyp-credit-card-fraud/code/Classification.py", line 39, in print_kfold_scores
    best_c_param = results.loc[results['Mean recall score'].idxmax()]['C_parameter']
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/series.py", line 1369, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

Process finished with exit code 1


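We should replace this line of code. The main issues are:
1) The 'Mean recall score' column has dtype object, so idxmax() cannot be computed on it.
2) You should convert 'Mean recall score' from object to float.
3) You can use apply(pd.to_numeric, errors='coerce', axis=0) to do the conversion.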

best_c = results_table  # alias: the changes below also apply to results_table
best_c.dtypes.eq(object)  # shows which columns have object dtype
new = best_c.columns[best_c.dtypes.eq(object)]  # names of the object-dtype columns
best_c[new] = best_c[new].apply(pd.to_numeric, errors='coerce', axis=0)  # convert to numeric; unparseable values become NaN
best_c  # inspect the converted table
best_c = results_table.loc[results_table['Mean recall score'].idxmax()]['C_parameter']  # C value with the highest mean recall
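As a rough, self-contained illustration of what errors='coerce' does in the snippet above: the table below is made up (the 'n/a' entry and the three rows are assumptions, not the question's data), and select_dtypes is used here only as one way to pick the object columns.

```python
import pandas as pd

# Hypothetical results table; the score column is object dtype and holds one bad entry
results_table = pd.DataFrame({
    'C_parameter': [0.01, 0.1, 1.0],
    'Mean recall score': pd.Series([0.95, 'n/a', 0.92], dtype=object),
})

# Pick the object-dtype columns and convert them; values that cannot be parsed become NaN
object_cols = results_table.select_dtypes(include=[object]).columns
results_table[object_cols] = results_table[object_cols].apply(pd.to_numeric, errors='coerce')

# idxmax() now runs on a float64 column and skips the NaN by default
best_c = results_table.loc[results_table['Mean recall score'].idxmax(), 'C_parameter']
print(results_table.dtypes)
print(best_c)
```

Note that coercion silently turns bad values into NaN, so it is worth checking isna() afterwards if missing scores would be a surprise.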

In short, try this

best_c = results_table.loc[results_table['Mean recall score'].astype(float).idxmax()]['C_parameter']

instead of

best_c = results_table.loc[results_table['Mean recall score'].idxmax()]['C_parameter']
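A minimal sketch of why the cast is needed, assuming the table is built the way the question builds it: a DataFrame created from only an index and column names starts with object columns, and filling it cell by cell does not change that. The mean recall values are hard-coded from the question's printed output, and .loc stands in for .ix, which has been removed from recent pandas releases.

```python
import pandas as pd

c_param_range = [0.01, 0.1, 1, 10, 100]
# Mean recall scores copied from the program output in the question
mean_recalls = [0.9526639965, 0.89652397189, 0.915853322981,
                0.918883626012, 0.918593049009]

# An empty frame built from index + columns has object-dtype columns
results = pd.DataFrame(index=range(len(c_param_range)),
                       columns=['C_parameter', 'Mean recall score'])
dtype_when_created = results['Mean recall score'].dtype

results['C_parameter'] = c_param_range
for j, score in enumerate(mean_recalls):
    # Cell-by-cell assignment keeps the column's existing dtype
    results.loc[j, 'Mean recall score'] = score

# Cast to float before the reduction, then look up the matching C value
scores = results['Mean recall score'].astype(float)
best_c_param = results.loc[scores.idxmax(), 'C_parameter']
print(dtype_when_created, scores.dtype, best_c_param)
```

Collecting the scores in a plain list and assigning the whole column at once would likely avoid the object dtype in the first place.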

By default, the cell values have a non-numeric dtype. argmin(), idxmin(), argmax() and other similar functions require the dtype to be numeric.

The simplest solution is to use pd.to_numeric() to convert the Series (or column) to a numeric type. An example for a DataFrame df with a column 'a':

df['a'] = pd.to_numeric(df['a'])
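For illustration, here is a small runnable version of that conversion. The column contents are invented; the second half shows that pd.to_numeric raises by default when a value cannot be parsed, which is usually a safer starting point than coercing.

```python
import pandas as pd

# Hypothetical column of numeric strings stored with object dtype
df = pd.DataFrame({'a': pd.Series(['0.2', '0.7', '0.5'], dtype=object)})

df['a'] = pd.to_numeric(df['a'])   # column becomes float64
best_row = df['a'].idxmax()        # index label of the largest value

# With the default errors='raise', an unparseable value raises ValueError
try:
    pd.to_numeric(pd.Series(['0.2', 'oops'], dtype=object))
    raised = False
except ValueError:
    raised = True

print(df['a'].dtype, best_row, raised)
```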

A more complete answer on type casting in pandas is also available.

Hope it helps :)

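If NaN is involved (the stack trace goes through nanargmax), then you are most likely working with mixed types, in particular a string among the numbers, while believing you have a purely numeric DataFrame. Let me give you 3 code examples: the first 2 work, and the last one does not and is most likely your case.

This represents all-numeric data; it will work with idxmax: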

the_dict = {}
the_dict['a'] = [0.1, 0.2, 0.5]
the_dict['b'] = [0.3, 0.4, 0.6]
the_dict['c'] = [0.25, 0.3, 0.9]
the_dict['d'] = [0.2, 0.1, 0.4]
the_df = pd.DataFrame(the_dict)

This contains a numeric NaN; it will also work with idxmax:

the_dict = {}
the_dict['a'] = [0.1, 0.2, 0.5]
the_dict['b'] = [0.3, 0.4, 0.6]
the_dict['c'] = [0.25, 0.3, 0.9]
the_dict['d'] = [0.2, 0.1, np.nan]
the_df = pd.DataFrame(the_dict)

This may or may not be exactly the OP's problem, but if mixed types are present in any way, we get the error the OP reported:

the_dict = {}
the_dict['a'] = [0.1, 0.2, 0.5]
the_dict['b'] = [0.3, 0.4, 0.6]
the_dict['c'] = [0.25, 0.3, 0.9]
the_dict['d'] = [0.2, 0.1, 'NaN']
the_df = pd.DataFrame(the_dict)
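The difference between the three cases shows up in the dtype of column 'd'. The sketch below rebuilds just that column for each case (the frame names are made up for this example) and then repairs the mixed one with pd.to_numeric.

```python
import numpy as np
import pandas as pd

all_numeric = pd.DataFrame({'d': [0.2, 0.1, 0.4]})
numeric_nan = pd.DataFrame({'d': [0.2, 0.1, np.nan]})
mixed = pd.DataFrame({'d': [0.2, 0.1, 'NaN']})   # the string forces object dtype

dtypes = [str(frame['d'].dtype) for frame in (all_numeric, numeric_nan, mixed)]
print(dtypes)  # ['float64', 'float64', 'object']

# Convert the mixed column; the string ends up as a real NaN
mixed['d'] = pd.to_numeric(mixed['d'], errors='coerce')
best_row = mixed['d'].idxmax()   # NaN is skipped, so the 0.2 in row 0 wins
print(mixed['d'].dtype, best_row)
```

Checking df.dtypes before calling a reduction is a quick way to spot this kind of column.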

