
Custom multiclass log-loss function for lightGBM in Python returns an error


I am trying to implement a lightGBM classifier with a custom objective function. My target data has four classes, and my data is divided into natural groups of 12 observations.

The custom objective function achieves two things:

  • The predicted model outputs must be probabilities, and the probabilities for each observation must sum to 1. This is also known as a softmax objective and is relatively simple to implement.
  • Within each group, the probabilities for each class must also sum to 1. This has been implemented in the binomial classification setting and is known as a conditional logit model.
  • In summary, for each group (of 4 observations in my case), the probabilities should sum to 1 across every column and every row. To achieve this I wrote a slightly rough function, but when I try to run the custom objective function within the xgb framework in python, I get the following error:

    TypeError: cannot unpack non-iterable numpy.float64 object
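For reference, that TypeError is simply what Python raises when a single numpy scalar is handed to code that tries to unpack it into several variables, e.g.:

```python
import numpy as np

# Unpacking a bare numpy scalar into multiple variables raises the
# same TypeError reported above.
val = np.float64(0.693)
try:
    name, value, is_higher_better = val
    raised = False
except TypeError:
    raised = True
```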

    My full code is as follows:

    import lightgbm as lgb
    import numpy as np
    import pandas as pd
    
    def standardiseProbs(preds, groupSize, eta = 0.1, maxIter = 100):
    
        # add groupId to preds dataframe
        n = preds.shape[0]
        if n % groupSize != 0:
            print('The selected group size parameter is not compatible with the data')
        preds['groupId'] = np.repeat(np.arange(0, int(n/groupSize)), groupSize)
    
        #initialise variables
        error = 10000
        i = 0
    
        # perform loop while error exceeds set threshold (subject to maxIter)
        while error > eta and i<maxIter:
            i += 1
            # get sum of probabilities by game
            byGroup = preds.groupby('groupId')[0, 1, 2, 3].sum().reset_index()
            byGroup.columns = ['groupId', '0G', '1G', '2G', '3G']
    
            if '3G' in list(preds.columns):
                preds = preds.drop(['3G', '2G', '1G', '0G'], axis=1)
            preds = preds.merge(byGroup, how='inner', on='groupId')
    
            # adjust probs to be consistent across a game
            for v in [1, 2, 3]:
                preds[v] = preds[v] / preds[str(v) + 'G']
    
            preds[0] = (groupSize-3)* (preds[0] / preds['0G'])
    
            # sum probabilities by player
            preds['rowSum'] = preds[3] + preds[2] + preds[1] + preds[0]
    
            # adjust probs to be consistent across a player
            for v in [0, 1, 2, 3]:
                preds[v] = preds[v] / preds['rowSum']
    
            # get sum of probabilities by game
            byGroup = preds.groupby('groupId')[0, 1, 2, 3].sum().reset_index()
            byGroup.columns = ['groupId', '0G', '1G', '2G', '3G']
    
            # calc error
            errMat = abs(np.subtract(byGroup[['0G', '1G', '2G', '3G']].values, np.array([(groupSize-3), 1, 1, 1])))
            error = sum(sum(errMat))
    
        preds = preds[['groupId', 0, 1, 2, 3]]
        return preds
    
    def condObjective(preds, train):
        labels = train.get_label()
        preds = pd.DataFrame(np.reshape(preds, (int(preds.shape[0]/4), 4), order='C'), columns=[0,1,2,3])
        n = preds.shape[0]
        yy = np.zeros((n, 4))
        yy[np.arange(n), labels] = 1
        preds['matchId'] = np.repeat(np.arange(0, int(n/4)), 4)
        preds = preds[['matchId', 0, 1, 2, 3]]
        preds = standardiseProbs(preds, groupSize = 4, eta=0.001, maxIter=500)
        preds = preds[[0, 1, 2, 3]].values
        grad = (preds - yy).flatten()
        hess = (preds * (1. - preds)).flatten()
        return grad, hess
    
    def mlogloss(preds, train):
        labels = train.get_label()
        preds = pd.DataFrame(np.reshape(preds, (int(preds.shape[0]/4), 4), order='C'), columns=[0,1,2,3])
        n = preds.shape[0]
        yy = np.zeros((n, 4))
        yy[np.arange(n), labels] = 1
        preds['matchId'] = np.repeat(np.arange(0, int(n/4)), 4)
        preds = preds[['matchId', 0, 1, 2, 3]]
        preds = standardiseProbs(preds, groupSize = 4, eta=0.001, maxIter=500)
        preds = preds[[0, 1, 2, 3]].values
        loss = -(np.sum(yy*np.log(preds)+(1-yy)*np.log(1-preds))/n)
        return loss
    
    n, k = 880, 5
    
    xtrain = np.random.rand(n, k)
    ytrain = np.random.randint(low=0, high=2, size=n)
    ltrain = lgb.Dataset(xtrain, label=ytrain)
    xtest = np.random.rand(int(n/2), k)
    ytest = np.random.randint(low=0, high=2, size=int(n/2))
    ltest = lgb.Dataset(xtrain, label=ytrain)
    
    lgbmParams = {'boosting_type': 'gbdt', 
                  'num_leaves': 250, 
                  'max_depth': 3,
                  'min_data_in_leaf': 10, 
                  'min_gain_to_split': 0.75, 
                  'learning_rate': 0.01, 
                  'subsample_for_bin': 120100, 
                  'min_child_samples': 70, 
                  'reg_alpha': 1.45, 
                  'reg_lambda': 2.5, 
                  'feature_fraction': 0.45, 
                  'bagging_fraction': 0.55, 
                  'is_unbalance': True, 
                  'objective': 'multiclass', 
                  'num_class': 4, 
                  'metric': 'multi_logloss', 
                  'verbose': 1}
    
    lgbmModel = lgb.train(lgbmParams, ltrain, valid_sets=ltest,fobj=condObjective, feval=mlogloss, num_boost_round=5000, early_stopping_rounds=100, verbose_eval=50)
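For context, the iterative normalisation in standardiseProbs is essentially iterative proportional fitting (Sinkhorn-style scaling): alternately rescale rows and columns of each group's probability matrix until both constraints approximately hold. A stripped-down numpy sketch of that idea (the helper name and targets here are illustrative, not the code above):

```python
import numpy as np

def normalise_group(p, n_iter=100):
    # Alternately rescale rows and columns of one group's probability
    # matrix so rows sum to 1 and columns sum to their targets
    # (illustrative sketch, not the code above).
    group_size, n_class = p.shape
    col_target = np.ones(n_class)
    col_target[0] = group_size - (n_class - 1)  # mirrors the (groupSize-3) target above
    for _ in range(n_iter):
        p = p / p.sum(axis=1, keepdims=True)    # each row sums to 1
        p = p * (col_target / p.sum(axis=0))    # each column hits its target
    return p

rng = np.random.default_rng(0)
p = normalise_group(rng.random((4, 4)))
```

Because the row targets (group_size in total) and column targets also sum to group_size, the alternating scaling converges to a matrix satisfying both constraints.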
    
    The problem with that error is in this part of the traceback:

        -> 2380                 eval_name, val, is_higher_better = feval_ret  # the return of mlogloss
           2381                 ret.append((data_name, eval_name, val, is_higher_better))
           2382         return ret
    TypeError: 'numpy.float64' object is not iterable
    
    It comes from the function mlogloss(). Since you use it as an evaluation function, it should return three things: its name, its value, and a boolean indicating whether a higher value is better:

    def mlogloss(...):
    ...
    return "my_loss_name", loss_value, False
    

    Training and evaluation need two different functions: the custom training loss (fobj in the lgb.train arguments) must return grad, hess, whereas the evaluation function (feval) must return name, value, boolean, where the boolean indicates whether a higher value is better.
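As a minimal sketch of the two signatures side by side (the FakeDataset stub and all names here are made up for illustration; with a real lgb.Dataset you would pass these as fobj= and feval= to lgb.train):

```python
import numpy as np

class FakeDataset:
    """Stands in for lgb.Dataset in this sketch; only get_label is needed."""
    def __init__(self, labels):
        self._labels = labels
    def get_label(self):
        return self._labels

def custom_objective(preds, train):
    # fobj signature: must return (grad, hess), one value per prediction
    labels = train.get_label().astype(int)
    n, k = len(labels), 4
    p = preds.reshape(n, k)
    y = np.zeros((n, k))
    y[np.arange(n), labels] = 1
    grad = (p - y).flatten()
    hess = (p * (1.0 - p)).flatten()
    return grad, hess

def custom_eval(preds, train):
    # feval signature: must return (name, value, is_higher_better)
    labels = train.get_label().astype(int)
    n, k = len(labels), 4
    p = preds.reshape(n, k)
    loss = -np.mean(np.log(p[np.arange(n), labels]))
    return 'custom_mlogloss', loss, False

train = FakeDataset(np.array([0, 1, 2, 3]))
preds = np.full(16, 0.25)
grad, hess = custom_objective(preds, train)
name, value, higher_better = custom_eval(preds, train)
```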


    Please have a look at this, rather than my blog:

    Thanks for your help - I still get an error, although it's a different one. I'll dig a little deeper. Thanks!

    Accept this answer, and if you're really stuck you can post another question! Or maybe just a comment, if you think it belongs here :)