Python LightGBM model predicts the same value in a Flask route
I am a new user of the StackOverflow community, and I appreciate your help. Here is the situation I am facing: I have a model.py file that trains a LightGBM LGBMRegressor model using sklearn's RandomizedSearchCV. After training, I save the model with pickle.
import pickle
import lightgbm as lgb
import numpy as np
from sklearn.model_selection import RandomizedSearchCV

# Candidate values for the randomized hyperparameter search
n_estimators = [int(x) for x in np.linspace(start=200, stop=4000, num=20)]
max_depth = [int(x) for x in np.linspace(10, 100, num=10)]
num_leaves = [int(x) for x in np.linspace(10, 150, num=10)]
learning_rate = [0.03, 0.05, 0.1, 0.2, 0.3]
subsample_for_bin = [100000, 200000, 300000, 400000]
random_grid = {'n_estimators': n_estimators,
               'max_depth': max_depth,
               'num_leaves': num_leaves,
               'learning_rate': learning_rate,
               'subsample_for_bin': subsample_for_bin}
gbm = lgb.LGBMRegressor()
gbm_random = RandomizedSearchCV(estimator=gbm, param_distributions=random_grid,
                                scoring=['neg_mean_absolute_error', 'neg_root_mean_squared_error'],
                                refit='neg_root_mean_squared_error',
                                n_iter=100, cv=4, verbose=2, random_state=42, n_jobs=-1)
# data_base, features_x and target_y are not shown in the question
gbm_random.fit(data_base[features_x], data_base[target_y])
# Persist the fitted search object
pkl_filename = "../output/lightGBM[3].pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(gbm_random, file)
To validate the training, I load the pickled model in a predict.py file and run it on the test set:
import pickle
import pandas as pd

data_base_test = pd.read_csv("../output/table_test3.csv")
pkl_filename = "../output/lightGBM[3].pkl"
with open(pkl_filename, 'rb') as file:
    gbm = pickle.load(file)
# features_x is the feature list used for training (not shown in the question)
predict_test = gbm.predict(data_base_test[features_x])
print(predict_test)
The predictions on the test set are:
[0.66487458 0.82479892 1.89628195 ... 3.83358101 5.21799368 0.33858825]
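I'm fairly comfortable with machine learning, but I'm a complete beginner in web development. When I built a web app with Flask, loaded the model for use in a route, and tried to predict on the same test set as in the previous script, every prediction from the model had the same value, 66. What problem could I be facing?
Note: get_json receives the entire test set in JSON format.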
import pickle
import flask
import numpy as np
import pandas as pd
from flask import request

# Load the pickled model once, when the module is imported
pkl_filename = "model/lightGBM[3].pkl"
with open(pkl_filename, 'rb') as file:
    gbm = pickle.load(file)

app = flask.Flask(__name__, template_folder='templates')

@app.route('/predict', methods=['POST'])
def main():
    test_json = request.get_json()
    df_json = pd.read_json(test_json, orient='records')
    columns_name = df_json.columns.values
    columns_name = np.delete(columns_name, np.where('qtde_venda'))
    features_x = columns_name.tolist()
    # prediction
    predict = gbm.predict(df_json[features_x])
    print(predict)
    return flask.render_template('main.html')

if __name__ == '__main__':
    app.run()
The prediction vector is:
[66. 66. 66. ... 66. 66. 66.]
Expected output vs. actual output:
[0.66487458 0.82479892 1.89628195 ... 3.83358101 5.21799368 0.33858825]
[66. 66. 66. ... 66. 66. 66.]
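One detail in the route is worth checking. np.where('qtde_venda') receives a plain string rather than a comparison such as columns_name == 'qtde_venda', so it does not look the column up by name. Below is a minimal sketch of selecting the feature columns by name instead, assuming qtde_venda is the target column. The helper name select_features and the sample column names are hypothetical, since the real feature names are not shown in the question.

```python
import pandas as pd

TARGET = "qtde_venda"  # target column name taken from the question


def select_features(df: pd.DataFrame, target: str = TARGET) -> list:
    """Return every column name except the target, preserving column order."""
    return [col for col in df.columns if col != target]


# Hypothetical sample frame; the real feature names are not shown in the question.
sample = pd.DataFrame({"store": [1, 2], "price": [9.5, 7.0], "qtde_venda": [3, 5]})
features_x = select_features(sample)
print(features_x)  # ['store', 'price']
```

It is generally safest to pass the columns to predict in the same order that was used for training, so printing features_x in both scripts is a cheap sanity check.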
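I can't fully explain what happened, but the cause of the error was the Python environment. To fix it, I removed Anaconda and started using a Python venv.
I don't think I can help much, but I noticed data_base_test. Is it defined elsewhere?
Sorry, I wrote that wrong. I just edited the question. The error persists. For some reason, the model only predicts the value 66.
I wonder whether the JSON data is exactly what you expect. Can you compare it with the original CSV data?
Yes, I compared it with the original CSV, and it is the expected dataset.
Can you compare the CSV and JSON datasets inside the Flask application? Or have you already done that?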
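The comparison suggested in the last comment could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name describe_mismatch and the sample columns are hypothetical, and the JSON payload is assumed to use orient='records' as in the question's route.

```python
from io import StringIO

import pandas as pd
from pandas.testing import assert_frame_equal


def describe_mismatch(df_csv: pd.DataFrame, df_json: pd.DataFrame) -> list:
    """Return a list of human-readable differences between the two frames."""
    problems = []
    # Column names and order must match before values can be compared.
    if list(df_csv.columns) != list(df_json.columns):
        problems.append("column names or order differ")
        return problems
    # Report columns whose dtype changed during the JSON round trip.
    for col in df_csv.columns:
        if df_csv[col].dtype != df_json[col].dtype:
            problems.append(
                f"dtype of {col!r}: {df_csv[col].dtype} vs {df_json[col].dtype}"
            )
    # Compare values separately, ignoring dtype (already reported above).
    try:
        assert_frame_equal(df_csv, df_json, check_dtype=False)
    except AssertionError as exc:
        problems.append(f"values differ: {exc}")
    return problems


# Hypothetical stand-in for the test set loaded from CSV.
df_csv = pd.DataFrame({"store": [1, 2], "price": [9.5, 7.0]})
# Simulate what the client sends and what the route parses.
payload = df_csv.to_json(orient="records")
df_json = pd.read_json(StringIO(payload), orient="records")
print(describe_mismatch(df_csv, df_json))
```

Calling such a helper inside the route, with the CSV loaded next to the parsed request body, would show whether the two frames really match in column order, dtypes, and values.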