Python 2.7: a method runs fine in IPython but runs endlessly under Gunicorn
I wrote an application in the Falcon framework, and I run it with the Gunicorn server. When the server starts, the application first trains a random forest model:
forest = sklearn.ensemble.ExtraTreesClassifier(n_estimators=150, n_jobs=-1)
forest.fit(x, t)
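It then returns probabilities for the requests sent to it. When I run the code in IPython on my server, this works fine (training the model takes 15 seconds, running on 12 cores).
While I was writing the application, I set n_estimators=10 and everything worked. When I had finished tuning the application, I set n_estimators back to 150. However, when I run Gunicorn with gunicorn -c ./app.conf app:app, I can see in htop that forest.fit(x, t) runs on all cores for a few seconds, after which the usage of all cores drops to 0. After that the method runs indefinitely, until the Gunicorn worker times out after 10 minutes.
This is the first time I have used Gunicorn and Falcon, or any WSGI technology, and I have no idea what is causing the problem or how to fix it.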
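One way to narrow down a hang like the one described above is to find out what the worker is waiting on once the cores go idle. The following is a minimal diagnostic sketch using only the standard library, not a fix: it runs a callable and, if the call is still running after a chosen number of seconds, writes the stack of every thread in the current process to a stream. The names dump_all_stacks and run_with_watchdog are hypothetical helpers invented for this example. The sketch only sees threads of the calling process, so it would show where the parent is blocked, not what any child processes are doing.

```python
import sys
import threading
import traceback


def dump_all_stacks(out=sys.stderr):
    """Write the current stack of every thread in this process to `out`."""
    for thread_id, frame in sys._current_frames().items():
        out.write("--- thread %s ---\n" % thread_id)
        out.write("".join(traceback.format_stack(frame)))
        out.write("\n")


def run_with_watchdog(func, warn_after, out=sys.stderr):
    """Call func() and return its result.

    If func() is still running after `warn_after` seconds, dump the
    stacks of all threads once so the blocking call becomes visible.
    """
    finished = threading.Event()

    def watchdog():
        # Event.wait returns False when the timeout expires first.
        if not finished.wait(warn_after):
            dump_all_stacks(out)

    watcher = threading.Thread(target=watchdog)
    watcher.daemon = True  # never keep the process alive just for the watchdog
    watcher.start()
    try:
        return func()
    finally:
        finished.set()


# Hypothetical usage around the training call from the question:
# run_with_watchdog(lambda: forest.fit(x, t), warn_after=120)
```

If the dump shows the main thread parked in a wait on worker processes, that would suggest the parallel part of the fit is where things stall; the sketch itself makes no claim about the cause.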
Edit:
Gunicorn settings file:
# app.conf
# run with gunicorn -c ./app.conf app:app
import sys
sys.path.append('/home/user/project/Module')
bind = "127.0.0.1:8123"
timeout = 60*20 # Timeout worker after more than 20 minutes
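Because the Gunicorn settings file is ordinary Python, the settings that influence where and how the models get trained can be spelled out explicitly while debugging. The fragment below is only a sketch of what that might look like. The setting names are real Gunicorn settings, but every value other than bind and timeout is an assumption chosen for illustration, not a confirmed fix for the hang.

```python
# app.conf (debugging variant; values below are illustrative assumptions)
bind = "127.0.0.1:8123"
timeout = 60 * 20        # seconds a worker may stay silent before it is killed

workers = 1              # a single worker, so only one process trains the models
worker_class = "sync"    # Gunicorn's default worker type, stated explicitly
preload_app = False      # default: app:app is imported inside each worker after the fork
loglevel = "debug"       # more verbose Gunicorn logging while investigating
```

Setting preload_app to True makes Gunicorn import the application in the master process before forking, which would move the training out of the worker. Whether that changes the behavior described here is untested.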
Falcon code:
import json

import falcon
import jsonschema
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# Project-local helpers (utils, ft, config, fastserialize, logger) are not shown in the question.


class Processor(object):
    """ Processor object handles the training of the models,
    feature generation of requests and probability predictions.
    """
    # data statistics used in feature calculations
    data_statistics = {}
    # Classification targets
    targets = ()
    # Select features for the models.
    cols1 = [ #...
    ]
    cols2 = [ #...
    ]
    model_1 = ExtraTreesClassifier(n_estimators=150, n_jobs=-1)
    model_2 = ExtraTreesClassifier(n_estimators=150, n_jobs=-1)

    def __init__(self, features_dataset, tr_prepro):
        # Get the datasets
        da_1, da_2 = self.prepare_datasets(features_dataset)
        # Train models
        # ----THIS IS WHERE THE PROGRAM HANGS -----------------------------------
        self.model_1.fit(da_1.x, utils.vectors_to_labels(da_1.t))
        # -----------------------------------------------------------------------
        self.model_2.fit(da_2.x, utils.vectors_to_labels(da_2.t))
        # Generate data statistics for feature calculations
        self.calculate_data_statistics(tr_prepro)

    def prepare_datasets(self, features_dataset):
        sel_cols = [ #...
        ]
        # Build dataset
        d = features_dataset[sel_cols].dropna()
        da, scalers = ft.build_dataset(d, scaling='std', target_feature='outcome')
        # Binarize data
        da_bin = utils.binirize_dataset(da)
        # What are the classification targets
        self.targets = da_bin.t_labels
        # Prepare the datasets
        da_1 = da_bin.select_attributes(self.cols1)
        da_2 = da_bin.select_attributes(self.cols2)
        return da_1, da_2

    def calculate_data_statistics(self, tr_prepro):
        logger.info('Getting data and feature statistics...')
        #...
        logger.info('Done.')

    def import_data(self, data):
        # convert dictionary generated from json to Pandas DataFrame
        return tr

    def generate_features(self, tr):
        # Preprocessing, Feature calculations, imputation
        return tr

    def predict_proba(self, data):
        # Convert Data
        tr = self.import_data(data)
        # Generate features
        tr = self.generate_features(tr)
        # Select model based on missing values - either no. 1 or no. 2
        tr_1 = #...
        tr_2 = #...
        # Get the probabilities from different models
        if tr_1.shape[0] > 0:
            tr_1.loc[:, 'prob'] = self.model_1.predict_proba(tr_1.loc[:, self.cols1])[:, self.targets.index('POSITIVE')]
        if tr_2.shape[0] > 0:
            tr_2.loc[:, 'prob'] = self.model_2.predict_proba(tr_2.loc[:, self.cols2])[:, self.targets.index('POSITIVE')]
        return pd.concat([tr_1, tr_2], axis=0)

    @staticmethod
    def export_single_result(tr):
        result = {'sample_id': tr.loc[0, 'sample_id'],
                  'batch_id': tr.loc[0, 'batch_id'],
                  'prob': tr.loc[0, 'prob']
                  }
        return result


class JSONTranslator(object):
    def process_request(self, req, resp):
        """Generic method for extracting json from requests

        Throws
        ------
        HTTP 400 (Bad Request)
        HTTP 753 ('Syntax Error')
        """
        if req.content_length in (None, 0):
            # Nothing to do
            return
        body = req.stream.read()
        if not body:
            raise falcon.HTTPBadRequest('Empty request body',
                                        'A valid JSON document is required.')
        try:
            req.context['data'] = json.loads(body.decode('utf-8'))
        except (ValueError, UnicodeDecodeError):
            raise falcon.HTTPError(falcon.HTTP_753,
                                   'Malformed JSON',
                                   'Could not decode the request body. The '
                                   'JSON was incorrect or not encoded as '
                                   'UTF-8.')

    def process_response(self, req, resp, resource):
        """Generic method for putting response to json

        Does not do anything if 'result_json' not in req.context.
        """
        if 'result_json' not in req.context:
            return
        resp.body = json.dumps(req.context['result_json'])


class ProbResource(object):
    def __init__(self, processor):
        self.schema_raw = open(config.__ROOT__ + "app_docs/broadcast_schema.json").read()
        self.schema = json.loads(self.schema_raw)
        self.processor = processor

    def validate_request(self, req):
        """ Validate the request json against the schema.

        Throws
        ------
        HTTP 753 ('Syntax Error')
        """
        data = req.context['data']
        # validate the json
        try:
            v = jsonschema.Draft4Validator(self.schema)  # using jsonschema draft 4
            err_msg = str()
            for error in sorted(v.iter_errors(data), key=str):
                err_msg += str(error)
            if len(err_msg) > 0:
                raise falcon.HTTPError(falcon.HTTP_753,
                                       'JSON failed validation',
                                       err_msg)
        except jsonschema.ValidationError as e:
            print("Failed to use schema:\n" + str(self.schema_raw))
            raise e

    def on_get(self, req, resp):
        """Handles GET requests

        Throws
        ------
        HTTP 404 (Not Found)
        """
        self.validate_request(req)
        data = req.context['data']
        try:
            # get probability
            tr = self.processor.predict_proba(data)
            # convert pandas dataframe to dictionary
            result = self.processor.export_single_result(tr)
            # send the dictionary away
            req.context['result_json'] = result
        except Exception as ex:
            raise falcon.HTTPError(falcon.HTTP_404, 'Error', ex.message)
        resp.status = falcon.HTTP_200

# Get data
features_data = fastserialize.load(config.__ROOT__ + 'data/samples.s')
prepro_data = fastserialize.load(config.__ROOT__ + 'data/prepro/samples_preprocessed.s')
# Get the models - this is where the code hangs
sp = Processor(features_data, prepro_data)
app = falcon.API(middleware=[JSONTranslator()])
prob = ProbResource(sp)
app.add_route('/prob', prob)
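Both classifiers in the code above are created with n_jobs=-1, and the hang only appeared after n_estimators went from 10 to 150. One cheap experiment is to make the degree of parallelism switchable from the outside, so the same application can be started under Gunicorn with n_jobs=1 to see whether the parallel fit is involved at all. The helper below is a minimal sketch of that idea. The function name n_jobs_from_env and the environment variable MODEL_N_JOBS are hypothetical, invented for this example.

```python
import os


def n_jobs_from_env(default=-1, var_name="MODEL_N_JOBS"):
    """Return the n_jobs value to use, read from an environment variable.

    MODEL_N_JOBS is a hypothetical variable name chosen for this sketch.
    Falls back to `default` when the variable is unset or not an integer.
    """
    raw = os.environ.get(var_name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    # n_jobs=0 has no meaning for scikit-learn's parallel backend,
    # so treat it the same as an unset variable.
    return value if value != 0 else default


# Hypothetical usage in the Processor class from the question:
# model_1 = ExtraTreesClassifier(n_estimators=150, n_jobs=n_jobs_from_env())
```

The server could then be started as MODEL_N_JOBS=1 gunicorn -c ./app.conf app:app. If training completes that way (more slowly), that would point at the interaction between the parallel fit and the Gunicorn worker rather than at the data or the model itself.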
Comments:
It is hard to tell without more information about your setup. What is your configuration, how are your Falcon resources set up, and so on?
OK, I have added the code to the question. I tried to strip out as much clutter as possible, but it is still a big chunk.
Without running the code, it looks to me as if the problem occurs because your processor overloads gunicorn; have you tried creating the sp instance inside ProbResource instead of passing it in?
I have not tried that before. Does it matter where I create the object?
OK, I tried creating the sp instance in the __init__ of ProbResource, but there was no difference in behavior :-(