Querying Hive from Flask (Python, Flask, Hive, PyHive)

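I'm new to Flask and started with the quick prototype below. The main idea of the project is to collect data from a Hive cluster and serve it to end users through Flask. Although I successfully connected Flask to the Hive server using the pyhive connector, I ran into a strange problem related to select limit, which appears when I try to query more than 50 items. In my case, I built a Hive class for pyhive, modeled on the Flask extension development demo: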

from pyhive import hive
from flask import current_app

# Find the stack on which we want to store the database connection.
# Starting with Flask 0.9, the _app_ctx_stack is the correct one,
# before that we need to use the _request_ctx_stack.
try:
    from flask import _app_ctx_stack as stack
except ImportError:
    from flask import _request_ctx_stack as stack


class Hive(object):

    def __init__(self, app=None):
        self.app = app
        if app is not None:
            self.init_app(app)

    def init_app(self, app):
        # Use the newstyle teardown_appcontext if it's available,
        # otherwise fall back to the request context
        if hasattr(app, 'teardown_appcontext'):
            app.teardown_appcontext(self.teardown)
        else:
            app.teardown_request(self.teardown)

    def connect(self):
        return hive.connect(current_app.config['HIVE_DATABASE_URI'], database="orc")

    def teardown(self, exception):
        ctx = stack.top
        if hasattr(ctx, 'hive_db'):
            ctx.hive_db.close()
        return None

    @property
    def connection(self):
        ctx = stack.top
        if ctx is not None:
            if not hasattr(ctx, 'hive_db'):
                ctx.hive_db = self.connect()
            return ctx.hive_db

I then created an endpoint to load data from Hive:

@blueprint.route('/hive/<limit>')
def connect_to_hive(limit):
    cur = hive.connection.cursor()
    query = "select * from part_raw where year=2018 LIMIT {0}".format(limit)
    cur.execute(query)
    res = cur.fetchall()
    return jsonify(data=res)
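
The endpoint above formats the raw URL segment straight into the SQL text. This is a minimal sketch of checking the value first, so that only an integer reaches the query. The helper name build_limited_query and the max_limit bound are hypothetical and not part of the original code. Flask's built-in int converter (a route such as /hive/<int:limit>) is another way to reject non-numeric values before the view runs.

```python
def build_limited_query(raw_limit, max_limit=10000):
    """Return the SELECT statement with a validated integer LIMIT.

    raw_limit is the untrusted string taken from the URL.
    max_limit is a hypothetical upper bound chosen for this sketch.
    """
    try:
        limit = int(raw_limit)
    except (TypeError, ValueError):
        raise ValueError("limit must be an integer: {!r}".format(raw_limit))
    if not 1 <= limit <= max_limit:
        raise ValueError("limit must be between 1 and {}".format(max_limit))
    # Only a range-checked integer is ever interpolated into the SQL text.
    return "select * from part_raw where year=2018 LIMIT {}".format(limit)
```

Inside the view, the result would be passed to cur.execute() in place of the hand-formatted string, with a ValueError turned into a 400 response.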

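On the first run, if I load with a limit of 50 items, everything works fine, but as soon as I increase the limit the request hangs without loading any items. When I load the same data from a Jupyter notebook, however, it works fine, which is why I suspect I am missing something in my Flask code.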
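
One way to narrow down where such a stall happens is to read the result in batches with the DB-API fetchmany() call instead of a single fetchall(). The sketch below assumes the stall occurs while fetching rather than in execute(). The helper name fetch_in_batches is hypothetical, and sqlite3 is used only as a stand-in so the example runs without a Hive cluster. A pyhive cursor exposes the same fetchmany() method.

```python
import sqlite3


def fetch_in_batches(cursor, batch_size=50):
    """Yield rows from an already-executed DB-API cursor, one batch at a time."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            # An empty batch means the result set is exhausted.
            break
        for row in rows:
            yield row


if __name__ == "__main__":
    # sqlite3 stands in for Hive here; the table only mimics part_raw.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("create table part_raw (year integer, value integer)")
    cur.executemany(
        "insert into part_raw values (?, ?)",
        [(2018, i) for i in range(120)],
    )
    cur.execute("select * from part_raw where year=2018 LIMIT 120")
    rows = list(fetch_in_batches(cur, batch_size=50))
    print(len(rows))
    conn.close()
```

Logging after each batch would show whether the first 50 rows arrive and a later batch blocks, or whether nothing comes back at all.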

The problem turned out to be a library version issue. It was solved by pinning the following in my requirements:

# Hive with needed dependencies
sasl==0.2.1
thrift==0.11.0
thrift-sasl==0.3.0
PyHive==0.6.1

The old requirements were as follows:

sasl>=0.2.1
thrift>=0.10.0
#thrift_sasl>=0.1.0
git+https://github.com/cloudera/thrift_sasl  # Using master branch in order to get Python 3 SASL patches
PyHive==0.6.1

These were taken from the development requirements file in the pyhive project.
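
Since the fix came down to which versions were actually installed, a quick check of the running environment against the pins can help confirm that the Flask process and the Jupyter notebook see the same libraries. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later). The names PINS, installed_version and find_mismatches are hypothetical.

```python
from importlib import metadata

# The pins from the working requirements listed above.
PINS = {
    "sasl": "0.2.1",
    "thrift": "0.11.0",
    "thrift-sasl": "0.3.0",
    "PyHive": "0.6.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a (pinned, installed) tuple."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(PINS).items()):
        print("{}: pinned {}, installed {}".format(
            name, wanted, found or "not installed"))
```

Running it once from the Flask virtualenv and once from the notebook kernel makes any difference between the two environments visible.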