How to set the maximum number of partitions for a Hive query in an IPython notebook
I'm using an IPython notebook to write a script:
import pandas as pd
import pyhs2
import os
import datetime
q1 = """set hive.query.max.partition = 3000 ;
select 'Device_id' as key,
'All Time' as type,
count(distinct a.dev_id) as count
from (select distinct dev_id from DevID
where dev_type = '*****'
union all
select distinct
key_value_lookup(raw_url, '*****', '&', '=') as dev_id
from actions
where raw_url like '%*****%'
and raw_url like '%*****%'
and data_date >= '20150901' and data_date <= '20151231') a"""
def read_hive(query):
    conn = pyhs2.connect(host='*****',
                         port=*****,
                         authMechanism="*****",
                         user='*****',
                         password='*****',
                         database='*****')
    cur = conn.cursor()
    cur.execute(query)
    # Return column info from query
    if cur.getSchema() is None:
        cur.close()
        conn.close()
        return None
    columnNames = [a['columnName'] for a in cur.getSchema()]
    print(columnNames)
    columnNamesStrings = [a['columnName'] for a in cur.getSchema() if a['type'] == 'STRING_TYPE']
    output = pd.DataFrame(cur.fetch(), columns=columnNames)
    cur.close()
    conn.close()
    return output
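When I call read_hive(q1), I get the following error:
Failed because hive.query.max.partition requires an INT value
I think this is because I'm storing the query in a string, but I'm not completely sure. The query runs perfectly well from Hue.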
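Does anyone know the best way to change the maximum number of partitions? Can it be done inside my function?

Hive configuration settings should be passed to the pyhs2 connection object as a dictionary, not as part of the query string to execute. HiveServer2 runs the string passed to execute() as a single statement, so the set command most likely treats everything after the = sign, including "; select ...", as the value, which is not an integer. In your case: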
conn = pyhs2.connect(host='*****',
port=*****,
authMechanism="*****",
user='*****',
password='*****',
database='*****',
configuration={'hive.query.max.partition': '3000'})
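If you want read_hive to stay reusable, one option is to let it accept the settings as a parameter and forward them to the connection. The sketch below assumes a hypothetical connect parameter (so the function can be exercised without a Hive server) and returns (column_names, rows) instead of a DataFrame; neither is part of pyhs2. Connection details such as host and port are passed through as keyword arguments.

```python
def read_hive(query, configuration=None, connect=None, **conn_kwargs):
    """Run a single HiveQL statement and return (column_names, rows).

    configuration: dict of Hive settings, e.g. {'hive.query.max.partition': '3000'},
        sent with the session instead of being embedded as a `set` in the query.
    connect: hypothetical injection point for testing; defaults to pyhs2.connect.
    conn_kwargs: host, port, authMechanism, user, password, database, ...
    Returns None when the statement produces no result schema.
    """
    if connect is None:
        import pyhs2  # imported lazily so the function can be tested without pyhs2
        connect = pyhs2.connect
    conn = connect(configuration=configuration, **conn_kwargs)
    try:
        cur = conn.cursor()
        try:
            cur.execute(query)
            schema = cur.getSchema()
            if schema is None:
                return None
            column_names = [col['columnName'] for col in schema]
            return column_names, cur.fetch()
        finally:
            cur.close()  # always release the cursor, even if execute() raises
    finally:
        conn.close()  # always release the connection
```

With q1 rewritten without its leading set statement, a call such as read_hive(q1, configuration={'hive.query.max.partition': '3000'}, host='*****', ...) returns the column names and rows, and pd.DataFrame(rows, columns=column_names) rebuilds the DataFrame the original function produced. The try/finally blocks close the cursor and connection on every path, which the original only did when no exception occurred.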
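If you already have query strings that start with set statements, such as q1, a small helper can peel them off into the dictionary that pyhs2.connect expects. split_hive_settings below is a hypothetical helper, not part of pyhs2 or Hive. It assumes the set statements come first, each one ends with a semicolon, and no value contains a semicolon.

```python
import re

# One leading "set key = value;" statement; key may contain dots (hypothetical format assumption).
_SET_RE = re.compile(r'\s*set\s+([\w.]+)\s*=\s*([^;]*?)\s*;', re.IGNORECASE)


def split_hive_settings(query):
    """Split leading `set key = value;` statements off a HiveQL string.

    Returns (configuration, statement): configuration maps each setting name to its
    value as a string, and statement is the remaining query with surrounding
    whitespace stripped. Only settings at the very start of the string are removed.
    """
    configuration = {}
    match = _SET_RE.match(query)
    while match:
        configuration[match.group(1)] = match.group(2)
        query = query[match.end():]  # drop the consumed set statement
        match = _SET_RE.match(query)
    return configuration, query.strip()
```

For example, configuration, statement = split_hive_settings(q1) yields {'hive.query.max.partition': '3000'} and the select statement on its own, which can then be passed as configuration=configuration and as the query, respectively.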