Apache Spark Bluemix UDF
Tags: apache-spark, pyspark, ibm-cloud, user-defined-functions

I am running this code in the Bluemix Apache Spark service; it was still working a few days ago:
from pyspark.sql.types import StringType, IntegerType, StructType, StructField
# define timedelta function (obtain duration in seconds)
from datetime import datetime

# Function for deriving the difference between two time points,
# because Spark does not support it natively
def time_delta(y, x):
    delta = x - y
    return int(delta.total_seconds() / 60)

diff = udf(time_delta, IntegerType())

# Register user-defined function
sqlContext.registerFunction("udf_diff", time_delta, returnType=IntegerType())

res3 = res2.withColumn("waittime", diff(res2["last_bike_time"], res2["bike_came_time"]))
res3.count()
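Outside of Spark, the time_delta helper is plain datetime arithmetic: it returns the truncated whole number of minutes between two timestamps. A minimal standalone sketch (the sample timestamps are invented for illustration):

```python
from datetime import datetime

def time_delta(y, x):
    """Return the whole number of minutes elapsed from y to x."""
    delta = x - y
    return int(delta.total_seconds() / 60)

# Hypothetical sample values standing in for last_bike_time / bike_came_time
last_bike_time = datetime(2016, 5, 1, 10, 0, 0)
bike_came_time = datetime(2016, 5, 1, 10, 45, 30)

print(time_delta(last_bike_time, bike_came_time))  # 45 (45.5 min, truncated)
```

Note that int() truncates toward zero, so 45 minutes 30 seconds yields 45, not 46.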
Before this part of the code I do some Spark computation (maps, joins, etc.), and that works fine, but as soon as I try to define the udf I get this error:
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-14-2f188d63cae0> in <module>()
      9     return int(delta.total_seconds()/60)
     10
---> 11 diff = udf(time_delta, IntegerType())
     12
     13 #Register user defined function

/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py in udf(f, returnType)
   1595     [Row(slen=5), Row(slen=3)]
   1596     """
-> 1597     return UserDefinedFunction(f, returnType)
   1598
   1599 blacklist = ['map', 'since', 'ignore_unicode_prefix']

/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py in __init__(self, func, returnType, name)
   1556         self.returnType = returnType
   1557         self._broadcast = None
-> 1558         self._judf = self._create_judf(name)
   1559
   1560     def _create_judf(self, name):

/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py in _create_judf(self, name)
   1567         pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command, self)
   1568         ctx = SQLContext.getOrCreate(sc)
-> 1569         jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
   1570         if name is None:
   1571             name = f.__name__ if hasattr(f, '__name__') else f.__class__.__name__

/usr/local/src/spark160master/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
    689                 raise Exception("You must build Spark with Hive. "
    690                                 "Export 'SPARK_HIVE=true' and run "
--> 691                                 "build/sbt assembly", e)
    692
    693     def _get_hive_ctx(self):

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o208))
Could it be that they (IBM) changed something in the Spark build? — Can you show how the input columns are created? Did you import pyspark.sql.functions.udf? — @zero323 The input columns are the product of a SQL query, but that shouldn't matter for this exception, since it is thrown while defining the udf. @WoodChopper Yes, I did that in a previous notebook cell.