
Bluemix Apache Spark UDF


I am running the following code in the Bluemix Apache Spark service; it was still working a few days ago:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, IntegerType, StructType, StructField
from datetime import datetime

# Function for deriving the difference (in minutes) between two time
# points, because Spark does not support this natively
def time_delta(y, x):
    delta = x - y
    return int(delta.total_seconds() / 60)

diff = udf(time_delta, IntegerType())

# Register the user-defined function so it can also be called from SQL
sqlContext.registerFunction("udf_diff", time_delta, returnType=IntegerType())

res3 = res2.withColumn("waittime", diff(res2["last_bike_time"], res2["bike_came_time"]))

res3.count()
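(For context, the registered udf_diff is meant to be callable from SQL as well. A minimal usage sketch, assuming res2 is exposed as a temporary table; the table name "bikes" is invented for illustration:)

# Hypothetical SQL-side usage of the registered function; the temp
# table name "bikes" is made up for this example.
res2.registerTempTable("bikes")
sqlContext.sql(
    "SELECT last_bike_time, bike_came_time, "
    "udf_diff(last_bike_time, bike_came_time) AS waittime FROM bikes"
).show(5)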
Earlier in the code I do some Spark computations (maps, joins, etc.) and those work fine, but as soon as I try to define the UDF I get this error:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-14-2f188d63cae0> in <module>()
      9     return int(delta.total_seconds()/60)
     10 
---> 11 diff = udf(time_delta, IntegerType())
     12 
     13 #Register user defined function

/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py in udf(f, returnType)
   1595     [Row(slen=5), Row(slen=3)]
   1596     """
-> 1597     return UserDefinedFunction(f, returnType)
   1598 
   1599 blacklist = ['map', 'since', 'ignore_unicode_prefix']

/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py in __init__(self, func, returnType, name)
   1556         self.returnType = returnType
   1557         self._broadcast = None
-> 1558         self._judf = self._create_judf(name)
   1559 
   1560     def _create_judf(self, name):

/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py in _create_judf(self, name)
   1567         pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command, self)
   1568         ctx = SQLContext.getOrCreate(sc)
-> 1569         jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
   1570         if name is None:
   1571             name = f.__name__ if hasattr(f, '__name__') else f.__class__.__name__

/usr/local/src/spark160master/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
    689             raise Exception("You must build Spark with Hive. "
    690                             "Export 'SPARK_HIVE=true' and run "
--> 691                             "build/sbt assembly", e)
    692 
    693     def _get_hive_ctx(self):

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o208))

Could it be that they (IBM) changed something in the Spark build?
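Note that the traceback points away from the UDF itself: udf() only asks the current context for its Java-side SQLContext (ctx._ssql_ctx), and it is the construction of the underlying HiveContext that blows up. A minimal diagnostic sketch, touching the same private attribute the traceback shows, assuming sqlContext is the notebook's pre-created Hive-backed context:

# Diagnostic sketch, not a fix: per the traceback, udf() fails inside
# ctx._ssql_ctx, i.e. while building the Java-side HiveContext. If the
# context itself is broken, this line alone should raise the same
# Py4JJavaError, with no UDF involved.
sqlContext._ssql_ctx

If that reproduces the Py4JJavaError, the problem is HiveContext initialization in the service rather than anything in the UDF code.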

Could you show how you create the input columns? Did you import pyspark.sql.functions.udf? @zero323 The input columns are the product of a SQL query, but that is irrelevant to the exception, since it is thrown while defining the udf. @WoodChopper I did that in an earlier notebook cell.
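To settle the import question conclusively, here is a self-contained repro sketch with a throwaway UDF (it mirrors the slen doctest visible in the traceback; the column name "word" and sample data are made up, and nothing depends on earlier cells):

# Self-contained check: a throwaway UDF that does not depend on
# time_delta or on previous notebook state. If udf() raises the same
# HiveContext error here too, a missing import is not the problem.
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

strlen = udf(lambda s: len(s), IntegerType())
df = sqlContext.createDataFrame([("hello",), ("spark",)], ["word"])
df.select(strlen(df["word"]).alias("n")).show()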