
Apache Spark: replace negative values with zero in PySpark


I could use some help replacing negative values, produced by differences between timestamps, with zero. I'm running Python 3 on Spark. Here is my code:

Code:

from pyspark.sql.functions import col, lit, unix_timestamp, when

timeFmt = "yyyy-MM-dd HH:mm:ss"
time_diff_1 = when((col("time1").isNotNull()) &
                   (col("time2").isNotNull()),
                   (unix_timestamp('time2', format=timeFmt) - unix_timestamp('time1', format=timeFmt)) / 60
                   ).otherwise(lit(0))
time_diff_2 = when((col("time2").isNotNull()) &
                   (col("time3").isNotNull()),
                   (unix_timestamp('time3', format=timeFmt) - unix_timestamp('time2', format=timeFmt)) / 60
                   ).otherwise(lit(0))
time_diff_3 = when((col("time3").isNotNull()) &
                   (col("time4").isNotNull()),
                   (unix_timestamp('time4', format=timeFmt) - unix_timestamp('time3', format=timeFmt)) / 60
                   ).otherwise(lit(0))
df = (df
      .withColumn('time_diff_1', time_diff_1)
      .withColumn('time_diff_2', time_diff_2)
      .withColumn('time_diff_3', time_diff_3)
      )
df = (df
      .withColumn('time_diff_1', when(col('time_diff_1') < 0, 0).otherwise(col('time_diff_1')))
      .withColumn('time_diff_2', when(col('time_diff_2') < 0, 0).otherwise(col('time_diff_2')))
      .withColumn('time_diff_3', when(col('time_diff_3') < 0, 0).otherwise(col('time_diff_3')))
      )
When I run the code above, I get an error. Here is the error:

Py4JJavaError: An error occurred while calling o1083.showString:
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 56.0 failed 4 times, most recent failure: Lost task 0.3
in stage 56.0 (TID 7246, fxhclxcdh8.dftz.local, executor 21):
org.codehaus.janino.JaninoRuntimeException: failed to compile:
org.codehaus.janino.JaninoRuntimeException: Code of method
"apply9$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
grows beyond 64 KB

(The error message then dumps the generated SpecificUnsafeProjection source, omitted here.)


Can anyone help?

I think a simpler way is to write a small UDF (user-defined function) and apply it to the desired columns. Here is some sample code:

import pyspark.sql.functions as f
from pyspark.sql.types import LongType

# Replace negative differences with 0; the UDF's return type must be declared.
correctNegativeDiff = f.udf(lambda diff: 0 if diff < 0 else diff, LongType())

df = df.withColumn('time_diff_1', correctNegativeDiff(df.time_diff_1))\
       .withColumn('time_diff_2', correctNegativeDiff(df.time_diff_2))\
       .withColumn('time_diff_3', correctNegativeDiff(df.time_diff_3))
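
For what it's worth, the same clamping can also be done without a Python UDF, using only built-in column functions. This is a sketch, not part of the original answer; it avoids Python serialization overhead, and greatest(0, x) is exactly max(x, 0):

import pyspark.sql.functions as f

# Clamp each diff column at zero with built-in functions only;
# greatest picks the larger of the literal 0 and the column value.
for c in ['time_diff_1', 'time_diff_2', 'time_diff_3']:
    df = df.withColumn(c, f.greatest(f.lit(0), f.col(c)))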

Could you provide a minimal example, please?
Thanks, this code helped me solve the problem. There is just one small issue: instead of returning 0 it returns null.
You're welcome. I think that is because of LongType(); changing it to IntegerType() or FloatType() might solve your problem!
I already changed it to DoubleType, since my values are doubles, but I still get null instead of 0. Thanks.
Changing it like this might solve it:
correctNegativeDiff = f.udf(lambda diff: 0.0 if diff < 0 else diff, DoubleType())
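
For what it's worth, PySpark does not coerce a UDF's Python return value to the declared return type: as far as I know, returning the Python int 0 from a UDF declared as DoubleType() yields null, which would explain the behaviour above. A minimal sketch of the type-matched version (the clamp name is illustrative, not from the original thread):

import pyspark.sql.functions as f
from pyspark.sql.types import DoubleType

# Return a float literal (0.0) so the value matches the declared
# DoubleType; returning the int 0 here silently becomes null.
clamp = f.udf(lambda diff: 0.0 if diff is not None and diff < 0 else diff, DoubleType())

df = df.withColumn('time_diff_1', clamp(df.time_diff_1))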
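
As a closing note on the original error: "Code of method ... grows beyond 64 KB" means Spark's whole-stage code generation emitted a single Java method exceeding the JVM's 64 KB per-method bytecode limit, which can happen when many derived columns are chained in one plan. A common workaround, sketched here under the assumption that spark is the active SparkSession (this fix is not from the original thread), is to cut the codegen lineage by materializing the DataFrame between transformation batches:

# Rebuilding the DataFrame from its RDD truncates the query plan,
# so the next batch of withColumn calls starts codegen from scratch.
# Note: this adds a serialization round-trip.
df = spark.createDataFrame(df.rdd, df.schema)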