Apache Spark: how can I conditionally replace values in a column, based on evaluating an expression on another column, in PySpark?

import numpy as np

df = spark.createDataFrame(
    [(1, 1, None),
     (1, 2, float(5)),
     (1, 3, np.nan),
     (1, 4, None),
     (0, 5, float(10)),
     (1, 6, float('nan')),
     (0, 6, float('nan'))],
    ("session", "timestamp1", "id2"))

+-------+----------+----+
|session|timestamp1| id2|
+-------+----------+----+
|      1|         1|null|
|      1|         2| 5.0|
|      1|         3| NaN|
|      1|         4|null|
|      0|         5|10.0|
|      1|         6| NaN|
|      0|         6| NaN|
+-------+----------+----+
How can I replace the values in the timestamp1 column with 999 when session == 0? Expected output:
Is it possible to do this using replace() in PySpark?

You should use the when (together with otherwise) function:
from pyspark.sql.functions import when

targetDf = df.withColumn(
    "timestamp1",
    when(df["session"] == 0, 999).otherwise(df["timestamp1"]))
How can we apply this only to the null values in timestamp1?
from pyspark.sql.functions import when

# add an isNull() check so only the null values are replaced
targetDf = df.withColumn(
    "timestamp1",
    when((df["session"] == 0) & (df["timestamp1"].isNull()), 999)
    .otherwise(df["timestamp1"]))