Apache Spark: How to conditionally replace values in a column based on evaluating an expression on another column in PySpark?

import numpy as np
df = spark.createDataFrame(
    [(1, 1, None),
     (1, 2, float(5)),
     (1, 3, np.nan),
     (1, 4, None),
     (0, 5, float(10)),
     (1, 6, float('nan')),
     (0, 6, float('nan'))],
    ('session', 'timestamp1', 'id2'))

+-------+----------+----+
|session|timestamp1| id2|
+-------+----------+----+
|      1|         1|null|
|      1|         2| 5.0|
|      1|         3| NaN|
|      1|         4|null|
|      0|         5|10.0|
|      1|         6| NaN|
|      0|         6| NaN|
+-------+----------+----+


How do I replace the values in the timestamp1 column with the value 999 when session == 0?

Expected output (timestamp1 set to 999 on the rows where session is 0):

+-------+----------+----+
|session|timestamp1| id2|
+-------+----------+----+
|      1|         1|null|
|      1|         2| 5.0|
|      1|         3| NaN|
|      1|         4|null|
|      0|       999|10.0|
|      1|         6| NaN|
|      0|       999| NaN|
+-------+----------+----+


Is it possible to do this using replace() in PySpark?

You should use the when function (together with otherwise):

from pyspark.sql.functions import when

targetDf = df.withColumn("timestamp1", \
              when(df["session"] == 0, 999).otherwise(df["timestamp1"]))

How do we apply this only to the null values in timestamp1?