Apache Spark: most efficient way to merge timestamp columns in a Spark DataFrame (apache-spark, dataframe, pyspark)


What is the most efficient way to merge two columns in a Spark DataFrame?

I have two columns that mean the same thing. Null values in timestamp should be filled with the value from toAppendData_timestamp.

When both columns have a value, the two values are equal.

I have this:

+--------------------+----------------------+--------+
|           timestamp|toAppendData_timestamp|   value|
+--------------------+----------------------+--------+
|2016-03-24 22:11:...|                  null|    null|
|                null|  2016-03-24 22:12:...|0.015625|
|                null|  2016-03-19 15:54:...|   5.375|
|2016-03-19 15:55:...|  2016-03-19 15:55:...| 5.78125|
|2016-03-19 15:56:...|                  null|    null|
|2016-03-24 22:11:...|  2016-03-24 22:11:...| 0.15625|
+--------------------+----------------------+--------+
I need this:

+--------------------+----------------------+--------+
|    timestamp_merged|toAppendData_timestamp|   value|
+--------------------+----------------------+--------+
|2016-03-24 22:11:...|                  null|    null|
|2016-03-24 22:12:...|  2016-03-24 22:12:...|0.015625|
|2016-03-19 15:54:...|  2016-03-19 15:54:...|   5.375|
|2016-03-19 15:55:...|  2016-03-19 15:55:...| 5.78125|
|2016-03-19 15:56:...|                  null|    null|
|2016-03-24 22:11:...|  2016-03-24 22:11:...| 0.15625|
+--------------------+----------------------+--------+
I tried this, but it did not work:

appendedData = appendedData['timestamp'].fillna(appendedData['toAppendData_timestamp'])

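The function you are looking for is coalesce, which returns the first non-null value among its arguments for each row. You can import it from pyspark.sql.functions: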

from pyspark.sql.functions import coalesce, col
And use it like this:

appendedData.withColumn(
    'timestamp_merged', 
    coalesce(col('timestamp'), col('toAppendData_timestamp'))
)