How to convert timestamp to bigint in a pyspark dataframe


I'm working with Python in a Spark environment and want to convert a dataframe column from the timestamp data type to bigint (a UNIX timestamp). The column looks like this:
("yyyy-MM-dd hh:mm:ss.SSSSSS")

I have read around and tried the following:

from pyspark.sql.functions import from_unixtime, unix_timestamp
from pyspark.sql.types import TimestampType

df1 = df.select(
    from_unixtime(unix_timestamp(df.timestamp_col, "yyyy-MM-dd hh:mm:ss.SSSSSS"))
        .cast(TimestampType())
        .alias("unix_time_col"))
But the output comes out as null values:

+-------------+
|unix_time_col|
+-------------+
|         null|
|         null|
|         null|
+-------------+
I'm using Python 3.7 with spark-2.3.1-bin-hadoop2.7 in a Spark-on-Hadoop environment, on Google Colaboratory. I must be missing something. Please, any help?
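A quick diagnostic, sketched here rather than taken from the original post: in Spark 2.x, unix_timestamp is backed by java.text.SimpleDateFormat, whose "S" pattern letter means milliseconds rather than microseconds, so a ".SSSSSS" pattern does not parse the fraction the way the data implies. Spark's own string-to-timestamp cast, however, does understand fractional seconds, which makes it a handy check that the strings themselves are well formed:

# Hypothetical check, not from the original post: if this shows non-null
# timestamps, the strings are fine and the problem is the pattern passed
# to unix_timestamp, not the data.
df.select(df.timestamp_col.cast('timestamp').alias('parsed')).show(truncate=False)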


Please remove the ".SSSSSS" from your code and the conversion to a unix timestamp will work, i.e., instead of "yyyy-MM-dd hh:mm:ss.SSSSSS", use the following:


df1 = df.select(unix_timestamp(df.timestamp_col, "yyyy-MM-dd hh:mm:ss"))

Thanks, but this is an IoT data project where nano-scale differences in time matter a great deal, so in my case I really do need the ".SSSSSS".
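If the microseconds must survive the conversion, one commonly suggested workaround (a sketch, assuming the strings cast cleanly to timestamp as checked above) is to cast the string to a timestamp, cast that to double to get epoch seconds with the fraction, then scale to microseconds and store the result as a bigint:

from pyspark.sql.functions import col

# string -> timestamp keeps the fractional seconds,
# timestamp -> double yields epoch seconds with the fraction,
# scaling by 1e6 and casting to long gives epoch microseconds as bigint.
df1 = df.select(
    (col('timestamp_col').cast('timestamp').cast('double') * 1000000)
        .cast('long')
        .alias('unix_micros'))
df1.show(truncate=False)

The exact values depend on the session time zone, and the detour through double is floating point; an integer-only variant is sketched after the answer below.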
Hi, please refer to the reply from the stackoverflow link below and you will get the solution:
from pyspark.sql import SparkSession
from pyspark.sql.functions import unix_timestamp
from pyspark.sql.types import (DateType, StructType, StructField, StringType)

spark = SparkSession.builder.appName('abc').getOrCreate()

# One string column holding timestamps with a six-digit fraction
column_schema = StructType([StructField("timestamp_col", StringType())])
data = [['2014-06-04 10:09:13.334422'], ['2015-06-03 10:09:13.443322'], ['2015-08-03 10:09:13.232431']]

data_frame = spark.createDataFrame(data, schema=column_schema)

# Note: the result of this cast is never assigned, so it has no effect
data_frame.withColumn("timestamp_col", data_frame['timestamp_col'].cast(DateType()))
# unix_timestamp with its default "yyyy-MM-dd HH:mm:ss" pattern parses the
# leading date-time portion and drops the trailing fraction
data_frame = data_frame.withColumn('timestamp_col', unix_timestamp('timestamp_col'))
data_frame.show()
+-------------+
|timestamp_col|
+-------------+
|   1401894553|
|   1433344153|
|   1438614553|
+-------------+
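Following on from the answerer's code above, an integer-only variant (again a sketch, not from the thread) takes the whole seconds from unix_timestamp and reads the six microsecond digits straight out of the string, avoiding any floating-point rounding:

from pyspark.sql.functions import unix_timestamp, substring

# Rebuild the string-typed frame, since data_frame was overwritten above.
raw = spark.createDataFrame(data, schema=column_schema)

# Whole epoch seconds from unix_timestamp (the default pattern ignores the
# trailing fraction), microseconds from the last six characters of the
# string, combined into one bigint of epoch microseconds.
micros = raw.select(
    (unix_timestamp('timestamp_col') * 1000000
     + substring('timestamp_col', -6, 6).cast('long'))
    .alias('unix_micros'))
micros.show(truncate=False)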