Dataframe: casting a date to an integer in PySpark

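Is it possible to convert a date column to an integer column in a PySpark dataframe? I tried two different approaches, but each attempt returns a column full of nulls. What am I missing?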

from pyspark.sql.types import *

# DUMMY DATA
simpleData = [("James",34,"2006-01-01","true","M",3000.60),
    ("Michael",33,"1980-01-10","true","F",3300.80),
    ("Robert",37,"1992-07-01","false","M",5000.50)
  ]

columns = ["firstname","age","jobStartDate","isGraduated","gender","salary"]
df = spark.createDataFrame(data = simpleData, schema = columns)
df=df.withColumn("jobStartDate", df['jobStartDate'].cast(DateType()))

# ATTEMPT 1 with cast()

df=df.withColumn("jobStartDateAsInteger1", df['jobStartDate'].cast(IntegerType()))

# ATTEMPT 2 with selectExpr()

df=df.selectExpr("*","CAST(jobStartDate as int) as jobStartDateAsInteger2")
df.show()

You can try using F.unix_timestamp() to convert it to a UNIX timestamp:


Great, I just added a couple of details to get the number of days since 1970-01-01 instead of seconds, but this is exactly what I needed. Thanks! df = df.withColumn("jobStartDateAsInteger1", F.unix_timestamp(df['jobStartDate']) / (24*60*60)); df = df.withColumn("jobStartDateAsInteger1", df['jobStartDateAsInteger1'].cast(IntegerType()))

from pyspark.sql.types import *
import pyspark.sql.functions as F

# DUMMY DATA
simpleData = [("James",34,"2006-01-01","true","M",3000.60),
    ("Michael",33,"1980-01-10","true","F",3300.80),
    ("Robert",37,"1992-07-01","false","M",5000.50)
  ]

columns = ["firstname","age","jobStartDate","isGraduated","gender","salary"]
df = spark.createDataFrame(data = simpleData, schema = columns)
df=df.withColumn("jobStartDate", df['jobStartDate'].cast(DateType()))

df=df.withColumn("jobStartDateAsInteger1", F.unix_timestamp(df['jobStartDate']))
df.show()

+---------+---+------------+-----------+------+------+----------------------+
|firstname|age|jobStartDate|isGraduated|gender|salary|jobStartDateAsInteger1|
+---------+---+------------+-----------+------+------+----------------------+
|    James| 34|  2006-01-01|       true|     M|3000.6|            1136073600|
|  Michael| 33|  1980-01-10|       true|     F|3300.8|             316310400|
|   Robert| 37|  1992-07-01|      false|     M|5000.5|             709948800|
+---------+---+------------+-----------+------+------+----------------------+
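
As a sanity check on the numbers in the table, the same values can be reproduced without Spark. The sketch below is not part of the original answer: the helper names `seconds_since_epoch` and `days_since_epoch` are made up for illustration, and it assumes the timestamps were computed at midnight UTC, which is what the output shown above corresponds to.

```python
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_since_epoch(iso_date: str) -> int:
    """Seconds from 1970-01-01 00:00:00 UTC to midnight UTC of the given date."""
    d = date.fromisoformat(iso_date)
    midnight_utc = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(midnight_utc.timestamp())


def days_since_epoch(iso_date: str) -> int:
    """Whole days from 1970-01-01 to the given date."""
    return (date.fromisoformat(iso_date) - EPOCH).days


# Same dummy dates as in the question (hypothetical helper names above).
for name, start in [("James", "2006-01-01"),
                    ("Michael", "1980-01-10"),
                    ("Robert", "1992-07-01")]:
    secs = seconds_since_epoch(start)
    days = days_since_epoch(start)
    # At midnight UTC the second count is an exact multiple of one day.
    assert secs == days * SECONDS_PER_DAY
    print(name, start, secs, days)

# James 2006-01-01 1136073600 13149
# Michael 1980-01-10 316310400 3661
# Robert 1992-07-01 709948800 8217
```

The division by 24*60*60 only comes out as a whole number when the timestamp falls exactly on midnight UTC, so the result may depend on the Spark session time zone. If the goal is a day count, one option that avoids the seconds step is F.datediff(df['jobStartDate'], F.to_date(F.lit('1970-01-01'))), which returns the difference in days as an integer; this is a suggestion, not something taken from the original answer.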