Dataframe: casting a date to an integer in PySpark
Is it possible to cast a date column to an integer column in a PySpark DataFrame? I tried two different approaches, but each attempt returns a column of nulls. What am I missing?
from pyspark.sql.types import *
# DUMMY DATA
simpleData = [("James",34,"2006-01-01","true","M",3000.60),
("Michael",33,"1980-01-10","true","F",3300.80),
("Robert",37,"1992-07-01","false","M",5000.50)
]
columns = ["firstname","age","jobStartDate","isGraduated","gender","salary"]
df = spark.createDataFrame(data = simpleData, schema = columns)
df=df.withColumn("jobStartDate", df['jobStartDate'].cast(DateType()))
# ATTEMPT 1 with cast()
df=df.withColumn("jobStartDateAsInteger1", df['jobStartDate'].cast(IntegerType()))
# ATTEMPT 2 with selectExpr()
df=df.selectExpr("*","CAST(jobStartDate as int) as jobStartDateAsInteger2")
df.show()
You can try using F.unix_timestamp() to convert it to a UNIX timestamp:
Great, I just added a few details to get the number of days since 1970-01-01 instead of seconds, but this is exactly what I needed. Thanks!
df=df.withColumn("jobStartDateAsInteger1", F.unix_timestamp(df['jobStartDate'])/(24*60*60))
df=df.withColumn("jobStartDateAsInteger1", df['jobStartDateAsInteger1'].cast(IntegerType()))
from pyspark.sql.types import *
import pyspark.sql.functions as F
# DUMMY DATA
simpleData = [("James",34,"2006-01-01","true","M",3000.60),
("Michael",33,"1980-01-10","true","F",3300.80),
("Robert",37,"1992-07-01","false","M",5000.50)
]
columns = ["firstname","age","jobStartDate","isGraduated","gender","salary"]
df = spark.createDataFrame(data = simpleData, schema = columns)
df=df.withColumn("jobStartDate", df['jobStartDate'].cast(DateType()))
df=df.withColumn("jobStartDateAsInteger1", F.unix_timestamp(df['jobStartDate']))
df.show()
+---------+---+------------+-----------+------+------+----------------------+
|firstname|age|jobStartDate|isGraduated|gender|salary|jobStartDateAsInteger1|
+---------+---+------------+-----------+------+------+----------------------+
|    James| 34|  2006-01-01|       true|     M|3000.6|            1136073600|
|  Michael| 33|  1980-01-10|       true|     F|3300.8|             316310400|
|   Robert| 37|  1992-07-01|      false|     M|5000.5|             709948800|
+---------+---+------------+-----------+------+------+----------------------+
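The values in the output above, and the seconds-to-days division mentioned in the comment, can be cross-checked without Spark. The following is a minimal sketch in plain Python, assuming the timestamps were computed at midnight UTC (which matches the numbers shown); the helper names `seconds_since_epoch_utc` and `days_since_epoch` are made up for illustration and are not part of PySpark.

```python
from datetime import date, datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60
EPOCH = date(1970, 1, 1)


def seconds_since_epoch_utc(iso_date: str) -> int:
    """Seconds from 1970-01-01 00:00:00 UTC to midnight UTC of the given date."""
    d = date.fromisoformat(iso_date)
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(midnight.timestamp())


def days_since_epoch(iso_date: str) -> int:
    """Whole days from 1970-01-01 to the given date."""
    return (date.fromisoformat(iso_date) - EPOCH).days


for s in ["2006-01-01", "1980-01-10", "1992-07-01"]:
    secs = seconds_since_epoch_utc(s)
    # Dividing the UTC timestamp by 86400 gives the same day count.
    assert secs // SECONDS_PER_DAY == days_since_epoch(s)
    print(s, secs, days_since_epoch(s))

# 2006-01-01 1136073600 13149
# 1980-01-10 316310400 3661
# 1992-07-01 709948800 8217
```

Note that F.unix_timestamp() interprets the date in the Spark session time zone, so in a non-UTC session the seconds value may not be an exact multiple of 86400 and the integer division can land on a neighbouring day. Computing the day difference directly, for example with F.datediff() against a 1970-01-01 date literal, is one way to sidestep that.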