How to get the first date of the week from a date column in PySpark?

I have a plain timestamp column in my PySpark DataFrame. I want a new column that holds the start date of the week for each given date.

This is on Spark 2.2.0.


One option: combine weekofyear and year, then parse the result back into a date:
from pyspark.sql.functions import weekofyear, year, to_date, concat, lit, col
from pyspark.sql.session import SparkSession
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

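# Extract the week-of-year and calendar year, then parse "w/yyyy" back
# into a date; which day the parsed week starts on depends on the JVM
# locale (Sunday with the default en_US locale, as the output shows).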
spark.createDataFrame([['2020-10-03 05:00:00']], schema=['timestamp']) \
    .withColumn('timestamp', col('timestamp').astype(TimestampType())) \
    .withColumn('week', weekofyear('timestamp')) \
    .withColumn('year', year('timestamp')) \
    .withColumn('date_of_the_week', to_date(concat('week', lit('/'), 'year'), "w/yyyy")) \
    .show(truncate=False)

+-------------------+----+----+----------------+
|timestamp          |week|year|date_of_the_week|
+-------------------+----+----+----------------+
|2020-10-03 05:00:00|40  |2020|2020-09-27      |
+-------------------+----+----+----------------+
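
Alternatively, from Spark 2.3 onward, date_trunc can truncate a timestamp directly to the start of its week:
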
from pyspark.sql.functions import date_trunc, col
from pyspark.sql.session import SparkSession
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

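# date_trunc('week', ...) zeroes out everything below the week level,
# returning midnight on the Monday of the timestamp's week.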
spark.createDataFrame([['2020-10-03 05:00:00']], schema=['timestamp']) \
    .withColumn('timestamp', col('timestamp').astype(TimestampType())) \
    .withColumn('date_of_the_week', date_trunc('week', 'timestamp')) \
    .show(truncate=False)

+-------------------+-------------------+
|timestamp          |date_of_the_week   |
+-------------------+-------------------+
|2020-10-03 05:00:00|2020-09-28 00:00:00|
+-------------------+-------------------+
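
Note that the two approaches disagree on where the week starts: parsing "w/yyyy" follows the JVM locale's first day of week (Sunday above), while date_trunc('week', ...) always truncates to Monday. If you need a Monday start on Spark 2.2, where date_trunc is not yet available, here is a minimal sketch using next_day and date_sub (both available since Spark 1.5); the column name week_start is just illustrative:

from pyspark.sql.functions import next_day, date_sub, col
from pyspark.sql.session import SparkSession
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

# next_day() returns the first matching weekday strictly after the date,
# so stepping back seven days lands on the Monday of the current week
# (the date itself when it already falls on a Monday).
spark.createDataFrame([['2020-10-03 05:00:00']], schema=['timestamp']) \
    .withColumn('timestamp', col('timestamp').astype(TimestampType())) \
    .withColumn('week_start', date_sub(next_day('timestamp', 'Mon'), 7)) \
    .show(truncate=False)

For the sample row this yields 2020-09-28 as a DateType column, matching the date_trunc result above.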