Pyspark: 202001 and 202053 (yyyyww) to date gives null
I have a dataframe with a year-week column that I want to convert to a date. The code I wrote seems to work for every week except "202001" and "202053", e.g.:
import pyspark.sql.functions as F

df = spark.createDataFrame([
(1, "202001"),
(2, "202002"),
(3, "202003"),
(4, "202052"),
(5, "202053")
], ['id', 'week_year'])
df.withColumn("date", F.to_date(F.col("week_year"), "yyyyw")).show()
I can't figure out what is wrong with these two weeks or how to fix it. How can I convert weeks 202001 and 202053 to valid dates?

Handling ISO weeks in Spark is a real headache; in fact, week-based parsing is deprecated (removed?) in Spark 3. I think using Python's datetime utilities in a UDF is a more flexible approach:
import datetime
import pyspark.sql.functions as F
@F.udf('date')
def week_year_to_date(week_year):
    # %G%V%u = ISO year, ISO week number, ISO weekday;
    # appending '1' selects Monday, the first day of the ISO week
    return datetime.datetime.strptime(week_year + '1', '%G%V%u')
df = spark.createDataFrame([
(1, "202001"),
(2, "202002"),
(3, "202003"),
(4, "202052"),
(5, "202053")
], ['id', 'week_year'])
df.withColumn("date", week_year_to_date('week_year')).show()
+---+---------+----------+
| id|week_year| date|
+---+---------+----------+
| 1| 202001|2019-12-30|
| 2| 202002|2020-01-06|
| 3| 202003|2020-01-13|
| 4| 202052|2020-12-21|
| 5| 202053|2020-12-28|
+---+---------+----------+
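As a quick sanity check, plain Python confirms these ISO mappings (the %G/%V/%u directives require Python 3.6+):

import datetime

# ISO week 1 of 2020 starts on Monday 2019-12-30 ...
print(datetime.datetime.strptime("2020011", "%G%V%u").date())  # 2019-12-30
# ... and isocalendar() maps that Monday back to ISO (2020, 1, 1) (tuple on Python < 3.9)
print(datetime.date(2019, 12, 30).isocalendar())
# ISO year 2020 is a long year with 53 weeks, so 202053 is a valid week
print(datetime.datetime.strptime("2020531", "%G%V%u").date())  # 2020-12-28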
Based on mck's answer, here is the solution I ended up using for Python version 3.5.2:
import datetime
from dateutil.relativedelta import relativedelta
import pyspark.sql.functions as F
@F.udf('date')
def week_year_to_date(week_year):
    # the '1' selects Monday as the first day of the week; %Y%W%w counts
    # weeks from the first Monday of the calendar year, which lands one week
    # after the ISO week start here, so subtract a week to compensate
    return datetime.datetime.strptime(week_year + '1', '%Y%W%w') - relativedelta(weeks=1)
df = spark.createDataFrame([
(9, "201952"),
(1, "202001"),
(2, "202002"),
(3, "202003"),
(4, "202052"),
(5, "202053")
], ['id', 'week_year'])
df.withColumn("date", week_year_to_date('week_year')).show()
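For reference, this version should produce the same Mondays as the ISO-based UDF above, one week earlier than the raw %Y%W%w parse would give:

+---+---------+----------+
| id|week_year|      date|
+---+---------+----------+
|  9|   201952|2019-12-23|
|  1|   202001|2019-12-30|
|  2|   202002|2020-01-06|
|  3|   202003|2020-01-13|
|  4|   202052|2020-12-21|
|  5|   202053|2020-12-28|
+---+---------+----------+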
Without '%G%V%u', which was only added in Python 3.6, I had to subtract a week from the date to get the correct result. Thanks, this helped me a lot! Since this is my preferred approach, I'll mark your answer as accepted. Unfortunately, we are running Python version 3.5.2 on our cluster, so I had to fall back to an uglier solution; I'll add mine as a separate answer.
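A side note for completeness: on Spark 3 you can also avoid the Python UDF entirely and derive the ISO week start from built-in functions, exploiting the fact that January 4 always falls inside ISO week 1 of its year. This is a minimal sketch, assuming Spark 3.x (make_date, and date_add/date_sub accepting a Column for the day count):

import pyspark.sql.functions as F

year = F.substring("week_year", 1, 4).cast("int")
week = F.substring("week_year", 5, 2).cast("int")

# January 4 is always inside ISO week 1 of its year
jan4 = F.make_date(year, F.lit(1), F.lit(4))
# dayofweek(): 1 = Sunday ... 7 = Saturday; remap to ISO 1 = Monday ... 7 = Sunday
iso_dow = (F.dayofweek(jan4) + 5) % 7 + 1
# back up to the Monday of ISO week 1, then step forward (week - 1) weeks
week1_monday = F.date_sub(jan4, iso_dow - 1)
df.withColumn("date", F.date_add(week1_monday, (week - 1) * 7)).show()

Whether this beats the UDF depends on your setup, but it keeps the whole computation inside Spark's native date arithmetic.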