Python PySpark: create a timestamp column


python datetime pyspark


I'm using Spark 2.1.0. I can't create a timestamp column in PySpark with the code snippet below. Please help.

df=df.withColumn('Age',lit(datetime.now()))

I'm getting:

AssertionError: col should be Column

Please help.

Assuming you have a DataFrame, as in your code snippet, and want the same timestamp on every row:

df=df.withColumn('Age',lit(datetime.now()))

Let me create a dummy DataFrame:

>>> data = [{'name': 'Alice', 'age': 1}, {'name': 'Again', 'age': 2}]
>>> df = spark.createDataFrame(data)

>>> import time
>>> import datetime
>>> timestamp = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
>>> type(timestamp)
<class 'str'>

>>> from pyspark.sql.functions import lit,unix_timestamp
>>> timestamp
'2017-08-02 16:16:14'
>>> new_df = df.withColumn('time',unix_timestamp(lit(timestamp),'yyyy-MM-dd HH:mm:ss').cast("timestamp"))
>>> new_df.show(truncate = False)
+---+-----+---------------------+
|age|name |time                 |
+---+-----+---------------------+
|1  |Alice|2017-08-02 16:16:14.0|
|2  |Again|2017-08-02 16:16:14.0|
+---+-----+---------------------+

>>> new_df.printSchema()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
 |-- time: timestamp (nullable = true)


I'm not sure about 2.1.0, but at least you can do:

from pyspark.sql import functions as F
# current_timestamp() already returns a Column, so no lit() is needed
df = df.withColumn('Age', F.current_timestamp())

Hope this helps.

Adding to Balalaika's answer: if someone like me only wants to add the date and not the time, they can do it like this:

from pyspark.sql import functions as F
# current_date() yields a DateType column (no time-of-day part)
df = df.withColumn('Age', F.current_date())

Hope this helps.

This solution is outdated and no longer works in the current version of PySpark.

This should be the accepted answer.