
Converting a string to a DataFrame using Spark/Scala

Tags: scala, apache-spark

I want to convert a string column to a timestamp column, but it always returns null:

import org.apache.spark.sql.functions._

val t = unix_timestamp(col("tracking_time"), "MM/dd/yyyy").cast("timestamp")
val df = df2.withColumn("ts", t)
Any ideas?

Thanks.

Make sure your string column matches the specified format, MM/dd/yyyy. If it doesn't match, null is returned.

Example:

val df2 = Seq(("12/12/2020")).toDF("tracking_time")
val t = unix_timestamp(col("tracking_time"), "MM/dd/yyyy").cast("timestamp")

df2.withColumn("ts", t).show()
//+-------------+-------------------+
//|tracking_time|                 ts|
//+-------------+-------------------+
//|   12/12/2020|2020-12-12 00:00:00|
//+-------------+-------------------+

Or, using the to_timestamp function:

df2.withColumn("ts",to_timestamp(col("tracking_time"),"MM/dd/yyyy")).show()
//+-------------+-------------------+
//|tracking_time|                 ts|
//+-------------+-------------------+
//|   12/12/2020|2020-12-12 00:00:00|
//+-------------+-------------------+
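If you want to see which rows fail, here is a minimal sketch (reusing the df2 defined above): values that do not match the pattern simply come back as null, so you can isolate them for inspection.

val parsed = df2.withColumn("ts", to_timestamp(col("tracking_time"), "MM/dd/yyyy"))

// Rows whose tracking_time did not match MM/dd/yyyy parse to null:
parsed.filter(col("ts").isNull).show()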

As @Shu mentioned, the cause is probably an invalid format in the tracking_time column. It is worth mentioning, though, that Spark matches the pattern against a prefix of the column value. Study these examples to build better intuition:

Seq(
  "03/29/2020 00:00",
  "03/29/2020",
  "00:00 03/29/2020",
  "03/29/2020somethingsomething"
).toDF("tracking_time")
  .withColumn("ts", unix_timestamp(col("tracking_time"), "MM/dd/yyyy").cast("timestamp"))
  .show()
//+--------------------+-------------------+
//|       tracking_time|                 ts|
//+--------------------+-------------------+
//|    03/29/2020 00:00|2020-03-29 00:00:00|
//|          03/29/2020|2020-03-29 00:00:00|
//|    00:00 03/29/2020|               null|
//|03/29/2020somethi...|2020-03-29 00:00:00|
//+--------------------+-------------------+
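One caveat (assuming Spark 3.x, which the question does not state): the prefix matching shown above is the behavior of the legacy Spark 2.x parser. Spark 3's stricter default parser may reject values such as 03/29/2020somethingsomething instead of parsing their prefix; the old behavior can be restored with a configuration flag:

// Opt back into the legacy, prefix-tolerant datetime parser (Spark 3.x only).
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")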

Can you add some sample data for the tracking_time column?

It was because the date format was not compatible with the specified format. Thank you, Shu.
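If the column mixes several layouts, one common approach is to try each pattern and keep the first one that parses. A sketch (the two patterns below are illustrative; substitute the ones your data actually uses):

import org.apache.spark.sql.functions._

// A failed to_timestamp yields null, so coalesce keeps the first
// pattern that successfully parses each row.
val withTs = df2.withColumn(
  "ts",
  coalesce(
    to_timestamp(col("tracking_time"), "MM/dd/yyyy HH:mm"), // e.g. 03/29/2020 00:00
    to_timestamp(col("tracking_time"), "MM/dd/yyyy")        // e.g. 03/29/2020
  )
)
withTs.show()

Listing the more specific pattern first matters here: under the legacy parser, MM/dd/yyyy would already match the prefix of 03/29/2020 00:00 and silently drop the time of day.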