How do I add labels with CASE WHEN for time ranges in Spark SQL?


I have this table of IDs and timestamps, and I want to add a label for each timestamp range:

ID          timestamp
a       2020-01-16 08:55:50
b       2020-01-16 08:57:37
c       2020-01-16 09:00:13
d       2020-01-16 09:01:32
e       2020-01-16 09:03:32
f       2020-01-16 09:06:56
For example, rows from 2020-01-16 08:55:50 through 2020-01-16 09:00:13 are X, and rows from 2020-01-16 09:01:32 through 2020-01-16 09:06:56 are Y.

I would like the table to look like this:

ID        timestamp                type_flag
a       2020-01-16 08:55:50          X
b       2020-01-16 08:57:37          X
c       2020-01-16 09:00:13          X
d       2020-01-16 09:01:32          Y
e       2020-01-16 09:03:32          Y
f       2020-01-16 09:06:56          Y
g       2020-01-16 09:08:51          Z
h       2020-01-16 09:10:43          Z
i       2020-01-16 09:13:21          Z

What I have tried so far:

SELECT *,
    CASE WHEN timestamp BETWEEN '2020-01-16 08:55:50' AND '2020-01-16 09:00:13' THEN 'X' 
         WHEN timestamp BETWEEN '2020-01-16 09:01:32' and '2020-01-16 09:06:56' THEN 'Y'
         WHEN timestamp BETWEEN '2020-01-16 09:08:51' and '2020-01-16 09:13:21' THEN 'Z'
    ELSE 'A' END AS type_flag
FROM table1;

But it gives me an error saying:

Error [22P02]: ERROR: invalid input syntax for integer: "2021-01-16 08:55:50"
  Position: 37
How can I fix the query to get the desired result? I am using Spark SQL for this.

Thanks.

I think there is a problem with your syntax or with the way you are casting the column. The query itself works in Spark SQL once the timestamp column has the right type:

// creating sample data
import spark.implicits._  // needed for toDF and the $"col" syntax
val df = Seq(("a","2020-01-16 08:55:50"),("b","2020-01-16 08:57:37"),("c","2020-01-16 09:00:13"),("d","2020-01-16 09:01:32"),("e","2020-01-16 09:03:32"),("f","2020-01-16 09:06:56")).toDF("ID","timestamp")
// casting the timestamp column from string to timestamp
val df1 = df.withColumn("timestamp", $"timestamp".cast("timestamp"))
// creating a temp view so it can be queried with Spark SQL
df1.createOrReplaceTempView("timestamptest")
// CASE WHEN statements inside the Spark SQL query
val df3 = spark.sql("""select *, CASE WHEN timestamp BETWEEN '2020-01-16 08:55:50' AND '2020-01-16 09:00:13' THEN 'X' 
         WHEN timestamp BETWEEN '2020-01-16 09:01:32' and '2020-01-16 09:06:56' THEN 'Y'
         WHEN timestamp BETWEEN '2020-01-16 09:08:51' and '2020-01-16 09:13:21' THEN 'Z'
    ELSE 'A' END As type_flag from timestamptest""")
display(df3)
The output then shows each row with its type_flag value (output screenshot omitted).
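If you prefer to stay in the DataFrame API instead of registering a temp view, the same labelling can be sketched with `when`/`otherwise` (assuming the `df1` from above, with `timestamp` already cast to a timestamp type):

```scala
import org.apache.spark.sql.functions.{col, when}

// Equivalent of the CASE WHEN query, expressed with the DataFrame API.
// Column.between is inclusive on both bounds, like SQL BETWEEN.
val labelled = df1.withColumn("type_flag",
  when(col("timestamp").between("2020-01-16 08:55:50", "2020-01-16 09:00:13"), "X")
    .when(col("timestamp").between("2020-01-16 09:01:32", "2020-01-16 09:06:56"), "Y")
    .when(col("timestamp").between("2020-01-16 09:08:51", "2020-01-16 09:13:21"), "Z")
    .otherwise("A"))

labelled.show(false)
```

This avoids string-building a SQL query and keeps the range boundaries as ordinary Scala values.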
Have you tried wrapping the time strings in timestamp()? Something like BETWEEN timestamp('2020-01-16 08:55:50') AND ...
I tried that, but it also raised an error saying: Error [42601]: ERROR: syntax error at or near "'2021-01-16 08:55:50'" Position: 50
You probably have a syntax error somewhere else, such as a missing parenthesis or comma.
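For reference, Spark SQL does accept explicit timestamp() calls in a BETWEEN clause, so a minimal sketch of the wrapped version suggested in the comment (assuming the timestamptest view from the answer above) would be:

```scala
// Sketch: explicit timestamp literals in Spark SQL.
// Assumes the temp view "timestamptest" registered earlier.
val df4 = spark.sql("""
  SELECT *,
         CASE WHEN timestamp BETWEEN timestamp('2020-01-16 08:55:50')
                                 AND timestamp('2020-01-16 09:00:13') THEN 'X'
              WHEN timestamp BETWEEN timestamp('2020-01-16 09:01:32')
                                 AND timestamp('2020-01-16 09:06:56') THEN 'Y'
              ELSE 'A'
         END AS type_flag
  FROM timestamptest
""")
df4.show(false)
```

If this still fails, the error codes in the question ([22P02], [42601]) are worth a second look: they are PostgreSQL error codes, which suggests the query may not have been running against Spark SQL at all.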