rlike in an Apache Spark Scala DataFrame gives an error


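I am trying to convert the following Hive SQL statement to a Spark DataFrame, but I get an error:

case when (lower(message_txt) rlike '.*sampletext(\\s?is\\s?)newtext.*' ) then 'P' else 'Y'

Sample data:
message_txt = "This is new sampletext, followed by newtext"

Please help me with the equivalent Spark DataFrame statement.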

Add end at the end of the case statement in the SQL.


Example:

In Spark SQL:

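import spark.implicits._ // needed for toDF outside spark-shell

// Build a one-row DataFrame and expose it as a temp view
val df = Seq(("This is new sampletext, followed by newtext")).toDF("message_txt")
df.createOrReplaceTempView("tmp")
// Note the closing `end`; without it the CASE expression does not parse.
// Caveat: Spark's SQL parser unescapes the string literal, so \s reaches the regex as s
// (see the column header below); write \\\\s in this Scala string to keep \s.
spark.sql("select case when (lower(message_txt) rlike '.sampletext(\\s?is\\s?)newtext.' ) then 'P' else 'Y' end from tmp").show()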

//Result
//+--------------------------------------------------------------------------------+
//|CASE WHEN lower(message_txt) RLIKE .sampletext(s?iss?)newtext. THEN P ELSE Y END|
//+--------------------------------------------------------------------------------+
//|                                                                               Y|
//+--------------------------------------------------------------------------------+
In the DataFrame API:

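import org.apache.spark.sql.functions.{col, lower, when}

// Same check with when/otherwise on the df defined in the example above; no `end` is needed here
df.withColumn("status", when(lower(col("message_txt")).rlike(".sampletext(\\s?is\\s?)newtext."),"P").otherwise("Y")).show()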

//Result
//+--------------------+------+
//|         message_txt|status|
//+--------------------+------+
//|This is new sampl...|     Y|
//+--------------------+------+

Update:

Checking whether the message_txt column contains the strings sampletext and newtext:

//using rlike
df.withColumn("status", when(lower(col("message_txt")).rlike("sampletext.*newtext"),"P").otherwise("Y")).show()

//using like
df.withColumn("status", when(lower(col("message_txt")).like("%sampletext%newtext%"),"P").otherwise("Y")).show()

//+--------------------+------+
//|         message_txt|status|
//+--------------------+------+
//|This is new sampl...|     P|
//+--------------------+------+
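
To see why the original pattern yields 'Y' while the relaxed one yields 'P', the matching can be reproduced without a Spark session. The following is a minimal sketch, assuming that rlike follows Java regex "find" semantics (a match anywhere in the string); the object name RlikeCheck and the helper rlikeLike are hypothetical names used only for this illustration.

```scala
import java.util.regex.Pattern

object RlikeCheck {
  // Hypothetical helper: true if the regex is found anywhere in the input,
  // which is how rlike is assumed to behave here.
  def rlikeLike(input: String, regex: String): Boolean =
    Pattern.compile(regex).matcher(input).find()

  def main(args: Array[String]): Unit = {
    // Same sample row as above, lower-cased as in the query
    val msg = "This is new sampletext, followed by newtext".toLowerCase

    // Requires "is" (with optional whitespace) directly between the two words
    println(rlikeLike(msg, "sampletext(\\s?is\\s?)newtext")) // prints false

    // Allows any characters between the two words
    println(rlikeLike(msg, "sampletext.*newtext")) // prints true
  }
}
```

In the sample row the text between sampletext and newtext is ", followed by ", so a pattern that only allows "is" in between cannot match, which is consistent with the 'Y' results shown earlier.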
Use:
when(lower($"value").rlike(".sampletext(\\sis\\s?)newtext."), lit('p')).otherwise("Y")


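What are you trying to extract from the message? message_txt contains both sampletext and newtext, so it should show 'P' as the status, right? Are you sure this regex works in Hive and produces 'P' for the text above?

It is a kind of regex. If message_txt contains these strings, then I will set 'P'.

@CNR, you can use the .rlike or .like functions to handle this case. Check my updated answer, where I have added working examples of both functions.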