Techniques for writing multiple column checks as a single function in Scala
Below are two methods written with Spark Scala. Each checks whether a column contains a string within a date window and sums the occurrences (1 or 0). Is there a better way to write this as a single function, so I don't have to write a new method every time a condition is added? Thanks in advance.
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lower, sum, when}
import spark.implicits._ // for the 'symbol column syntax; assumes a SparkSession named spark

def sumFunctDays1cols(columnName: String, dayid: String, processday: String, fieldString: String, newColName: String): Column = {
  sum(when(('visit_start_time > dayid)
    .and('visit_start_time <= processday)
    .and(lower(col(columnName)).contains(fieldString)), 1)
    .otherwise(0)).alias(newColName)
}

def sumFunctDays2cols(columnName: String, dayid: String, processday: String, fieldString1: String, fieldString2: String, newColName: String): Column = {
  sum(when(('visit_start_time > dayid)
    .and('visit_start_time <= processday)
    .and(lower(col(columnName)).contains(fieldString1) || lower(col(columnName)).contains(fieldString2)), 1)
    .otherwise(0)).alias(newColName)
}
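For reference, sum(when(cond, 1).otherwise(0)) is equivalent to mapping each row to 1 or 0 and summing. A minimal plain-Scala sketch of what sumFunctDays1cols computes, with a hypothetical Visit row type standing in for the DataFrame (no Spark dependency):

```scala
object SumWhenSketch {
  // Hypothetical stand-in for a DataFrame row; field names are illustrative.
  case class Visit(visitStartTime: String, channel: String)

  // Mirrors sum(when(cond, 1).otherwise(0)): map each row to 1 or 0, then sum.
  // ISO date strings compare correctly with plain string comparison, which is
  // also how the Spark version compares the 'visit_start_time column here.
  def sumFunctDays(rows: Seq[Visit], dayid: String, processday: String, fieldString: String): Int =
    rows.map { r =>
      val inWindow = r.visitStartTime > dayid && r.visitStartTime <= processday
      val hasTerm  = r.channel.toLowerCase.contains(fieldString)
      if (inWindow && hasTerm) 1 else 0
    }.sum
}
```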
You could do the following (I have not tested it yet); hope this helps. Instead of separate String1, String2 parameters, make the function take a list of strings. I put together a small example for you:
import org.apache.spark.sql.functions.udf
import spark.implicits._ // for toDF and $"..."; assumes a SparkSession named spark
val df = Seq(
(1, "mac"),
(2, "lenovo"),
(3, "hp"),
(4, "dell")).toDF("id", "brand")
// dictionary Set of words to check
val dict = Set("mac","leno","noname")
val checkerUdf = udf { (s: String) => dict.exists(s.contains(_) )}
df.withColumn("brand_check", checkerUdf($"brand")).show()
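Generalizing further, the dictionary check and the original two methods share one shape: OR together a contains check per term. In Spark the same fold would be fieldStrings.map(s => lower(col(columnName)).contains(s)).reduce(_ || _), giving one function for any number of terms. A runnable plain-Scala sketch of that fold (no Spark dependency; the Option guard avoids the NullPointerException a null column value would otherwise cause inside a UDF):

```scala
object AnyTermMatch {
  // Fold per-term contains-checks with logical OR -- the same shape as
  // reducing Spark Column conditions with `||`. Option guards null input,
  // which would otherwise throw inside a UDF applied to a nullable column.
  def containsAny(terms: Seq[String])(value: String): Boolean =
    Option(value).exists(v => terms.map(t => v.toLowerCase.contains(t)).reduce(_ || _))
}
```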
I hope this solves your problem. If you still need help, upload the whole code snippet and I will help you finish it.
I will try this, thanks Shankar.
// The original call mixed Column and String arguments; the methods
// above take plain Strings. A call matching sumFunctDays1cols' signature:
sumFunctDays1cols(
  "columnName",
  "2019-01-01",
  "2019-01-10",
  "lenovo",
  "prod_count"
)
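To stop writing a new call per condition entirely, the conditions themselves can live in data: a map from output column name to search terms. In Spark, each entry would become one sum(when(...)) Column and the whole collection would be passed to agg(cols.head, cols.tail: _*). A plain-Scala sketch of that idea (hypothetical names and row type, no Spark dependency):

```scala
object ConditionSpecs {
  // Hypothetical stand-in for a DataFrame row.
  case class Visit(visitStartTime: String, channel: String)

  // Output-column name -> search terms; adding a condition is a data
  // change, not a new method.
  val specs: Map[String, Seq[String]] = Map(
    "mac_or_lenovo_count" -> Seq("mac", "leno"),
    "dell_count"          -> Seq("dell")
  )

  // One pass per spec: count rows in the date window whose channel
  // contains any of the spec's terms.
  def counts(rows: Seq[Visit], dayid: String, processday: String): Map[String, Int] =
    specs.map { case (name, terms) =>
      name -> rows.count { r =>
        r.visitStartTime > dayid && r.visitStartTime <= processday &&
          terms.exists(t => r.channel.toLowerCase.contains(t))
      }
    }
}
```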