
Technique for writing checks over multiple columns as a single function in Scala (Scala / Apache Spark)


Below are two methods written with Spark Scala. Each tests whether a column contains a given string and then sums the occurrences (1 or 0). Is there a better way to write this as a single function, so that I don't have to write a new method every time a condition is added? Thanks in advance.

def sumFunctDays1cols(columnName: String, dayid: String, processday: String,
                      fieldString: String, newColName: String): Column = {
  sum(
    when(('visit_start_time > dayid)
           .and('visit_start_time <= processday)
           .and(lower(col(columnName)).contains(fieldString)), 1)
      .otherwise(0)
  ).alias(newColName)
}

def sumFunctDays2cols(columnName: String, dayid: String, processday: String,
                      fieldString1: String, fieldString2: String,
                      newColName: String): Column = {
  sum(
    when(('visit_start_time > dayid)
           .and('visit_start_time <= processday)
           .and(lower(col(columnName)).contains(fieldString1) ||
                lower(col(columnName)).contains(fieldString2)), 1)
      .otherwise(0)
  ).alias(newColName)
}
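The two methods above differ only in how many search strings they OR together, so they can collapse into one function that takes a `Seq[String]`. A minimal sketch (untested; the name `sumFunctDays` and its parameter list are my own, not from the original post):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lower, sum, when}

// One function covering both cases: map each search string to a
// contains() check on the column, then OR the checks together.
def sumFunctDays(columnName: String,
                 dayid: String,
                 processday: String,
                 fieldStrings: Seq[String],
                 newColName: String): Column = {
  val containsAny = fieldStrings
    .map(s => lower(col(columnName)).contains(s))
    .reduce(_ || _)
  sum(
    when(col("visit_start_time") > dayid &&
         col("visit_start_time") <= processday &&
         containsAny, 1)
      .otherwise(0)
  ).alias(newColName)
}

// The old one- and two-string calls then become:
//   sumFunctDays("page_name", d1, d2, Seq("lenovo"), "lenovo_cnt")
//   sumFunctDays("page_name", d1, d2, Seq("lenovo", "mac"), "brand_cnt")
```

Adding a new condition is then a matter of appending to the `Seq`, not writing a new method.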

You could do something like the following (I have not tested it):


Hope this helps.


Instead of separate String1, String2 parameters, make the function take a list of strings. I have implemented a small example for you:

import org.apache.spark.sql.functions.udf

  val df = Seq(
    (1, "mac"),
    (2, "lenovo"),
    (3, "hp"),
    (4, "dell")).toDF("id", "brand")

  // dictionary Set of words to check

  val dict = Set("mac","leno","noname")

  val checkerUdf = udf { (s: String) => dict.exists(s.contains(_)) }

  df.withColumn("brand_check", checkerUdf($"brand")).show()

I hope this solves your problem. If you need more help, please post the whole code snippet and I will help you finish it.
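One caveat worth noting: a UDF is opaque to Spark's Catalyst optimizer and will throw a NullPointerException on null values. The same "contains any dictionary word" check can be built from built-in `Column` operations alone; a sketch (untested), assuming the same `df` and `dict` as in the example above:

```scala
import org.apache.spark.sql.functions.col

// Build one boolean Column by OR-ing a contains() check per
// dictionary word; no UDF involved, so Catalyst can optimize it
// and nulls follow standard SQL semantics.
val dict = Set("mac", "leno", "noname")
val containsAny = dict.toSeq
  .map(w => col("brand").contains(w))
  .reduce(_ || _)

// df.withColumn("brand_check", containsAny).show()
```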


I'll give this a try, thanks Shankar.
sumFunctDays2cols(
  "columnName",
  "2019-01-01",
  "2019-01-10",
  "prod_count",
  "lenovo",     // plain String, not col("lenovo"): the signature expects Strings
  "prod_count"
)