Scala Spark: best way to create columns for ranges

Tags: scala, dataframe, apache-spark


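I'm looking for a way to improve this code and reduce the repeated lines. I need one column per range. Is there a way to write a method that handles the column generation here?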

df
  .withColumn(
    "daysbetween",
    fn.datediff($"date1", $"date2")
  )
  .withColumn(
    "0",
    fn.when(
      $"daysbetween" >= -30,
      $"TotalPrice"
    )
  )
  .withColumn(
    "-30",
    fn.when(
      $"daysbetween".between(-60, -31),
      $"TotalPrice"
    )
  )
  .withColumn(
    "-60",
    fn.when(
      $"daysbetween".between(-90, -61),
      $"TotalPrice"
    )
  )
  .withColumn(
    "<-90",
    fn.when(
      $"daysbetween" < -90,
      $"TotalPrice"
    )
  )
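I hope this code helps.
I created a list of columns, pairing each one with its name as a tuple. I'm not sure whether you also want the column generation logic itself to be automated.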

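import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.{functions => fn}
// The $"..." syntax additionally needs `import spark.implicits._` from your SparkSession

// Each entry pairs a column name with the expression that computes it
val dynamicCols: List[(String, Column)] = List(
  ("daysbetween",
    fn.datediff($"date1", $"date2")
  ),
  ("0",
    fn.when(
      $"daysbetween" >= -30,
      $"TotalPrice"
    )),
  ("-30",
    fn.when(
      $"daysbetween".between(-60, -31),
      $"TotalPrice"
    )),
  ("-60",
    fn.when(
      $"daysbetween".between(-90, -61),
      $"TotalPrice"
    )),
  ("<-90",
    fn.when(
      $"daysbetween" < -90,
      $"TotalPrice"
    ))
)

// Apply the columns in list order; "daysbetween" must come first because the others read it
val result = dynamicCols.foldLeft(df)((acc: DataFrame, colInfo: (String, Column)) =>
  acc.withColumn(colInfo._1, colInfo._2)
)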
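
If the column generation logic itself should be automated, one option is to derive the names and conditions from a list of boundaries instead of writing each tuple by hand. The following is only a sketch under assumptions: the helper names `bucketColumns` and `withBuckets` are hypothetical, the boundaries are assumed to be given in descending order, and the naming scheme ("0", then each upper boundary, then "<" plus the last boundary) is inferred from the question.

```scala
import org.apache.spark.sql.{Column, DataFrame, functions => fn}

// Hypothetical helper: builds one (name, column) pair per range.
// `bounds` must be in descending order, e.g. List(-30, -60, -90).
def bucketColumns(days: Column, value: Column, bounds: List[Int]): List[(String, Column)] = {
  require(bounds.nonEmpty, "at least one boundary is needed")
  // Everything at or above the first boundary goes into column "0"
  val first = "0" -> fn.when(days >= bounds.head, value)
  // Each consecutive pair (hi, lo) becomes a closed range [lo, hi - 1] named after hi
  val middle = bounds.zip(bounds.tail).map { case (hi, lo) =>
    hi.toString -> fn.when(days.between(lo, hi - 1), value)
  }
  // Everything below the last boundary goes into e.g. "<-90"
  val last = s"<${bounds.last}" -> fn.when(days < bounds.last, value)
  first :: middle ::: List(last)
}

// Hypothetical wrapper: adds "daysbetween" and then one column per range.
// The column names date1, date2 and TotalPrice are taken from the question.
def withBuckets(df: DataFrame, bounds: List[Int]): DataFrame = {
  val withDays = df.withColumn("daysbetween", fn.datediff(fn.col("date1"), fn.col("date2")))
  bucketColumns(fn.col("daysbetween"), fn.col("TotalPrice"), bounds)
    .foldLeft(withDays) { case (acc, (name, c)) => acc.withColumn(name, c) }
}
```

With this in place, `withBuckets(df, List(-30, -60, -90))` should produce the same four range columns as the hand-written list, and adding another range only means adding another boundary.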
If you only have the cases above, a select is enough:

df.withColumn("daysbetween", fn.datediff($"date1", $"date2"))
  .select(
    $"daysbetween",
    fn.when($"daysbetween" >= -30, $"TotalPrice").as("0"),
    fn.when($"daysbetween".between(-60, -31), $"TotalPrice").as("-30"),
    fn.when($"daysbetween".between(-90, -61), $"TotalPrice").as("-60")
  )

If that is not the case and you have more ranges, you will need a loop-based mechanism to populate the columns.

@AlexandrosBiratsis Thanks Alexandros, I think your solution is better than mine!
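
For the case with more ranges, the loop can be combined with the `select` idea, so that all range columns are generated from a small specification and appended in a single projection. This is a sketch, not a definitive implementation: `rangeColumns` and `addRanges` are hypothetical names, and the column names `date1`, `date2` and `TotalPrice` are taken from the question. As far as I know, Spark's own documentation for `withColumn` recommends a single `select` over many chained `withColumn` calls, because each call adds a projection to the plan.

```scala
import org.apache.spark.sql.{Column, DataFrame, functions => fn}

// Hypothetical spec: each range column is a name plus a condition on the day difference.
def rangeColumns(days: Column, value: Column): Seq[Column] = {
  val conditions: Seq[(String, Column)] = Seq(
    "0"    -> (days >= -30),
    "-30"  -> days.between(-60, -31),
    "-60"  -> days.between(-90, -61),
    "<-90" -> (days < -90)
  )
  // One aliased when(...) expression per range; rows outside the range get null
  conditions.map { case (name, cond) => fn.when(cond, value).as(name) }
}

def addRanges(df: DataFrame): DataFrame = {
  val withDays = df.withColumn("daysbetween", fn.datediff(fn.col("date1"), fn.col("date2")))
  // col("*") keeps the existing columns; the generated ones are appended in one select
  val allCols = fn.col("*") +: rangeColumns(fn.col("daysbetween"), fn.col("TotalPrice"))
  withDays.select(allCols: _*)
}
```

Unlike the bare `select` in the comment above, this version keeps the original columns of `df` as well; drop `fn.col("*")` if only the range columns are wanted.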