Scala Spark: best way to create a column for each range
I am looking for a way to improve this code and cut down on the repeated lines. I need one column per range. Is there a way to write a method that handles the column generation here?
df
  .withColumn(
    "daysbetween",
    fn.datediff($"date1", $"date2")
  )
  .withColumn(
    "0",
    fn.when(
      $"daysbetween" >= -30,
      $"TotalPrice"
    )
  )
  .withColumn(
    "-30",
    fn.when(
      $"daysbetween".between(-60, -31),
      $"TotalPrice"
    )
  )
  .withColumn(
    "-60",
    fn.when(
      $"daysbetween".between(-90, -61),
      $"TotalPrice"
    )
  )
  .withColumn(
    "<-90",
    fn.when(
      $"daysbetween" < -90,
      $"TotalPrice"
    )
  )
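I hope this code helps.
I have created a list of columns, each tupled with its name. I am not sure whether you also want the column-generation logic itself to be produced automatically.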
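import org.apache.spark.sql.{Column, DataFrame, functions => fn}
import spark.implicits._ // assumes a SparkSession named `spark` is in scope (needed for $"...")

// One (column name, column expression) pair per column to add
val dynamicCols: List[(String, Column)] = List(
  ("daysbetween",
    fn.datediff($"date1", $"date2")
  ),
  ("0",
    fn.when(
      $"daysbetween" >= -30,
      $"TotalPrice"
    )),
  ("-30",
    fn.when(
      $"daysbetween".between(-60, -31),
      $"TotalPrice"
    )),
  ("-60",
    fn.when(
      $"daysbetween".between(-90, -61),
      $"TotalPrice"
    )),
  ("<-90",
    fn.when(
      $"daysbetween" < -90,
      $"TotalPrice"
    ))
)

// Fold over the list, adding one column per pair.
// The result gets its own name: `val df = ...foldLeft(df)` would refer to itself.
val result = dynamicCols.foldLeft(df) { (acc: DataFrame, colInfo: (String, Column)) =>
  acc.withColumn(colInfo._1, colInfo._2)
}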
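If the ranges follow a regular pattern, the (name, Column) pairs themselves could be generated instead of written by hand. The following is only a sketch under assumptions: the names `RangeColumns`, `Bucket`, `buckets`, `toColumn` and `withRangeColumns`, and the parameters `step` and `count`, are hypothetical and not part of the question. It assumes fixed-width buckets like the 30-day ones above and the column names `date1`, `date2` and `TotalPrice` from the question.

```scala
import org.apache.spark.sql.{Column, DataFrame, functions => fn}

object RangeColumns {
  // Hypothetical description of one range: bounds are inclusive, None means open-ended.
  final case class Bucket(name: String, lower: Option[Int], upper: Option[Int])

  // Builds count + 1 buckets of width `step`, mirroring the hand-written ones:
  // step = 30, count = 3 gives "0" (>= -30), "-30" (-60..-31), "-60" (-90..-61), "<-90" (<= -91).
  def buckets(step: Int, count: Int): List[Bucket] = {
    val first = Bucket("0", Some(-step), None)
    val middle = (1 until count).toList.map { i =>
      Bucket((-step * i).toString, Some(-step * (i + 1)), Some(-step * i - 1))
    }
    val last = Bucket(s"<${-step * count}", None, Some(-step * count - 1))
    (first :: middle) :+ last
  }

  // Turns a bucket into when(condition, value); rows outside the range get null.
  def toColumn(b: Bucket, days: Column, value: Column): Column = {
    val cond = (b.lower, b.upper) match {
      case (Some(lo), Some(hi)) => days.between(lo, hi)
      case (Some(lo), None)     => days >= lo
      case (None, Some(hi))     => days <= hi
      case (None, None)         => fn.lit(true)
    }
    fn.when(cond, value)
  }

  // Adds "daysbetween" first, then one column per generated bucket.
  def withRangeColumns(df: DataFrame, step: Int, count: Int): DataFrame = {
    val withDays =
      df.withColumn("daysbetween", fn.datediff(fn.col("date1"), fn.col("date2")))
    buckets(step, count).foldLeft(withDays) { (acc, b) =>
      acc.withColumn(b.name, toColumn(b, fn.col("daysbetween"), fn.col("TotalPrice")))
    }
  }
}
```

With these assumptions, `RangeColumns.withRangeColumns(df, 30, 3)` should produce the same five columns as the list above. Since `datediff` returns whole days, the open-ended last bucket uses `<= -91`, which is equivalent to `< -90`.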
If you only have the cases above, a select is enough:
df.withColumn("daysbetween", fn.datediff($"date1", $"date2")).select($"daysbetween", fn.when($"daysbetween" >= -30, $"TotalPrice").as("0"), fn.when($"daysbetween".between(-60, -31), $"TotalPrice").as("-30"), fn.when($"daysbetween".between(-90, -61), $"TotalPrice").as("-60"))
If that is not the case and you have more cases, you will need a loop-based mechanism to populate the columns.
@AlexandrosBiratsis Thanks Alexandros, I think your solution is better than mine!
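The single-select idea from the comment can be combined with a list of (name, Column) pairs, so that all range columns are added in one projection rather than one `withColumn` call each. This is a minimal sketch, not the commenter's code: `SelectRanges` and `addAll` are hypothetical names, and it assumes any column the expressions refer to (such as `daysbetween`) already exists in the input DataFrame.

```scala
import org.apache.spark.sql.{Column, DataFrame, functions => fn}

object SelectRanges {
  // Hypothetical helper: keeps every existing column and appends
  // each expression under its given name, all in a single select.
  def addAll(df: DataFrame, cols: Seq[(String, Column)]): DataFrame = {
    val added = cols.map { case (name, expr) => expr.as(name) }
    df.select((fn.col("*") +: added): _*)
  }
}
```

A possible use would be `SelectRanges.addAll(df.withColumn("daysbetween", fn.datediff($"date1", $"date2")), rangeCols)`, where `rangeCols` holds only the range columns. The Spark documentation for `withColumn` notes that calling it many times in a loop can produce large query plans, which is one reason to prefer a single `select` when the number of ranges grows.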