Apache Spark: how to create a decade window using a Spark DataFrame?


Sample dataset:

1990;111;Tie Me Up! Tie Me Down!;Comedy;Banderas, Antonio;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1991;113;High Heels;Comedy;Bosé, Miguel;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1983;104;Dead Zone, The;Horror;Walken, Christopher;Adams, Brooke;Cronenberg, David;79;No;NicholasCage.png
1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png
1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png
1983;140;Octopussy;Action;Moore, Roger;Adams, Maud;Glen, John;68;No;NicholasCage.png
1984;101;Target Eagle;Action;Connors, Chuck;Adams, Maud;Loma, José Antonio de la;14;No;NicholasCage.png
1989;99;American Angels: Baptism of Blood, The;Drama;Bergen, Robert D.;Adams, Trudy;Sebastian, Beverly;28;No;NicholasCage.png
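Question: I have a "year" column, and from it I want to create decade windows such as 1990-2000, 2000-2010, etc. I know DataFrames support window functions, but I'm not sure how to create a decade (ten-year) window so that each decade becomes a separate bucket.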

Window functions for reference:


Note: I'm looking for a Scala-based solution.

This link might help you.

You can create a WindowSpec object, set its frame with rangeBetween or rowsBetween, and pass it to a window function's over() call.

I have a demo, but for a different example. Here it is:

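// x, y: frame offsets relative to the current row; a bare column cannot be evaluated over a window, so wrap it in an aggregate such as count()
transactions.withColumn("column", count(transactions.col("cardNumber")).over(Window.rowsBetween(x, y)))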
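Applied to the question's year column, the difference between the two frame types matters. A minimal sketch, assuming a local SparkSession and a hypothetical DataFrame with year and title columns built from part of the sample data: rangeBetween(-9, 0) frames rows by the value of year (every row whose year lies in [Y - 9, Y]), so several movies in the same year are counted correctly, whereas rowsBetween(-9, 0) would simply take the nine preceding physical rows. Note that this yields a sliding ten-year window per row, not fixed decade buckets. The object name TrailingDecade and the column moviesInTrailingDecade are illustrative.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count}

object TrailingDecade {
  // Hypothetical helper: for each movie, counts the movies released in the
  // ten-year span ending with its own year.
  def trailingDecadeCounts(movies: DataFrame): DataFrame = {
    // Value-based frame: for a row with year = Y it covers every row whose
    // year is in [Y - 9, Y], including other movies from the same year.
    // Without partitionBy, Spark moves all rows into one partition (fine for a demo).
    val byYearValue = Window.orderBy(col("year")).rangeBetween(-9, 0)
    movies.withColumn("moviesInTrailingDecade", count(col("title")).over(byYearValue))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("trailing-decade").getOrCreate()
    import spark.implicits._

    // Part of the sample data set (year, title)
    val movies = Seq(
      (1978, "Days of Heaven"), (1979, "Cuba"), (1983, "Dead Zone, The"),
      (1983, "Octopussy"), (1984, "Target Eagle"),
      (1989, "American Angels: Baptism of Blood, The"),
      (1990, "Tie Me Up! Tie Me Down!"), (1991, "High Heels")
    ).toDF("year", "title")

    trailingDecadeCounts(movies).orderBy(col("year")).show(truncate = false)
    spark.stop()
  }
}
```

Both 1983 rows see each other because a range frame includes every row with the same ordering value; on this subset, High Heels (1991) counts the six movies from 1983 through 1991.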

After digging deeper, I was able to split the dataset into separate decade buckets with the following transformations:

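import org.apache.spark.sql.functions._
import spark.implicits._ //enables the $"column" syntax; `spark` is the active SparkSession

//Deriving new column named "Date" based on "year" column. This step is required to bucket the data set into decade-wise buckets
//(SortedByYear: the movie DataFrame sorted by year, not shown here)
val MovieDFwithDate = SortedByYear.withColumn("Date", format_string("01-01-%d", $"year"))

//Casting string version of date to standard DATE object
val MovieDFwithDateFormat = MovieDFwithDate.withColumn("Date", to_date($"Date", "MM-dd-yyyy"))

//Windowing the data set into decade buckets - 365 days * 10 years = 3650 days
//Caveat: 3650 days ignores leap days and windows are aligned to 1970-01-01 UTC, so bucket
//boundaries drift from calendar decades (the bucket holding 1990-01-01 starts on 1989-12-27)
val windowedDF = MovieDFwithDateFormat.select($"*", window($"Date", "3650 days", "3650 days"))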
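To see why 3650-day buckets do not line up with calendar decades, here is a plain-Scala model (no Spark needed) of tumbling-window alignment. It assumes that windows start at whole multiples of the window length counted from 1970-01-01, which matches window()'s default startTime, and it ignores time zones by working with dates only. The object TumblingWindowAlignment and its windowStart method are hypothetical, and the model is not a reproduction of Spark's internals.

```scala
import java.time.LocalDate

object TumblingWindowAlignment {
  // Start date of the tumbling window of `windowDays` days that contains `date`,
  // with windows aligned to 1970-01-01 (epoch day 0). floorDiv rounds toward
  // negative infinity, so dates before 1970 map to the window at or before them.
  def windowStart(date: LocalDate, windowDays: Long): LocalDate =
    LocalDate.ofEpochDay(java.lang.Math.floorDiv(date.toEpochDay, windowDays) * windowDays)

  def main(args: Array[String]): Unit =
    for (year <- Seq(1960, 1970, 1980, 1990, 2000)) {
      val jan1 = LocalDate.of(year, 1, 1)
      println(s"$jan1 falls in the window starting ${windowStart(jan1, 3650L)}")
    }
}
```

In this model 1990-01-01 lands in the window starting 1989-12-27, so a date such as 1999-12-26 would already fall into the next bucket. Before 1970 the drift runs the other way: 1960-01-01 lands in the window starting 1950-01-06, grouping a 1960 film with the 1950s. Bucketing on the integer year itself avoids this drift.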

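Thanks for sharing, but the year column contains several rows for the same year, so rowsBetween(1, 10) does not produce a decade range. If I use rangeBetween instead, I also have to hard-code the year range, and a hard-coded range won't work for other year ranges.

The hard-coded year range could be replaced by a function call, couldn't it? Maybe using pattern matching (case)?

I found another approach that turned out to be simpler for my use case.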
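As the comments suggest, the hard-coded year ranges can be replaced by a computation on the year itself. A minimal sketch, assuming a numeric year column and a local SparkSession: floor(year / 10) * 10 gives the first year of the calendar decade, a label such as 1990-2000 is built from it, and a window partitioned by that value lets any window function run per decade. This is not necessarily the approach the asker finally used; the object DecadeBuckets and the decadeStart, decade and moviesInDecade columns are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, concat_ws, count, floor}

object DecadeBuckets {
  // Adds decadeStart (1990 for any year 1990..1999) and a label such as
  // "1990-2000"; no year range is hard-coded.
  def withDecade(movies: DataFrame): DataFrame = {
    val decadeStart = (floor(col("year") / 10) * 10).cast("int")
    movies
      .withColumn("decadeStart", decadeStart)
      .withColumn("decade",
        concat_ws("-", col("decadeStart").cast("string"), (col("decadeStart") + 10).cast("string")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("decade-buckets").getOrCreate()
    import spark.implicits._

    // Part of the sample data set (year, title)
    val movies = Seq(
      (1978, "Days of Heaven"), (1979, "Cuba"), (1983, "Dead Zone, The"),
      (1983, "Octopussy"), (1990, "Tie Me Up! Tie Me Down!"), (1991, "High Heels")
    ).toDF("year", "title")

    // One window partition per decade, so window functions restart in each bucket
    val perDecade = Window.partitionBy(col("decadeStart"))
    withDecade(movies)
      .withColumn("moviesInDecade", count(col("title")).over(perDecade))
      .orderBy(col("year"))
      .show(truncate = false)

    // For plain per-decade aggregates, groupBy on the label is enough
    withDecade(movies).groupBy(col("decade")).count().orderBy(col("decade")).show()

    spark.stop()
  }
}
```

decadeStart is an inclusive lower bound, so 2000 is labeled 2000-2010 rather than 1990-2000. Because the bucket comes from integer arithmetic rather than a fixed number of days, it works for any year without drift.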