
Crosstab by date and hour in Scala Spark


Sample DF:

val someDF = Seq(
  (1, "2017-12-02 03:04:00"),
  (1, "2017-12-02 03:45:00"),
  (1, "2017-12-02 04:04:00"),
  (2, "2017-12-02 04:14:00"),
  (2, "2017-12-02 04:54:00"),
  (3, "2017-10-01 11:45:20"),
  (4, "2017-10-01 02:45:20")
).toDF("number", "date")

When I try using crosstab:

var temp = someDF.stat.crosstab("date","number")
temp.show()
This works.

I want to apply the same crosstab, but using only the date and hour, for example: 2017-12-02 03

Expected output:

+-------------------+---+---+---+---+
|   date_Hour_number|  1|  2|  3|  4|
+-------------------+---+---+---+---+
|2017-10-01 11      |  0|  0|  1|  0|
|2017-12-02 03      |  1|  0|  0|  0|
|2017-12-02 04      |  0|  2|  0|  0|
+-------------------+---+---+---+---+

Any suggestions would be helpful.

Since your date column is of String type, you can simply use substring to trim date down to the hour before applying crosstab:

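import org.apache.spark.sql.functions.substring

// Keep the first 13 characters ("yyyy-MM-dd HH"); substring positions are 1-based.
someDF.
  withColumn("datehour", substring($"date", 1, 13)).
  stat.crosstab("datehour", "number").
  show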
// +---------------+---+---+---+---+
// |datehour_number|  1|  2|  3|  4|
// +---------------+---+---+---+---+
// |  2017-10-01 02|  0|  0|  0|  1|
// |  2017-10-01 11|  0|  0|  1|  0|
// |  2017-12-02 04|  1|  2|  0|  0|
// |  2017-12-02 03|  2|  0|  0|  0|
// +---------------+---+---+---+---+
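
If the date column might later become a real timestamp, or if you want the rows in a predictable order, the same table can probably be built with groupBy and pivot instead of crosstab. The following is only a sketch under some assumptions: it creates its own local SparkSession (in spark-shell, spark already exists), and the names byHour and pivoted are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, to_timestamp}

// Assumption: a local session for illustration; skip this in spark-shell.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("crosstab-by-hour")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val someDF = Seq(
  (1, "2017-12-02 03:04:00"),
  (1, "2017-12-02 03:45:00"),
  (1, "2017-12-02 04:04:00"),
  (2, "2017-12-02 04:14:00"),
  (2, "2017-12-02 04:54:00"),
  (3, "2017-10-01 11:45:20"),
  (4, "2017-10-01 02:45:20")
).toDF("number", "date")

// Parse the string as a timestamp, then format it down to the hour.
val byHour = someDF.withColumn(
  "datehour",
  date_format(to_timestamp(col("date")), "yyyy-MM-dd HH")
)

// groupBy + pivot + count yields one row per hour and one column per number.
// Combinations that never occur come back as null, so fill them with 0.
val pivoted = byHour
  .groupBy("datehour")
  .pivot("number")
  .count()
  .na.fill(0)
  .orderBy("datehour")

pivoted.show()
```

Unlike crosstab, this keeps the key column named datehour (rather than datehour_number) and lets you sort the result explicitly.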