Scala: How to use a Map[String,Long] column as the header of a DataFrame and preserve the types?
I have a DataFrame with a filter condition applied:
val colNames = customerCountDF
.filter($"fiscal_year" === maxYear && $"fiscal_month" === maxMnth)
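From all the selected rows, I need only the last column of a single row.
The last column's type is Map[String,Long], and I want all the keys of the map as a List[String].
I tried the syntax below: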
val colNames = customerCountDF
.filter($"fiscal_year" === maxYear && $"fiscal_month" === maxMnth)
.head
.getMap(14)
.keySet
.toList
.map(_.toString)
I am using map(_.toString) to convert the List[Nothing] to a List[String]. The error I get is:
missing parameter type for expanded function ((x$1) => x$1.toString)
[error] val colNames = customerCountDF.filter($"fiscal_year" === maxYear && $"fiscal_month" === maxMnth).head().getMap(14).keySet.toList.map(_.toString)
The df looks like this:
+-------------+-----+----------+-----------+------------+-------------+--------------------+--------------+--------+----------------+-----------+----------------+-------------+-------------+--------------------+
|division_name| low| call_type|fiscal_year|fiscal_month| region_name|abandon_rate_percent|answered_calls|connects|equiv_week_calls|equiv_weeks|equivalent_calls|num_customers|offered_calls| pv|
+-------------+-----+----------+-----------+------------+-------------+--------------------+--------------+--------+----------------+-----------+----------------+-------------+-------------+--------------------+
| NATIONAL|PHONE|CABLE CARD| 2016| 1|ALL DIVISIONS| 0.02| 10626| 0| 0.0| 0.0| 10649.8| 0| 10864|Map(subscribers_c...|
| NATIONAL|PHONE|CABLE CARD| 2016| 1| CENTRAL| 0.02| 3591| 0| 0.0| 0.0| 3598.6| 0| 3667|Map(subscribers_c...|
+-------------+-----+----------+-----------+------------+-------------+--------------------+--------------+--------+----------------+-----------+----------------+-------------+-------------+--------------------+
Selecting just one row of the last column gives:
[Map(subscribers_connects -> 5521287, disconnects_hsd -> 7992, subscribers_xfinity home -> 6277491, subscribers_bulk units -> 4978892, connects_cdv -> 41464, connects_disconnects -> 16945, connects_hsd -> 32908, disconnects_internet essentials -> 10319, disconnects_disconnects -> 3506, disconnects_video -> 8960, connects_xfinity home -> 43012)]
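After applying the filter condition and taking just one row from the DataFrame, I want to get the keys of the last column as a List[String].
Here is a workaround that gets the final result as a List[String]. Take a look: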
scala> val customerCountDF=Seq((2018,12,Map("subscribers_connects" -> 5521287L, "disconnects_hsd" -> 7992L, "subscribers_xfinity home" -> 6277491L, "subscribers_bulk units" -> 4978892L, "connects_cdv" -> 41464L, "connects_disconnects" -> 16945L, "connects_hsd" -> 32908L, "disconnects_internet essentials" -> 10319L, "disconnects_disconnects" -> 3506L, "disconnects_video" -> 8960L, "connects_xfinity home" -> 43012L))).toDF("fiscal_year","fiscal_month","mapc")
customerCountDF: org.apache.spark.sql.DataFrame = [fiscal_year: int, fiscal_month: int ... 1 more field]
scala> val maxYear =2018
maxYear: Int = 2018
scala> val maxMnth = 12
maxMnth: Int = 12
scala> val colNames = customerCountDF.filter($"fiscal_year" === maxYear && $"fiscal_month" === maxMnth).first.getMap(2).keySet.mkString(",").split(",").toList
colNames: List[String] = List(subscribers_connects, disconnects_hsd, subscribers_xfinity home, subscribers_bulk units, connects_cdv, connects_disconnects, connects_hsd, disconnects_internet essentials, disconnects_disconnects, disconnects_video, connects_xfinity home)
scala>
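One thing worth checking with the mkString/split round trip above is what happens when a key itself contains a comma. The following is a minimal pure-Scala sketch of that case; the object name KeyListDemo and the sample keys are made up for illustration and are not taken from the question's data. It assumes the key set is already typed as Set[String].

```scala
object KeyListDemo {
  // Illustrative data only: one key deliberately contains a comma.
  val pv: Map[String, Long] = Map("connects, hsd" -> 32908L, "connects_cdv" -> 41464L)

  // Round trip through a comma-joined string:
  // the comma inside the first key produces an extra element.
  val viaSplit: List[String] = pv.keySet.mkString(",").split(",").toList

  // Direct conversion keeps every key intact.
  val direct: List[String] = pv.keySet.toList
}
```

With keys like the ones shown in the question (no commas) both approaches should give the same list, so the round trip is only a concern if the key format is not under your control.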
After the filter, you can simply select the column and get it as a Map, like this:
first().getAs[Map[String, Long]]("pv").keySet
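The type problem is easily solved by explicitly specifying the type parameters on getMap(14) in your source. Since you know you are expecting a Map of String -> Long key-value pairs, just replace getMap(14) with getMap[String,Long](14).
As for getMap[String,Long](14) being an empty Map, that has to do with your data: you simply have an empty map at index 14 of the row.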
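To see the effect of explicit type parameters without a Spark cluster, here is a small sketch. FakeRow is a hypothetical stand-in written for this example; it only mimics the shape of a getMap[K, V](i) method and is not Spark's Row. The sample values are illustrative.

```scala
object TypedGetMapDemo {
  // Hypothetical stand-in for a row: a generic accessor that casts
  // the stored value, so the caller decides the key and value types.
  final class FakeRow(values: Vector[Any]) {
    def getMap[K, V](i: Int): scala.collection.Map[K, V] =
      values(i).asInstanceOf[scala.collection.Map[K, V]]
  }

  val row = new FakeRow(
    Vector(2018, 12, Map("connects_cdv" -> 41464L, "connects_hsd" -> 32908L))
  )

  // With explicit type parameters the key set is a Set[String],
  // so toList yields a List[String] with no further conversion.
  val colNames: List[String] = row.getMap[String, Long](2).keySet.toList.sorted
}
```

Because the keys are already typed as String, the trailing map(_.toString) from the question is no longer needed.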
More details
In Scala, when you create a List[A], Scala infers the type from the information available.
For example,
// Explicitly provide the type parameter info
scala> val l1: List[Int] = List(1, 2)
// l1: List[Int] = List(1, 2)
// Infer the type parameter by using the arguments passed to List constructor,
scala> val l2 = List(1, 2)
// l2: List[Int] = List(1, 2)
So what happens when you create an empty list?
// Explicitly provide the type parameter info
scala> val l1: List[Int] = List()
// l1: List[Int] = List()
// Infer the type parameter by using the arguments passed to List constructor,
// but surprise, there are no argument since you are creating empty list
scala> val l2 = List()
// l2: List[Nothing] = List()
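So when Scala has nothing to go on, it picks the most suitable type it can find, which is the "empty" type Nothing.
The same thing happens when you call toList on another collection object: it tries to infer the type parameter from the source object.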
scala> val ks1 = Map.empty[Int, Int].keySet
// ks1: scala.collection.immutable.Set[Int] = Set()
scala> val l1 = ks1.toList
// l1: List[Int] = List()
scala> val ks2 = Map.empty.keySet
// ks2: scala.collection.immutable.Set[Nothing] = Set()
scala> val l2 = ks2.toList
// l2: List[Nothing] = List()
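Similarly, the getMap(14) that you call on the head row of the DataFrame is given no type parameters, and nothing else in the expression tells the compiler what they should be. The returned map is therefore typed the same as Map.empty, which is a Map[Nothing, Nothing].
That means your whole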
val colNames = customerCountDF.filter($"fiscal_year" === maxYear && $"fiscal_month" === maxMnth).head.getMap(14).keySet.toList.map(_.toString)
is equivalent to,
val colNames = Map.empty.keySet.toList.map(_.toString)
and therefore to
scala> val l = List()
// l: List[Nothing] = List()
scala> val colNames = l.map(_.toString)
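To sum up, any List[Nothing] can only be an empty list.
Now there are two problems: one is the type problem with List[Nothing], the other is that the list is empty. Since you only access a single column (the one at position 14), why not make the developer's life a little easier (and help whoever has to support your code later)?
Try the following: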
val colNames = customerCountDF
.where($"fiscal_year" === maxYear) // Split one long filter into two
.where($"fiscal_month" === maxMnth) // where is a SQL-like alias of filter
.select("pv") // Take just the field you need to work with
.as[Map[String, Long]] // Map it to the proper type
.head // Load just the single field (all others are left aside)
.keySet // That's just a pure Scala
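I think the code above shows what it does very clearly (and I believe it should be the fastest of the solutions offered, since it loads only the single pv field into a JVM object on the driver).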
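For anyone who wants to try the typed-Dataset approach end to end, the following is a self-contained sketch. It makes several assumptions that are not in the original thread: a local SparkSession, Spark 2.3 or later (where spark.implicits._ provides an encoder for Map types), a two-key sample map, and the hypothetical names PvKeysSketch and pvKeys.

```scala
import org.apache.spark.sql.SparkSession

object PvKeysSketch {
  // Builds a one-row DataFrame shaped like the question's data and
  // returns the keys of its pv column as a sorted List[String].
  def pvKeys(): List[String] = {
    // Assumption: a local session is acceptable for experimenting.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pv-keys-sketch")
      .getOrCreate()
    import spark.implicits._
    try {
      val customerCountDF = Seq(
        (2018, 12, Map("connects_cdv" -> 41464L, "connects_hsd" -> 32908L))
      ).toDF("fiscal_year", "fiscal_month", "pv")

      customerCountDF
        .where($"fiscal_year" === 2018 && $"fiscal_month" === 12)
        .select("pv")             // keep only the map column
        .as[Map[String, Long]]    // typed view; needs the Map encoder
        .head()                   // bring the single value to the driver
        .keySet
        .toList
        .sorted                   // sorted only to make the result deterministic
    } finally spark.stop()
  }
}
```

Referring to the column by name rather than by index 14 also means the code keeps working if columns are added or reordered.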