How do I find combination counts in Scala?

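My dataset contains 5 columns, the last of which is the classindex. I want the count of each combination of a column value with a classindex value.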

"sunny", "hot", "high", "false","no"
"sunny", "hot", "high", "true","no"
"overcast", "hot", "high", "false","yes"
"rainy", "mild", "high", "false","yes"

I want the count for each combination, for example sunny&yes=0, sunny&no=2, overcast&yes=1, rainy&yes=1.

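The description of the dataset seems a bit vague to me. Which data structure are you using to represent it?

Assuming it is a list, you could try the following, which pairs the first column of a row with its class value: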

l =>  (l.head, l.last)
Apply this to the whole collection:

val dataset = List(
    "sunny"::"hot"::"high"::"no"::Nil,
    "sunny"::"hot"::"high"::"no"::Nil,
    "overcast"::"hot"::"high"::"yes"::Nil,
    "rainy"::"mild"::"high"::"yes"::Nil
)

val qualified = dataset.map(l => (l.head, l.last))
Once the elements are qualified with their "yes"/"no" class, you can group the occurrences and count the number of elements in each group:

val countMap = qualified.groupBy(x => x).map(kv => (kv._1, kv._2.size))
Or, in short:

val countMap = qualified.groupBy(x => x).mapValues(_.size)
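As a side note, on Scala 2.13 and later mapValues on a Map is deprecated and returns a lazy view, so the grouping and counting can instead be done in one pass with groupMapReduce. The following is only a sketch under that version assumption; the object name GroupCounts is made up for illustration, and the rows are the same four-column rows used above.

```scala
// Sketch for Scala 2.13 or later; GroupCounts is a name chosen for illustration.
object GroupCounts {
  // Same four-column rows as in the answer: weather attributes plus a yes/no class.
  val dataset: List[List[String]] = List(
    List("sunny", "hot", "high", "no"),
    List("sunny", "hot", "high", "no"),
    List("overcast", "hot", "high", "yes"),
    List("rainy", "mild", "high", "yes")
  )

  // Key each row by (first column, class), map every row to 1, and sum per key.
  val countMap: Map[(String, String), Int] =
    dataset.groupMapReduce(row => (row.head, row.last))(_ => 1)(_ + _)

  def main(args: Array[String]): Unit =
    countMap.toSeq.sorted.foreach(println)
}
```

The result is a strict Map, so it can be passed around or looked up repeatedly without re-evaluating the counts.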
To list every possibility, including those whose count is 0, generate all possible combinations and look up each one's count in the map:

(
    for(
        st <- dataset.map(_.head).toSet[String];
        q  <- dataset.map(_.last).toSet[String]
       ) yield (st,q)
).map(k => (k, countMap.getOrElse(k,0)))

> Set(((rainy,no),0), ((sunny,yes),0), ((sunny,no),2), ((rainy,yes),1), ((overcast,yes),1), ((overcast,no),0))

Consider collecting each row into a case class with 5 attributes, Weather:

case class Weather(p1: String, p2: String, p3: String, p4: String, p5: String)
so that, for instance:

val xs = Array(
  Weather("sunny", "hot", "high", "false","no"),
  Weather("sunny", "hot", "high", "true","no"),
  Weather("overcast", "hot", "high", "false","yes"),
  Weather("rainy", "mild", "high", "false","yes"))
Group the entries by their first and last attributes, then count the number of instances in each group, e.g.

xs.groupBy( w => (w.p1,w.p5) ).mapValues(_.size)
which yields

Map((overcast,yes) -> 1, (sunny,no) -> 2, (rainy,yes) -> 1)

However, this approach does not account for missing or undeclared groups, such as "sunny" with "yes".
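One possible way to cover the missing groups with the case class representation is to enumerate every pairing of a first-column value with a class value and default absent pairs to 0. This is a minimal sketch, not part of the original answer; the object name WeatherCounts and the value names observed, allPairs and counts are chosen here for illustration.

```scala
// Sketch only; WeatherCounts, observed, allPairs and counts are illustrative names.
object WeatherCounts {
  final case class Weather(p1: String, p2: String, p3: String, p4: String, p5: String)

  val xs: Seq[Weather] = Seq(
    Weather("sunny", "hot", "high", "false", "no"),
    Weather("sunny", "hot", "high", "true", "no"),
    Weather("overcast", "hot", "high", "false", "yes"),
    Weather("rainy", "mild", "high", "false", "yes")
  )

  // Counts for the (first attribute, class) pairs that actually occur.
  val observed: Map[(String, String), Int] =
    xs.groupBy(w => (w.p1, w.p5)).map { case (key, group) => key -> group.size }

  // Every pairing of a distinct first attribute with a distinct class value.
  val allPairs: Seq[(String, String)] =
    for {
      outlook <- xs.map(_.p1).distinct
      cls     <- xs.map(_.p5).distinct
    } yield (outlook, cls)

  // Pairs that never occur in the data get a count of 0.
  val counts: Map[(String, String), Int] =
    allPairs.map(key => key -> observed.getOrElse(key, 0)).toMap

  def main(args: Array[String]): Unit =
    counts.toSeq.sorted.foreach(println)
}
```

Note that only values present somewhere in the data are enumerated; a class value that never appears in any row would still be absent.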

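I changed my dataset, so the last column is now the classindex column. I want the distinct values of every column combined with the classindex, together with their counts, on an RDD rather than a list. How can we build the combinations?

@AkhilaV, I assume this is coursework, since others have asked similar questions before. Please search for other answers first - I found quite a few.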
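For the per-column variant asked about in the comment, one option is to emit a (column index, value, class) key for every non-class cell and count the keys. The sketch below uses plain Scala collections on the five-column rows from the question; the object name PerColumnCounts and the key layout are assumptions made for illustration. On a Spark RDD the same shape would presumably be a flatMap followed by reduceByKey, but that is not shown or tested here.

```scala
// Sketch with plain collections; PerColumnCounts and the key layout are illustrative.
object PerColumnCounts {
  // Five-column rows from the question; the last column is the class.
  val rows: Seq[Seq[String]] = Seq(
    Seq("sunny", "hot", "high", "false", "no"),
    Seq("sunny", "hot", "high", "true", "no"),
    Seq("overcast", "hot", "high", "false", "yes"),
    Seq("rainy", "mild", "high", "false", "yes")
  )

  // For each row, pair every non-class cell (tagged with its column index) with the class.
  val keyed: Seq[(Int, String, String)] =
    rows.flatMap { row =>
      val cls = row.last
      row.init.zipWithIndex.map { case (value, col) => (col, value, cls) }
    }

  // Count how often each (column index, value, class) key occurs.
  val counts: Map[(Int, String, String), Int] =
    keyed.groupBy(identity).map { case (key, group) => key -> group.size }

  def main(args: Array[String]): Unit =
    counts.toSeq.sorted.foreach(println)
}
```

Keeping the column index in the key prevents equal strings from different columns (for instance "high" or "false") from being merged into one count.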