Apache Spark: creating new columns in the form of a one-hot vector (tags: apache-spark, dataframe)


I have a DataFrame:

customer | Department
----------------------
A        |   Food
B        |   Home
A        |   Office
C        |   Home
A        |   Home
B        |   Office
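Both the customer and Department columns are of string type.

How can I turn the distinct Department values into new columns (like a one-hot vector), so that I get a new DataFrame like the one below?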

 customer | Food | Home | Office
-----------------------------------
    A        1     1      1
    B        0     1      1
    C        0     1      0

Here the Food, Home, and Office columns are of integer type, and the customer column is of String type.

You can simply group the data by customer, pivot on department, and aggregate with a count:

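import org.apache.spark.sql.functions.count
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq(
  ("A", "Food"),
  ("B", "Home"),
  ("A", "Office"),
  ("C", "Home"),
  ("A", "Home"),
  ("B", "Office")
).toDF("customer", "department")

// One row per customer and one column per distinct department.
// Combinations that never occur come back as null, so fill them with 0.
df.groupBy("customer").pivot("department").agg(count("department"))
  .na.fill(0)
  .show(false)

Output: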

+--------+----+----+------+
|customer|Food|Home|Office|
+--------+----+----+------+
|B       |0   |1   |1     |
|C       |0   |1   |0     |
|A       |1   |1   |1     |
+--------+----+----+------+
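
One caveat with the `count` approach: `count` produces a long-typed column, and its value goes above 1 when the same customer/department pair appears more than once. If strict 0/1 integer flags are wanted, as the question describes, a variant along the following lines may help. This is only a sketch: the local `SparkSession` setup, the extra duplicate row, and the hard-coded `departments` list are assumptions added for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, max}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("one-hot-pivot")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("A", "Food"), ("B", "Home"), ("A", "Office"),
  ("C", "Home"), ("A", "Home"), ("B", "Office"),
  ("A", "Food") // hypothetical duplicate row, to show the difference from count
).toDF("customer", "department")

// Assumed to be known up front; listing the pivot values explicitly
// spares Spark the extra job it would run to discover them.
val departments = Seq("Food", "Home", "Office")

val oneHot = df
  .groupBy("customer")
  .pivot("department", departments)
  .agg(max(lit(1))) // 1 if the pair occurs at least once, null otherwise
  .na.fill(0)       // replace the nulls with 0

oneHot.printSchema()
oneHot.show(false)
```

With the duplicate ("A", "Food") row, `count` would report 2 for customer A under Food, whereas `max(lit(1))` keeps it at 1, and the pivoted columns stay integer-typed because the literal `1` is an Int.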

Comment: Does this answer your question? If so, please accept it as the answer; otherwise I suspect you will not get help with further questions.