Python PySpark DataFrame: long format to wide format

I have a list of customers and the items they bought:

# `sc` is the SparkContext that the pyspark shell provides
rdd = sc.parallelize([('A', 'Item1'), ('A', 'Item3'), ('B', 'Item1'), ('B', 'Item2')])
df = rdd.toDF(['Person', 'Item'])
df.show()
+------+-----+
|Person| Item|
+------+-----+
|     A|Item1|
|     A|Item3|
|     B|Item1|
|     B|Item2|
+------+-----+
Now I want to convert it to wide format using PySpark. The result should look like this:

+------+-----+-----+-----+
|Person|Item1|Item2|Item3|
+------+-----+-----+-----+
|     A|  1  |  0  |  0  |
|     A|  0  |  0  |  1  |
|     B|  1  |  0  |  0  |
|     B|  0  |  1  |  0  |
+------+-----+-----+-----+
Do you know how to do this?

Best regards,
Felix

I actually found the solution myself:

>>> df.crosstab('Person', 'Item').show()
+-----------+-----+-----+-----+
|Person_Item|Item1|Item2|Item3|
+-----------+-----+-----+-----+
|          A|    1|    0|    1|
|          B|    1|    1|    0|
+-----------+-----+-----+-----+
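
Note that this output has one row per person with purchase counts, not one row per purchase as in the table sketched in the question. To make clear what the cross-tabulation computes, here is a minimal pure-Python sketch of the same counting logic on the same data. It is only an illustration, not how Spark implements `crosstab`; the helper name `crosstab` and the sorted column order are assumptions made for this example.

```python
from collections import Counter

# Same (Person, Item) pairs as in the question.
rows = [('A', 'Item1'), ('A', 'Item3'), ('B', 'Item1'), ('B', 'Item2')]


def crosstab(pairs):
    """Count each (row_key, col_key) pair and lay the counts out in wide form.

    Hypothetical helper for illustration only; it mimics the shape of the
    result shown above, not Spark's internals.
    """
    counts = Counter(pairs)
    row_keys = sorted({r for r, _ in pairs})
    col_keys = sorted({c for _, c in pairs})
    # Counter returns 0 for pairs that never occur, which fills the gaps.
    table = {r: [counts[(r, c)] for c in col_keys] for r in row_keys}
    return col_keys, table


columns, table = crosstab(rows)
print(['Person_Item'] + columns)
for person, values in table.items():
    print([person] + values)
```

In PySpark itself, `df.groupBy('Person').pivot('Item').count()` followed by `.na.fill(0)` should give a similar wide table, with the first column named `Person` instead of `Person_Item`. I have not run that variant on this exact data, so treat it as a pointer and not a tested answer.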