What's the equivalent of Pandas' value_counts() in PySpark?


I have the following python/pandas command:

df.groupby('Column_Name').agg(lambda x: x.value_counts().max())

where I get the value counts for all columns in a DataFrameGroupBy object.

How do I do this in PySpark?
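To make the target behaviour concrete, here is a minimal sketch of what the pandas command returns on a tiny frame. The data and the column names (ID, COL1, COL2) are made up for illustration and are not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "ID":   ["ID01", "ID01", "ID01", "ID02", "ID02"],
    "COL1": ["a", "a", "b", "c", "c"],
    "COL2": ["x", "y", "z", "x", "x"],
})

# For every group and every remaining column, count how often each value
# occurs (value_counts) and keep only the largest of those counts (max).
result = df.groupby("ID").agg(lambda x: x.value_counts().max())
print(result)
```

The result has one row per ID and one column per non-key column; each cell holds the frequency of the most common value of that column within the group.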

It's more or less the same:

spark_df.groupBy('column_name').count().orderBy('count')

In groupBy you can pass multiple columns separated by commas, for example:

groupBy('column_1', 'column_2')

Try this:

spark_df.groupBy('column_name').count().show()

If you want to control the order, try this:

data.groupBy('col_name').count().orderBy('count', ascending=False).show()

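Hi Tanjin, thank you for your reply! I am not getting the same result. I have tried the following.

(Action-1):

from pyspark.sql.functions import count
exprs = {x: "count" for x in df.columns}
df.groupBy("ID").agg(exprs).show(5)

This works, but it gives me the count of all records for each group, which is not what I want.

(Action-2):

from pyspark.sql.functions import countDistinct
exprs = [countDistinct(x) for x in df.columns]
df.groupBy("ID").agg(*exprs).show(5)

This breaks!! It fails with the following error: ERROR client.TransportResponseHandler: ...

The .show() that has to be added to the end of that line to actually see the results is missing, which could confuse beginners. To match the behavior in Pandas, you need to return the counts in descending order: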

from pyspark.sql.functions import col

spark_df.groupBy('column_name').count().orderBy(col('count').desc()).show()

The task I am asking about is very simple. I want to get the value counts for all columns (the highest distinct count) in a grouped DataFrame. This is very easy to do with the value_counts() method.

[Sample input table: garbled in the source. Recoverable rows include 123 | BC01 | K|L|M | 213 | ID01 | 456 and 123 | BC01 | P|Q|M | 213 | ID01 | 456.]

Output of the "count" aggregation with .show(5). The columns are ID, count(ID), count(COL4), count(COL2), count(COL3), count(COL1), count(COL5); every recoverable cell is 6 for ID01 and 4 for ID02.

exprs = [countDistinct(x) for x in schemaTrans.columns]
schemaTrans.groupBy("ID").agg(*exprs).show(5)

| ID   | (DISTINCT COL1) | (DISTINCT COL2) | (DISTINCT COL3) | (DISTINCT COL4) | (DISTINCT COL5) | (DISTINCT ID) |
| ID01 | 2 | 2 | 2 | 3 | 5 | 1 |
| ID02 | 2 | 2 | 4 | 3 | 1 | (one value of this row is missing in the source)

Please don't add these as comments. Edit your question and put them there.