Python 3.x: Accessing a count value from a DataFrame in PySpark

python-3.x pyspark

I hope you can not help.

I have this DataFrame and I want to select, for example, the count where prediction == 4.

Code: 
the_counts=df.select('prediction').groupby('prediction').count()
the_counts.show()


+----------+-----+
|prediction|count|
+----------+-----+
|         1|    8|
|         6|   14|
|         5|    5|
|         4|    8|
|         8|    5|
|         0|    6|
+----------+-----+

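I want to assign that value to a variable, because this will run inside a loop for many iterations.

I managed to do it, but only by creating a separate DataFrame and then converting that DataFrame to a number: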

dfva = the_counts.select('count').filter(the_counts.prediction ==6)
dfva.show()


+-----+
|count|
+-----+
|   14|
+-----+

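Is there a way to access the number directly without going through so many steps, or what is the most efficient way to do it?

This is Python 3.x and Spark 2.1.

Thanks a lot.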

You can use the first() method to get the value directly:

>>> dfva = the_counts.filter(the_counts['prediction'] == 6).first()['count']
>>> type(dfva)
<type 'int'>
>>> print(dfva)
14
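Since the question mentions looking values up inside a loop, one possible alternative is to collect the small grouped DataFrame once and keep the counts in a plain dict, so each lookup does not trigger a new Spark job. This is only a sketch: the helper name `counts_by_prediction` and the variable `lookup` are hypothetical, and it assumes `the_counts` has one row per distinct prediction, so collecting it to the driver is cheap.

```python
def counts_by_prediction(the_counts):
    """Return {prediction: count} built from the grouped DataFrame.

    Hypothetical helper; assumes `the_counts` has the columns
    'prediction' and 'count', as in the question.
    """
    # collect() pulls every row to the driver in a single Spark job.
    # Index with row['count'] rather than row.count: a Row is a tuple
    # subclass, so row.count resolves to the built-in tuple.count method.
    return {row['prediction']: row['count'] for row in the_counts.collect()}


# Usage inside the loop (no further Spark jobs are needed):
# lookup = counts_by_prediction(the_counts)
# value = lookup.get(6, 0)   # falls back to 0 if prediction 6 is absent
```

With the table shown in the question, `lookup[6]` would be 14. The `.get(..., 0)` fallback also covers a case the `first()` approach does not: when no row matches the filter, `first()` returns None, and indexing it with `['count']` raises a TypeError.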

:D Your first sentence is: "I hope you can not help." Well, an obvious mistake; people here can always help :-)