How to count occurrences of a string in a PySpark dataframe column?


Suppose I have the following PySpark dataframe:

+---+------+-------+-----------------+
|age|height|   name|        friends  |
+---+------+-------+-----------------+
| 10|    80|  Alice|   'Grace, Sarah'|
| 15|  null|    Bob|          'Sarah'|
| 12|  null|    Tom|'Amy, Sarah, Bob'|
| 13|  null| Rachel|       'Tom, Bob'|
+---+------+-------+-----------------+
How can I count the number of people who have "Sarah" as a friend, without creating an additional column?


I tried
df.friends.apply(lambda x: x[x.str.contains('Sarah')].count())
but got
TypeError: 'Column' object is not callable

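You can try the following code:
from pyspark.sql.functions import lit
# Note: contains() is case-sensitive, so the literal must be 'Sarah', not 'sarah'
df = df.withColumn('sarah', lit('Sarah'))
df.filter(df['friends'].contains(df['sarah'])).count()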

Thanks


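Your syntax is for pandas; a PySpark Column has no apply method, which is why the call fails. Are you looking for:
df.where(df.friends.like('%Sarah%')).count()
?

This is exactly what I was looking for!

You don't need to create a column for this. @pault has already answered the question in the comments: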
   df.where(df.friends.like('%Sarah%')).count()