
Python Pyspark: multiple filters on a string column


Suppose the table below is a pyspark dataframe, and I want to filter the column ind on multiple values. How can I do this in pyspark?

ind group people value 
John  1    5    100   
Ram   1    2    2       
John  1    10   80    
Tom   2    20   40    
Tom   1    7    10    
Anil  2    23   30    
I tried the following, but it did not work:

filter = ['John', 'Ram']
# does not work: the Python variable "filter" is not substituted into the SQL string,
# so Spark tries to resolve "filter" as a column name
filtered_df = df.filter("ind == filter ")
filtered_df.show()
How can I achieve this in spark?

You can use:

filter = ['John', 'Ram']
filtered_df = df.filter("ind in ('John', 'Ram') ")
filtered_df.show()


The values are hard-coded in the SQL string here; if you want to drive the filter from the Python list instead, see the snippets further down. Also note that we use a single equals sign = rather than a double == to test equality in pyspark SQL expressions (as in SQL).
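
If you would rather stay in the DataFrame API than build an SQL string, a minimal sketch using Column.isin could look like this (it assumes the df and the column name ind from the question):

from pyspark.sql.functions import col

filter = ['John', 'Ram']

# isin accepts a list and builds the IN predicate for you
filtered_df = df.filter(col("ind").isin(filter))
filtered_df.show()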

That is exactly the opposite of what you want :-) so now you know you need to use the in function/operator here:
filter = ['John', 'Ram']
# quote each value and join them into a single "'John', 'Ram'" string for the SQL IN clause
processed_for_pyspark = ', '.join(['\'' + s + '\'' for s in filter])
filtered_df = df.filter("ind in ({}) ".format(processed_for_pyspark))
filtered_df.show()
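
For completeness, here is a self-contained sketch that rebuilds the example data from the question and applies the IN filter; the SparkSession setup and the createDataFrame column names are assumptions for illustration, not part of the original post:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# recreate the example table from the question
df = spark.createDataFrame(
    [("John", 1, 5, 100), ("Ram", 1, 2, 2), ("John", 1, 10, 80),
     ("Tom", 2, 20, 40), ("Tom", 1, 7, 10), ("Anil", 2, 23, 30)],
    ["ind", "group", "people", "value"],
)

filter = ['John', 'Ram']
in_clause = ', '.join("'{}'".format(s) for s in filter)

# keeps only the John and Ram rows: (John, 1, 5, 100), (Ram, 1, 2, 2), (John, 1, 10, 80)
df.filter("ind in ({})".format(in_clause)).show()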