Python 3.x: Searching for a value in a column

I want to check whether a column contains a given value.

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import *
import pandas as pd

df_init = pd.DataFrame({'id':['1', '2'], 'val':[100, 200]})

spark = SparkSession.builder.appName('pandasToSparkDF').getOrCreate()


mySchema = StructType([ StructField("id", StringType(), True),
                        StructField("val", IntegerType(), True)])


df = spark.createDataFrame(df_init, schema=mySchema)


if df.filter(df.id == "3"):
    print('Yes')
else:
    print('No')
It always prints "Yes".

With a pandas DataFrame, I would do the following:

if '3' in df_init['id'].values:
    print('Yes')
else:
    print('No')
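The `.values` in the pandas version matters: the `in` operator on a pandas Series tests the index labels, not the values. A small sketch using the same `df_init` data shows the difference, plus `isin(...).any()` as an equivalent check on the values:

```python
import pandas as pd

df_init = pd.DataFrame({'id': ['1', '2'], 'val': [100, 200]})

# `in` on a Series looks at the index labels (0 and 1 here), not the values.
print('1' in df_init['id'])             # False: '1' is not an index label
print(0 in df_init['id'])               # True: 0 is an index label

# To test the values, use the underlying array or isin(...).any().
print('1' in df_init['id'].values)      # True
print(df_init['id'].isin(['3']).any())  # False
```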

But with PySpark I don't know how to handle this.
I tried using `contains` and `isin`, but the result is the same.
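A likely explanation for the constant "Yes": `df.filter(...)` (and likewise a filter built with `contains` or `isin`) returns a new DataFrame, not a boolean. Python treats any object as true unless its class defines `__bool__` or `__len__`, and as far as I know PySpark's DataFrame defines neither, so the `if` branch always runs. The pure-Python sketch below mimics this with a hypothetical stand-in class, `FakeFrame`, so it runs without Spark:

```python
class FakeFrame:
    """Hypothetical stand-in for a Spark DataFrame: filter() returns another
    frame object rather than a boolean."""

    def __init__(self, rows):
        self._rows = rows

    def filter(self, predicate):
        # Returns a new frame describing the matching rows; no bool is produced.
        return FakeFrame([r for r in self._rows if predicate(r)])

    def count(self):
        return len(self._rows)


df = FakeFrame([{'id': '1'}, {'id': '2'}])
empty = df.filter(lambda r: r['id'] == '3')

# The class defines neither __bool__ nor __len__, so even an empty result is truthy.
print(bool(empty))         # True
# An explicit check on the result gives the intended answer.
print(empty.count() >= 1)  # False
```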


You can use collect_list to gather all values of the "id" column into a list, then check whether your value is in that list:

from pyspark.sql import functions as F

if '3' in df.select(F.collect_list('id')).first()[0]:
    print('Yes')
else:
    print('No')
Or simply filter and check whether the resulting row count is at least 1:

if df.filter(df.id == "3").count() >= 1:
    print('Yes')
else:
    print('No')