Python 3.x: Searching for a value in a column
I want to check whether a column contains a given value:
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import *
import pandas as pd
df_init = pd.DataFrame({'id':['1', '2'], 'val':[100, 200]})
spark = SparkSession.builder.appName('pandasToSparkDF').getOrCreate()
mySchema = StructType([StructField("id", StringType(), True),
                       StructField("val", IntegerType(), True)])
df = spark.createDataFrame(df_init, schema=mySchema)
if df.filter(df.id == "3"):
    print('Yes')
else:
    print('No')
It always prints 'Yes'.
With a pandas DataFrame, I would do the following:
if '3' in df_init['id'].values:
    print('Yes')
else:
    print('No')
But with PySpark I don't know how to do this.
I tried using 'contains' and 'isin', but the result is still the same.
You can use collect_list to get all the values in the "id" column as a list, then check whether your element is in that list:
from pyspark.sql import functions as F
if '3' in df.select(F.collect_list('id')).first()[0]:
    print("Yes")
else:
    print('No')
Or simply check whether the count after the filter operation is at least 1:
if df.filter(df.id == "3").count() >= 1:
    print("Yes")
else:
    print('No')