Dataframe: check whether a value in the first DataFrame STARTS WITH any value in the second DataFrame

I have two PySpark DataFrames, as shown below:

df1 = spark.createDataFrame(
    ["yes","no","yes23", "no3", "35yes", """41no["maybe"]"""],
    "string"
).toDF("location")

df2 = spark.createDataFrame(
    ["yes","no"],
    "string"
).toDF("location")
I want to check whether a value in the location column of df1 starts with any value in the location column of df2, and vice versa.

Something like:

df1.select("location").startsWith(df2.location)
Here is the output I expect:

+-------------+
|     location|
+-------------+
|          yes|
|           no|
|        yes23|
|          no3|
+-------------+
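
The expected output above follows a simple rule: keep a df1 value when it begins with at least one df2 value. Here is a minimal plain-Python sketch of that rule, with no Spark involved (the list names `candidates` and `prefixes` are made up for illustration):

```python
# Values copied from df1 and df2 in the question
candidates = ["yes", "no", "yes23", "no3", "35yes", '41no["maybe"]']
prefixes = ["yes", "no"]

# str.startswith accepts a tuple and returns True if any prefix matches
matches = [v for v in candidates if v.startswith(tuple(prefixes))]
print(matches)  # ['yes', 'no', 'yes23', 'no3']
```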

In my opinion, Spark SQL is the simplest way to do this:

df1.createOrReplaceTempView('df1')
df2.createOrReplaceTempView('df2')
joined = spark.sql("""
    select df1.*
    from df1
    join df2
    on df1.location rlike '^' || df2.location
""")