Join on 1=1 in pyspark

Hello,

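I want to check whether my variable exists in either of two tables, and get the result in a single table for further processing. I thought this would be simple: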

'''
   select concent, concent_big from
     (select count(*) as concent from concent where core_id = "{}" ) as a
   left join
     (select count(*) as concent_big from concent_2 where core_id = "{}" ) as b
   on 1 = 1
'''

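However, this does not seem to be allowed. It is a bit confusing, because I have done similar things in SQL before, but now PySpark is giving me a hard time. I came up with a workaround, but it feels clumsy (the tmp_key query at the end of this post).

Is there a way to make this more elegant?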

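Why not just use crossJoin? A word of warning: this produces a full Cartesian product, so your tables can blow up in size, but in your case that seems to be exactly the desired effect. You can read about it here: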
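To make the size warning concrete, here is a minimal plain-Python sketch of what a cross join computes: every row on the left is paired with every row on the right. The helper cross_join and the sample values are hypothetical and only illustrate the semantics; in PySpark itself the corresponding operation is DataFrame.crossJoin. Two single-row inputs, like the two count(*) subqueries in the question, give exactly one row, while larger inputs multiply.

```python
from itertools import product


def cross_join(left, right):
    """Pair every row of `left` with every row of `right` (a Cartesian product).

    Rows are tuples; each output row is the left tuple followed by the right tuple.
    Hypothetical helper for illustration, not a Spark API.
    """
    return [l + r for l, r in product(left, right)]


# Two single-row "tables", like the two count(*) subqueries in the question.
counts_a = [(1,)]  # hypothetical value of concent
counts_b = [(0,)]  # hypothetical value of concent_big
print(cross_join(counts_a, counts_b))  # [(1, 0)]

# Larger inputs multiply: 100 rows x 50 rows gives 5000 rows.
big_left = [(i,) for i in range(100)]
big_right = [(j,) for j in range(50)]
print(len(cross_join(big_left, big_right)))  # 5000
```

The output size is always len(left) * len(right), which is why a cross join is harmless between one-row aggregates but can explode between ordinary tables.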

Edit: when using Spark SQL, the language follows the ANSI SQL standard, so the command becomes CROSS JOIN.

Hope this helps.
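A minimal sketch of the CROSS JOIN form of the question's query. So that it runs without a Spark installation, Python's built-in sqlite3 module stands in for Spark SQL here, and the table contents are made-up sample data; in PySpark you would pass a similar query string to spark.sql(...), for example built with str.format as in the question. Each subquery is an aggregate without GROUP BY, so it always returns exactly one row (a count of 0 when nothing matches); the cross join therefore returns exactly one row, and neither ON 1 = 1 nor a tmp_key column is needed.

```python
import sqlite3

# In-memory database standing in for Spark SQL (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concent   (core_id TEXT);
    CREATE TABLE concent_2 (core_id TEXT);
    INSERT INTO concent   VALUES ('abc'), ('xyz');
    INSERT INTO concent_2 VALUES ('xyz');
""")

# Same shape as the question's query, but with CROSS JOIN and no ON clause.
# sqlite3 "?" placeholders are used here instead of str.format.
QUERY = """
    SELECT a.concent, b.concent_big
    FROM (SELECT count(*) AS concent     FROM concent   WHERE core_id = ?) AS a
    CROSS JOIN
         (SELECT count(*) AS concent_big FROM concent_2 WHERE core_id = ?) AS b
"""


def lookup(core_id):
    """Return (matches in concent, matches in concent_2) for one core_id."""
    return conn.execute(QUERY, (core_id, core_id)).fetchone()


print(lookup("abc"))   # (1, 0)
print(lookup("xyz"))   # (1, 1)
print(lookup("nope"))  # (0, 0)
```

Binding the id as a parameter rather than formatting it into the string also avoids quoting problems if an id ever contains a quote character.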

Does the following work for you?
@napoleon_borntoparty I'm afraid not: mismatched input 'crossJoin'.
If you are using Spark SQL, try CROSS JOIN.
It works now, and I have accepted your answer. For future readers, would you mind editing the answer to include this? Thanks.

The workaround mentioned in the question:

'''
   select concent, concent_big from
     (select count(*) as concent, 1 as tmp_key from concent where core_id = "{}" ) as a
   left join
     (select count(*) as concent_big, 1 as tmp_key from concent_2 where core_id = "{}" ) as b
   on a.tmp_key = b.tmp_key
'''