How do we join two tables in Spark SQL when they have no common column values?

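I have two tables, citizenship and country:

I have to join the above tables in the following way:

If countryCode in the citizenship table is prefixed with a + sign, it should join with the country table on those country codes and give the record count.

If countryCode in the citizenship table is prefixed with a - sign, it should join with the country table and return the records for all countries except those marked with the - sign.

If it is an empty string or null, it should join with all countryCode values in the country table.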
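The three rules above can be read as a per-pair predicate: given one citizenship value and one country code, should that pair appear in the join result? The sketch below is plain Python rather than Spark. The function name `matches` and the sample values are assumptions made for illustration, and it assumes a single value never mixes + and - entries.

```python
from typing import Optional

def matches(citizenship_codes: Optional[str], country_code: str) -> bool:
    """Return True if a citizenship row should join to this country row."""
    # Rule 3: null, empty, or blank values join to every country.
    if citizenship_codes is None or citizenship_codes.strip() == "":
        return True
    tokens = [t.strip() for t in citizenship_codes.split(",")]
    listed = {t[1:] for t in tokens}  # codes with the leading sign removed
    if tokens[0].startswith("+"):
        # Rule 1: join only to the listed countries.
        return country_code in listed
    if tokens[0].startswith("-"):
        # Rule 2: join to every country except the listed ones.
        return country_code not in listed
    # No sign at all: treated like the empty case (an assumption).
    return True

countries = ["IN", "US", "UK"]
for value in ["+IN,+US", "-IN,-UK", " ", None]:
    print(repr(value), [c for c in countries if matches(value, c)])
```

Any join condition proposed for this problem can be checked against a predicate like this one on a handful of sample rows.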

If the + or - at the start of CountryCode means that a value never mixes - and + entries, you can do a conditional join like this:

select t.citizenshipId, t.CountryCode, 
  count(*) counter
from citizenship t inner join country c
on 1 = case left(t.CountryCode, 1)
  when '+' then concat(',', replace(t.CountryCode, '+', ''), ',') like concat('%,', c.countryCode, ',%')
  when '-' then concat(',', replace(t.CountryCode, '-', ''), ',') not like concat('%,', c.countryCode, ',%')
  else 1
end
group by t.citizenshipId, t.CountryCode
order by t.citizenshipId
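The comma wrapping in the query above is what makes the LIKE test match whole list entries rather than substrings. A rough Python analogue follows. `in_list` is a hypothetical helper, and the three-letter code `IND` is invented only to show the failure mode.

```python
def in_list(csv_codes: str, code: str) -> bool:
    """Whole-entry membership test, mirroring
    concat(',', list, ',') LIKE concat('%,', code, ',%')."""
    return f",{code}," in f",{csv_codes},"

# "IND,US": the signs are stripped first, as replace() does in the query.
codes = "+IND,+US".replace("+", "")

print("IN" in codes)         # naive substring test: True (a false positive)
print(in_list(codes, "IN"))  # delimiter-aware test: False
print(in_list(codes, "US"))  # True
```

With fixed two-letter codes the naive test happens to work, but the delimiter-aware form stays correct if code lengths ever vary.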

Try doing what is suggested here: it is a SQL Server answer, but there may be similarities.

First, you need to clean up the citizenship table:

SELECT
citizenshipID, 
value AS CountryCode2
FROM 
citizenship
CROSS APPLY STRING_SPLIT(CountryCode,',')
This should return:

1 +IN
1 +US
2 ...
and so on.
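The effect of the split step is one output row per list entry. The sketch below imitates that in plain Python and then separates the sign from the code. The sample rows are assumptions chosen to resemble the question's data. In Spark the same reshaping is usually written with the `split` and `explode` functions rather than `STRING_SPLIT`.

```python
# (citizenshipID, CountryCode) pairs; assumed sample data.
rows = [(1, "+IN,+US"), (2, "-IN,-UK")]

# One (id, entry) pair per comma-separated entry.
exploded = [(cid, entry) for cid, codes in rows for entry in codes.split(",")]

# Separate the leading sign from the country code.
parsed = [(cid, entry[:1], entry[1:]) for cid, entry in exploded]

for row in parsed:
    print(row)
```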

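Now we need the CASE expressions that separate the sign from the code, ahead of the join:

SELECT
    CountryCode2,
    CASE WHEN CountryCode2 IS NULL THEN 'Nothing'
         WHEN CountryCode2 = '' THEN 'Nothing'
         ELSE LEFT(CountryCode2, 1) END AS Sign,
    CASE WHEN CountryCode2 IS NULL THEN 'Nothing'
         WHEN CountryCode2 = '' THEN 'Nothing'
         ELSE RIGHT(CountryCode2, 2) END AS Code
FROM (
    SELECT
        citizenshipID,
        value AS CountryCode2
    FROM
        citizenship
        CROSS APPLY STRING_SPLIT(CountryCode, ',')
) AS S

This gives us the cleaned-up query, which we can then join to the Country table: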

SELECT
    Code,
    Description,
    Sign,
    SUM(Counter) AS Counter
FROM (
    SELECT
        A.Code AS Code,
        B.CountryDescription AS Description,
        A.Sign AS Sign,
        1 AS Counter  -- numeric literal, so SUM() can add it up
    FROM (
        SELECT
            CountryCode2,
            -- the first character is the sign (+ or -)
            CASE WHEN CountryCode2 IS NULL THEN 'Nothing'
                 WHEN CountryCode2 = '' THEN 'Nothing'
                 ELSE LEFT(CountryCode2, 1) END AS Sign,
            -- the last two characters are the country code
            CASE WHEN CountryCode2 IS NULL THEN 'Nothing'
                 WHEN CountryCode2 = '' THEN 'Nothing'
                 ELSE RIGHT(CountryCode2, 2) END AS Code
        FROM (
            SELECT
                citizenshipID,
                value AS CountryCode2
            FROM
                citizenship
                CROSS APPLY STRING_SPLIT(CountryCode, ',')
        ) AS S
    ) AS A
    JOIN country AS B ON A.Code = B.CountryCode
) AS T
GROUP BY Code, Description, Sign
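I haven't tested this, but it may get you where you need to be. Note that I don't quite understand what you want for the empty-string values, so plug your own logic in there.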

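Use a conditional join with when/otherwise.

If the countryCode column in the citizenship table starts with +, join on citizenship.countryCode contains country.countryCode. If it starts with -, join on citizenship.countryCode does not contain country.countryCode. If it is null or empty, join with all rows in country.

Here is an example using Spark/Scala: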

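import org.apache.spark.sql.functions.{col, lit, not, when}
import spark.implicits._  // needed for toDF; assumes a SparkSession named spark, as in spark-shell

val countryDF = Seq(
("IN", "INDIA"), ("US", "USA"), ("UK", "United kingdom")
).toDF("countryCode", "countryDescription")

val citizenshipDF = Seq(
(1, "+IN,+US"), (2, "-IN,-UK"), (3, " "), (4, null)
).toDF("citizenshipId", "CountryCode")

// "+" list: keep countries that appear in it; "-" list: keep countries that do not;
// anything else (blank or null): keep every country.
val joinCondition = when(col("ct.countryCode").startsWith("+"), 
col("ct.countryCode").contains(col("c.countryCode"))).otherwise(
when(col("ct.countryCode").startsWith("-"), 
not(col("ct.countryCode").contains(col("c.countryCode")))
).otherwise(lit(true)))

countryDF.as("c").join(citizenshipDF.as("ct"), joinCondition).show()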
Output:

+-----------+------------------+-------------+-----------+
|countryCode|countryDescription|citizenshipId|CountryCode|
+-----------+------------------+-------------+-----------+
|         IN|             INDIA|            1|    +IN,+US|
|         IN|             INDIA|            3|           |
|         IN|             INDIA|            4|       null|
|         US|               USA|            1|    +IN,+US|
|         US|               USA|            2|    -IN,-UK|
|         US|               USA|            3|           |
|         US|               USA|            4|       null|
|         UK|    United kingdom|            3|           |
|         UK|    United kingdom|            4|       null|
+-----------+------------------+-------------+-----------+

So, join only on the positives, or join on everything except the negatives?

How about an inner join with LIKE and NOT LIKE?

SELECT *
FROM citizenship AS cish
JOIN country AS ctry
ON cish.CountryCode IS NULL
OR cish.CountryCode = ''
OR (cish.CountryCode LIKE '%+%' AND cish.CountryCode LIKE CONCAT('%+', ctry.CountryCode, '%'))
OR (cish.CountryCode NOT LIKE '%+%' AND  cish.CountryCode LIKE '%-%' AND  cish.CountryCode NOT LIKE CONCAT('%-', ctry.CountryCode,'%'))
ORDER BY cish.citizenshipId, ctry.CountryCode

Also specify the expected result. What should happen if the value is: -IN,+UK?
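The mixed case matters because the join conditions shown above do not agree on it. The sketch below re-implements two of them in plain Python and applies both to -IN,+UK. It approximates the SQL rather than running it, and the function names are hypothetical.

```python
from typing import Optional

def first_char_rule(codes: str, country: str) -> bool:
    """Approximates the CASE left(CountryCode, 1) join condition."""
    sign = codes[:1]
    if sign == "+":
        return f",{country}," in "," + codes.replace("+", "") + ","
    if sign == "-":
        return f",{country}," not in "," + codes.replace("-", "") + ","
    return True

def like_rule(codes: Optional[str], country: str) -> bool:
    """Approximates the LIKE / NOT LIKE join condition."""
    if codes is None or codes == "":
        return True
    if "+" in codes:
        return f"+{country}" in codes
    if "-" in codes:
        return f"-{country}" not in codes
    return False

countries = ["IN", "US", "UK"]
mixed = "-IN,+UK"
print([c for c in countries if first_char_rule(mixed, c)])  # ['US', 'UK']
print([c for c in countries if like_rule(mixed, c)])        # ['UK']
```

The first rule treats the whole value as an exclusion list because it starts with -, while the second gives any + entry priority, so the expected result for mixed values has to be stated before either can be called correct.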