SQL: UPDATE statement with a WHERE clause that contains columns with NULL values

sql postgresql

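I am updating a column in one table using data from another table. The WHERE clause is based on multiple columns, and some of those columns are null. As far as I can tell, these nulls are what is throwing off the standard UPDATE table SET X = Y WHERE A = B statement.

There are two tables, and I am trying to update table_one based on the data in table_two. My query currently looks like this: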
UPDATE table_one SET x = table_two.y
FROM table_two
WHERE 
table_one.invoice_number = table_two.invoice_number AND
table_one.submitted_by = table_two.submitted_by AND
table_one.passport_number = table_two.passport_number AND
table_one.driving_license_number = table_two.driving_license_number AND
table_one.national_id_number = table_two.national_id_number AND
table_one.tax_pin_identification_number = table_two.tax_pin_identification_number AND
table_one.vat_number = table_two.vat_number AND
table_one.ggcg_number = table_two.ggcg_number AND
table_one.national_association_number = table_two.national_association_number

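When any of these columns is null in either table, the query fails to update some rows of table_one.x; that is, a row is updated only when all of the columns contain data.

This question is related to an earlier one of mine, where I used DISTINCT ON to get distinct values from a large data set. What I want now is to populate the large data set with values from the table that has the unique fields.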
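Update

I used the first update statement provided by @binotenary. On small tables it runs in a flash: for example, on a table with 20,000 records the update completed in about 20 seconds. But another table, with more than 9 million records, has been running for 20 hours so far! See the EXPLAIN output below.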
Update on table_one  (cost=0.00..210634237338.87 rows=13615011125 width=1996)
  ->  Nested Loop  (cost=0.00..210634237338.87 rows=13615011125 width=1996)
    Join Filter: ((((my_update_statement_here))))
    ->  Seq Scan on table_one  (cost=0.00..610872.62 rows=9661262 width=1986)
    ->  Seq Scan on table_two  (cost=0.00..6051.98 rows=299998 width=148)

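EXPLAIN ANALYZE also took a long time, so I cancelled it.

Any ideas on how to make this type of update faster? Even if it means using a different update statement, or even a custom function that loops through and performs the update.

You can use a null-check function, similar to Oracle's NVL. For Postgres, you have to use coalesce, i.e. your query can look like this: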
UPDATE table_one SET table_one.x =(select  table_two.y from table_one,table_two
WHERE 
coalesce(table_one.invoice_number,table_two.invoice_number,1) = coalesce(table_two.invoice_number,table_one.invoice_number,1) 
AND
coalesce(table_one.submitted_by,table_two.submitted_by,1) = coalesce(table_two.submitted_by,table_one.submitted_by,1))

where table_one.table_one_pk in  (select  table_one.table_one_pk from table_one,table_two
WHERE 
coalesce(table_one.invoice_number,table_two.invoice_number,1) = coalesce(table_two.invoice_number,table_one.invoice_number,1) 
AND
coalesce(table_one.submitted_by,table_two.submitted_by,1) = coalesce(table_two.submitted_by,table_one.submitted_by,1));

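Because null = null does not evaluate to true (the result is NULL, which a WHERE clause treats as no match), you need to check whether both fields are null in addition to checking for equality: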
UPDATE table_one SET x = table_two.y
FROM table_two
WHERE 
    (table_one.invoice_number = table_two.invoice_number 
        OR (table_one.invoice_number is null AND table_two.invoice_number is null))
    AND
    (table_one.submitted_by = table_two.submitted_by 
        OR (table_one.submitted_by is null AND table_two.submitted_by is null))
    AND 
    -- etc
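
A minimal sketch of the comparison behavior this pattern works around. It needs no tables and can be pasted into psql; the column aliases and the inline VALUES rows are made up for illustration.

```sql
-- Plain equality with a NULL operand yields NULL, not true.
SELECT NULL = NULL AS plain_equality;   -- NULL
SELECT 'a' = NULL  AS one_side_null;    -- NULL

-- The pattern from the query above: equal, or both NULL.
SELECT a, b, (a = b OR (a IS NULL AND b IS NULL)) AS matches
FROM (VALUES
        (NULL::text, NULL::text),   -- both NULL: matches is true
        ('a'::text,  'a'::text),    -- equal values: matches is true
        ('a'::text,  NULL::text)    -- one side NULL: matches is NULL, so the row is filtered out
     ) AS t(a, b);
```

A WHERE clause keeps a row only when the condition is true, so both false and NULL results drop the row.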

You can also use the coalesce function, which is more readable:
UPDATE table_one SET x = table_two.y
FROM table_two
WHERE 
    coalesce(table_one.invoice_number, '') = coalesce(table_two.invoice_number, '')
    AND coalesce(table_one.submitted_by, '') = coalesce(table_two.submitted_by, '')
    AND -- etc

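But you need to be careful with the default value (the last argument to coalesce).

Its data type should match the column type (so that you don't end up comparing dates with numbers, for example), and the default should be a value that does not appear in the data.

For example, coalesce(null, 1) = coalesce(1, 1) is a situation you want to avoid.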
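
To make the caveat concrete, here is a small sketch of how a badly chosen default produces false matches. The last statement uses IS NOT DISTINCT FROM, which is not mentioned in the answer above; it is PostgreSQL's built-in null-safe comparison and is shown only as a possible alternative that needs no default value at all.

```sql
-- Default collides with real data: a NULL and a genuine 1 compare as equal.
SELECT coalesce(NULL::int, 1) = coalesce(1, 1) AS false_match;             -- true

-- Same problem with '' as the default when empty strings occur in the data.
SELECT coalesce(NULL::text, '') = coalesce(''::text, '') AS false_match;   -- true

-- Null-safe comparison: NULL matches only NULL, no default involved.
SELECT NULL::text IS NOT DISTINCT FROM ''::text   AS null_vs_empty,        -- false
       NULL::text IS NOT DISTINCT FROM NULL::text AS null_vs_null;         -- true
```

Whether the alternative performs any better than the OR / IS NULL form on this data set would have to be measured.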
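Update (regarding performance):

Seq Scan on table_two - this suggests that you don't have any indexes on table_two.

So when you update a row in table_one, the database basically has to scan through all the rows of table_two one by one until it finds a match.

Matching rows can be found much faster if the relevant columns are indexed.

On the other hand, any indexes on table_one slow the update down.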

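According to a performance guide:

Table constraints and indexes heavily delay every write. If possible, you should drop all indexes, triggers and foreign keys while the update runs and recreate them at the end.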
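
As a rough sketch of that advice, assuming table_one has an index on the updated column: the index name table_one_x_idx is hypothetical, so substitute whatever indexes actually exist on the table.

```sql
-- Hypothetical index name; check the real ones with \d table_one in psql.
DROP INDEX IF EXISTS table_one_x_idx;

-- ... run the large UPDATE here ...

-- Recreate the index once the update has finished.
CREATE INDEX table_one_x_idx ON table_one (x);
```

Triggers and foreign keys would need the same treatment, and dropping them is only safe if nothing else relies on them while the update runs.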
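Another suggestion from the same guide that might be helpful:

If you can segment your data using, for example, sequential IDs, you can update rows incrementally in batches.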
So, for example, if table_one has an id column, you could add something like
and table_one.id between x and y

to the where condition and run the query several times, changing the values of x and y so that all rows are covered.
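
One way to avoid editing the range by hand is to loop over it. The following is only a sketch: it assumes table_one.id is an integer column, the batch size of 100000 is an arbitrary placeholder, and only two of the nine join conditions are spelled out.

```sql
DO $$
DECLARE
    batch_size  CONSTANT bigint := 100000;  -- hypothetical batch size
    batch_start bigint;
    max_id      bigint;
BEGIN
    SELECT min(id), max(id) INTO batch_start, max_id FROM table_one;

    -- On an empty table both values are NULL and the loop body never runs.
    WHILE batch_start <= max_id LOOP
        UPDATE table_one
        SET x = table_two.y
        FROM table_two
        WHERE table_one.id >= batch_start
          AND table_one.id <  batch_start + batch_size
          AND (table_one.invoice_number = table_two.invoice_number
               OR (table_one.invoice_number IS NULL AND table_two.invoice_number IS NULL))
          AND (table_one.submitted_by = table_two.submitted_by
               OR (table_one.submitted_by IS NULL AND table_two.submitted_by IS NULL));
        -- The remaining join conditions follow the same pattern.

        batch_start := batch_start + batch_size;
    END LOOP;
END
$$;
```

As written, the whole block runs as a single transaction. If each batch should be committed on its own, running the individual UPDATE statements from a client script is the simpler route.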
"The EXPLAIN ANALYZE option also took a long time"

You might want to be careful when using the ANALYZE option with EXPLAIN on statements that have side effects.
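According to the documentation:

Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual.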
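
A common way to get the measured plan without keeping the changes is to wrap the statement in a transaction and roll it back. A minimal sketch, with the join conditions shortened to one for brevity:

```sql
BEGIN;

EXPLAIN ANALYZE
UPDATE table_one
SET x = table_two.y
FROM table_two
WHERE table_one.invoice_number = table_two.invoice_number;  -- remaining conditions omitted

ROLLBACK;  -- discard the rows the UPDATE modified
```

The statement still runs to completion before the plan is printed, so on the 9 million row table this takes as long as the update itself. Plain EXPLAIN (without ANALYZE) returns the estimated plan without executing anything.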
Try the following, which is similar to @binotenary's approach above (they just beat me to the answer).
update table_one
set column_x = (select column_y from table_two 
where 
(( table_two.invoice_number = table_one.invoice_number)OR (table_two.invoice_number IS NULL AND table_one.invoice_number IS NULL))
and ((table_two.submitted_by=table_one.submitted_by)OR (table_two.submitted_by IS NULL AND table_one.submitted_by IS NULL)) 
and ((table_two.passport_number=table_one.passport_number)OR (table_two.passport_number IS NULL AND table_one.passport_number IS NULL)) 
and ((table_two.driving_license_number=table_one.driving_license_number)OR (table_two.driving_license_number IS NULL AND table_one.driving_license_number IS NULL)) 
and ((table_two.national_id_number=table_one.national_id_number)OR (table_two.national_id_number IS NULL AND table_one.national_id_number IS NULL)) 
and ((table_two.tax_pin_identification_number=table_one.tax_pin_identification_number)OR (table_two.tax_pin_identification_number IS NULL AND table_one.tax_pin_identification_number IS NULL)) 
and ((table_two.vat_number=table_one.vat_number)OR (table_two.vat_number IS NULL AND table_one.vat_number IS NULL)) 
and ((table_two.ggcg_number=table_one.ggcg_number)OR (table_two.ggcg_number IS NULL AND table_one.ggcg_number IS NULL)) 
and ((table_two.national_association_number=table_one.national_association_number)OR (table_two.national_association_number IS NULL AND table_one.national_association_number IS NULL)) 
);

The current query joins the two tables using a Nested Loop, which means the server processes

9,661,262 * 299,998 = 2,898,359,277,476

rows. No wonder it takes forever.
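To make the join efficient you need an index on all of the joined columns. The problem is the NULL values.

If you apply a function to the joined columns, an index on those columns generally cannot be used. So if the join uses an expression like this: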
coalesce(table_one.invoice_number, '') = coalesce(table_two.invoice_number, '')

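the index cannot be used.

So we need an index, and we need to do something about the NULL values to make that index usable.

We don't need to change anything in table_one, because it has to be scanned in full in any case.

But table_two can definitely be improved. Either change the table itself or create a separate (temporary) table. It has only 300K rows, so that should not be a problem. Make all the columns used in the join NOT NULL.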

CREATE TABLE table_two (
    id int4 NOT NULL,
    invoice_number varchar(30) NOT NULL,
    submitted_by varchar(20) NOT NULL,
    passport_number varchar(30) NOT NULL,
    driving_license_number varchar(30) NOT NULL,
    national_id_number varchar(30) NOT NULL,
    tax_pin_identification_number varchar(30) NOT NULL,
    vat_number varchar(30) NOT NULL,
    ggcg_number varchar(30) NOT NULL,
    national_association_number varchar(30) NOT NULL,
    column_y int,
    CONSTRAINT table_two_pkey PRIMARY KEY (id)
);

Update the table and replace the NULL values with '' or some other appropriate value.
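
If the existing table_two is altered in place rather than recreated, this step might look roughly like the sketch below. It assumes that '' never occurs as a genuine value in any of these columns; otherwise pick a different placeholder.

```sql
-- Replace NULLs with the placeholder in every join column.
UPDATE table_two
SET invoice_number                = coalesce(invoice_number, ''),
    submitted_by                  = coalesce(submitted_by, ''),
    passport_number               = coalesce(passport_number, ''),
    driving_license_number        = coalesce(driving_license_number, ''),
    national_id_number            = coalesce(national_id_number, ''),
    tax_pin_identification_number = coalesce(tax_pin_identification_number, ''),
    vat_number                    = coalesce(vat_number, ''),
    ggcg_number                   = coalesce(ggcg_number, ''),
    national_association_number   = coalesce(national_association_number, '');

-- Then forbid NULLs so the join columns stay that way.
ALTER TABLE table_two
    ALTER COLUMN invoice_number                SET NOT NULL,
    ALTER COLUMN submitted_by                  SET NOT NULL,
    ALTER COLUMN passport_number               SET NOT NULL,
    ALTER COLUMN driving_license_number        SET NOT NULL,
    ALTER COLUMN national_id_number            SET NOT NULL,
    ALTER COLUMN tax_pin_identification_number SET NOT NULL,
    ALTER COLUMN vat_number                    SET NOT NULL,
    ALTER COLUMN ggcg_number                   SET NOT NULL,
    ALTER COLUMN national_association_number   SET NOT NULL;
```
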
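Create an index on all the columns used in the JOIN plus column_y. column_y must come last in the index. I assume your UPDATE is well-formed, so the index should be unique.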
CREATE UNIQUE INDEX IX ON table_two
(
    invoice_number,
    submitted_by,
    passport_number,
    driving_license_number,
    national_id_number,
    tax_pin_identification_number,
    vat_number,
    ggcg_number,
    national_association_number,
    column_y
);

The query then becomes
UPDATE table_one SET x = table_two.y
FROM table_two
WHERE 
COALESCE(table_one.invoice_number, '') = table_two.invoice_number AND
COALESCE(table_one.submitted_by, '') = table_two.submitted_by AND
COALESCE(table_one.passport_number, '') = table_two.passport_number AND
COALESCE(table_one.driving_license_number, '') = table_two.driving_license_number AND
COALESCE(table_one.national_id_number, '') = table_two.national_id_number AND
COALESCE(table_one.tax_pin_identification_number, '') = table_two.tax_pin_identification_number AND
COALESCE(table_one.vat_number, '') = table_two.vat_number AND
COALESCE(table_one.ggcg_number, '') = table_two.ggcg_number AND
COALESCE(table_one.national_association_number, '') = table_two.national_association_number

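Note that COALESCE is used only on the table_one columns.

It is also a good idea to run the update in batches rather than updating the whole table at once. For example, choose a range of ids to update in each batch.
UPDATE table_one SET x = table_two.y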
FROM table_two
WHERE 
table_one.id >= <some_starting_value> AND
table_one.id < <some_ending_value> AND
COALESCE(table_one.invoice_number, '') = table_two.invoice_number AND
COALESCE(table_one.submitted_by, '') = table_two.submitted_by AND
COALESCE(table_one.passport_number, '') = table_two.passport_number AND
COALESCE(table_one.driving_license_number, '') = table_two.driving_license_number AND
COALESCE(table_one.national_id_number, '') = table_two.national_id_number AND
COALESCE(table_one.tax_pin_identification_number, '') = table_two.tax_pin_identification_number AND
COALESCE(table_one.vat_number, '') = table_two.vat_number AND
COALESCE(table_one.ggcg_number, '') = table_two.ggcg_number AND
COALESCE(table_one.national_association_number, '') = table_two.national_association_number
