SQL: Remove duplicates from a string array column in Postgres


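I have a PostgreSQL table with a column containing arrays of strings. Some rows hold only unique strings, while others also contain duplicate strings. Where duplicates exist, I want to remove them from each row.

I tried a few queries but could not get it to work.

The table looks like this: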

  veh_id |             vehicle_types              
 --------+----------------------------------------
      1  | {"byd_tang","volt","viper","laferrari"} 
      2  | {"volt","viper"}                        
      3  | {"byd_tang","sonata","jaguarxf"}        
      4  | {"swift","teslax","mirai"}              
      5  | {"volt","viper"}                        
      6  | {"viper","ferrariff","bmwi8","viper"}   
      7  | {"ferrariff","viper","viper","volt"}    

I expect the following result:

  veh_id |             vehicle_types              
 --------+----------------------------------------
      1  | {"byd_tang","volt","viper","laferrari"} 
      2  | {"volt","viper"}                        
      3  | {"byd_tang","sonata","jaguarxf"}        
      4  | {"swift","teslax","mirai"}              
      5  | {"volt","viper"}                        
      6  | {"viper","ferrariff","bmwi8"}           
      7  | {"ferrariff","viper","volt"}            
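
For anyone who wants to reproduce this, here is a minimal setup sketch with the sample data. The table name vehicle and the column types (int, text[]) are assumptions: the question only shows the output, and the answers below use both vehicle and vehicles.

```sql
-- Assumed table name and column types; adjust to the real schema.
CREATE TABLE vehicle (
   veh_id        int,
   vehicle_types text[]
);

-- Sample rows copied from the table above.
INSERT INTO vehicle (veh_id, vehicle_types) VALUES
   (1, '{byd_tang,volt,viper,laferrari}'),
   (2, '{volt,viper}'),
   (3, '{byd_tang,sonata,jaguarxf}'),
   (4, '{swift,teslax,mirai}'),
   (5, '{volt,viper}'),
   (6, '{viper,ferrariff,bmwi8,viper}'),
   (7, '{ferrariff,viper,viper,volt}');
```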

I don't think this is efficient, but something like this should work:

with expanded as (
  select veh_id, unnest (vehicle_types) as vehicle_type
  from vehicles
)
select veh_id, array_agg (distinct vehicle_type)
from expanded
group by veh_id

If you really want to do something ugly, you can write a custom function:

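create or replace function unique_array(input_array text[])
returns text[] as $$
DECLARE
  -- deduplicated result, built in input order
  output_array text[] := array[]::text[];
BEGIN
  -- STRICT (see below) returns NULL for a NULL input, so the loop
  -- bound cardinality(input_array) is never NULL here
  for i in 1..cardinality(input_array) loop
    -- append an element only the first time it is seen;
    -- NULL elements are skipped because the comparison yields NULL
    if not (input_array[i] = any (output_array)) then
      output_array := output_array || input_array[i];
    end if;
  end loop;

  return output_array;
END;
$$
language plpgsql immutable strict;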

Example usage:

select veh_id, unique_array(vehicle_types)
from vehicles

Since each row's array is independent, a plain correlated subquery with an ARRAY constructor does the job:

SELECT *, ARRAY(SELECT DISTINCT unnest (vehicle_types)) AS vehicle_types_uni
FROM   vehicle;

Note that NULL is converted to an empty array ('{}'). We would need to special-case it, but it is excluded in the UPDATE below anyway.

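Fast and simple. But don't use it. You didn't say so, but you typically want to preserve the original order of array elements, and your rudimentary sample suggests as much. Use WITH ORDINALITY in the correlated subquery, which makes the query a bit more sophisticated: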

SELECT *, ARRAY (SELECT v
                 FROM   unnest(vehicle_types) WITH ORDINALITY t(v,ord)
                 GROUP  BY 1
                 ORDER  BY min(ord)
                ) AS vehicle_types_uni
FROM   vehicle;
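
If a plain SELECT should keep NULL arrays as NULL instead of turning them into '{}', one way to special-case them is a CASE expression around the subquery. This is only a sketch on top of the query above; the table and column names are the ones used in this answer.

```sql
SELECT *,
       CASE WHEN vehicle_types IS NOT NULL THEN
          ARRAY (SELECT v
                 FROM   unnest(vehicle_types) WITH ORDINALITY t(v,ord)
                 GROUP  BY 1
                 ORDER  BY min(ord)
                )
       END AS vehicle_types_uni  -- no ELSE branch: NULL input stays NULL
FROM   vehicle;
```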

UPDATE to actually remove the duplicates:

UPDATE vehicle
SET    vehicle_types = ARRAY (
                 SELECT v
                 FROM   unnest(vehicle_types) WITH ORDINALITY t(v,ord)
                 GROUP  BY 1
                 ORDER  BY min(ord)
                )
WHERE  cardinality(vehicle_types) > 1  -- optional
AND    vehicle_types <> ARRAY (
                 SELECT v
                 FROM   unnest(vehicle_types) WITH ORDINALITY t(v,ord)
                 GROUP  BY 1
                 ORDER  BY min(ord)
                ); -- suppress empty updates (optional)
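
Both added WHERE conditions are optional and only there to improve performance. The first one is completely redundant given the second. Each condition also excludes the NULL case. The second one suppresses all empty updates.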
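
To check the result afterwards, a query along these lines lists rows that still contain duplicates. It is a sketch, not part of the original answer: it compares the number of elements with the number of distinct elements and should return no rows once the UPDATE has run.

```sql
SELECT veh_id, vehicle_types
FROM   vehicle
WHERE  cardinality(vehicle_types)
    <> (SELECT count(*)
        FROM  (SELECT DISTINCT v
               FROM   unnest(vehicle_types) AS t(v)) AS d);
-- NULL arrays are skipped: cardinality(NULL) is NULL, so the comparison is not true.
```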


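If you try to do this without preserving the original order, you will likely update most rows needlessly, simply because the order of elements changed even where there were no duplicates.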

Requires Postgres 9.4 or later.



Why suggest something so ugly for the function? A plain LANGUAGE sql function using unnest and DISTINCT on the input array would be more efficient.
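
A minimal sketch of what such a LANGUAGE sql function could look like. The function name array_distinct_ordered is hypothetical, and instead of a bare DISTINCT it reuses the WITH ORDINALITY query from the answer above so that the original element order is kept (which requires Postgres 9.4 or later).

```sql
-- Hypothetical helper; the name is an assumption, not from the thread.
CREATE OR REPLACE FUNCTION array_distinct_ordered(input_array text[])
  RETURNS text[]
  LANGUAGE sql IMMUTABLE STRICT AS  -- STRICT: NULL input returns NULL
$$
SELECT ARRAY (
   SELECT v
   FROM   unnest(input_array) WITH ORDINALITY t(v, ord)
   GROUP  BY v
   ORDER  BY min(ord)   -- keep the position of each first occurrence
);
$$;

-- Usage, with the table name used in the answer above:
SELECT veh_id, array_distinct_ordered(vehicle_types) AS vehicle_types_uni
FROM   vehicle;
```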