Why am I getting a "Hash Join" and full table scans (FTS) in this PostgreSQL query?

performance postgresql join hash indexing

I'm trying to optimize the following scenario:

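In words: I have 2 tables, alerts and user_devices. In user_devices we track whether a device coupled to a user_id wants to get notifications, and in the alerts table we track the user-to-notifier relationship. Basically, the task is to select every user_id that has any alerts and allows notifications to any device registered to it.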

Table "alerts", about 900k records:

               Table "public.alerts"
   Column    |           Type           | Modifiers 
-------------+--------------------------+-----------
 id          | uuid                     | not null
 user_id     | uuid                     | 
 target_id   | uuid                     | 
 target_type | text                     | 
 added_on    | timestamp with time zone | 
 old_id      | text                     | 
Indexes:
    "alerts_pkey" PRIMARY KEY, btree (id)
    "one_alert_per_business_per_user" UNIQUE CONSTRAINT, btree (user_id, target_id)
    "addedon" btree (added_on)
    "targetid" btree (target_id)
    "userid" btree (user_id)
    "userid_targetid" btree (user_id, target_id)
Foreign-key constraints:
    "alerts_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)

Table "user_devices", about 12k records:

                Table "public.user_devices"
       Column        |           Type           | Modifiers 
---------------------+--------------------------+-----------
 id                  | uuid                     | not null
 user_id             | uuid                     | 
 device_id           | text                     | 
 device_token        | text                     | 
 push_notify_enabled | boolean                  | 
 device_type         | integer                  | 
 device_name         | text                     | 
 badge_count         | integer                  | 
 added_on            | timestamp with time zone | 
Indexes:
    "user_devices_pkey" PRIMARY KEY, btree (id)
    "push_notification" btree (push_notify_enabled)
    "user_id" btree (user_id)
    "user_id_push_notification" btree (user_id, push_notify_enabled)
Foreign-key constraints:
    "user_devices_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)

The following query:

select COUNT(DISTINCT a.user_id) 
from alerts a 
  inner join user_devices ud on a.user_id = ud.user_id 
WHERE ud.push_notify_enabled = true;

takes about 3 seconds and produces the following plan:

explain select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id WHERE ud.push_notify_enabled = true;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Aggregate  (cost=49777.32..49777.33 rows=1 width=16)
   ->  Hash Join  (cost=34508.97..48239.63 rows=615074 width=16)
         Hash Cond: (ud.user_id = a.user_id)
         ->  Seq Scan on user_devices ud  (cost=0.00..480.75 rows=9202 width=16)
               Filter: push_notify_enabled
         ->  Hash  (cost=20572.32..20572.32 rows=801732 width=16)
               ->  Seq Scan on alerts a  (cost=0.00..20572.32 rows=801732 width=16)

What am I missing? Is there any way to speed this up?

Thanks, everyone.
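Read the plan above from the bottom up. The Hash node reads every row of alerts (Seq Scan on alerts a) and builds an in-memory hash table keyed on user_id. Each user_devices row that passes the push_notify_enabled filter then probes that table, and the Aggregate counts distinct user_id values over the joined rows. The toy Python model below mirrors that plan shape with made-up in-memory rows. The function name and dict layout are illustrative assumptions, not PostgreSQL internals. The point it shows is that a hash join reads both of its inputs in full, so with no restrictive condition on alerts there is nothing for an index to skip.

```python
from collections import defaultdict


def hash_join_count_distinct(alerts, user_devices):
    """Toy model of: Aggregate <- Hash Join <- (Seq Scan ud, Hash <- Seq Scan a)."""
    # Build phase (Hash node): every alerts row is read and bucketed by user_id.
    buckets = defaultdict(list)
    for alert in alerts:
        buckets[alert["user_id"]].append(alert)

    # Probe phase: every user_devices row is read; the push_notify_enabled
    # filter is applied during the scan, then matching alerts are looked up.
    joined_user_ids = []
    for device in user_devices:
        if not device["push_notify_enabled"]:
            continue
        for alert in buckets.get(device["user_id"], []):
            joined_user_ids.append(alert["user_id"])

    # Aggregate: COUNT(DISTINCT a.user_id) over all joined rows.
    return len(set(joined_user_ids))


alerts = [{"user_id": 1}, {"user_id": 1}, {"user_id": 2}, {"user_id": 3}]
devices = [
    {"user_id": 1, "push_notify_enabled": True},
    {"user_id": 1, "push_notify_enabled": True},
    {"user_id": 2, "push_notify_enabled": False},
    {"user_id": 4, "push_notify_enabled": True},
]
print(hash_join_count_distinct(alerts, devices))  # 1
```

User 1 yields four joined rows (two alerts times two enabled devices) but is counted only once. The same effect is why the plan estimates 615074 rows out of the Hash Join but only one out of the Aggregate.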

==EDIT==

As suggested, I tried moving the condition into the join; no difference:

=> explain select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id and ud.push_notify_enabled;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Aggregate  (cost=49777.32..49777.33 rows=1 width=16)
   ->  Hash Join  (cost=34508.97..48239.63 rows=615074 width=16)
         Hash Cond: (ud.user_id = a.user_id)
         ->  Seq Scan on user_devices ud  (cost=0.00..480.75 rows=9202 width=16)
               Filter: push_notify_enabled
         ->  Hash  (cost=20572.32..20572.32 rows=801732 width=16)
               ->  Seq Scan on alerts a  (cost=0.00..20572.32 rows=801732 width=16)

So there is no way to get rid of the 2 FTS? It would be great if I could at least get it to use an index on the "alerts" table somehow.

==EDIT==

Adding `EXPLAIN ANALYZE`:

=> explain ANALYZE select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id and ud.push_notify_enabled;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=49777.32..49777.33 rows=1 width=16) (actual time=5254.355..5254.356 rows=1 loops=1)
   ->  Hash Join  (cost=34508.97..48239.63 rows=615074 width=16) (actual time=1824.607..2863.635 rows=614768 loops=1)
         Hash Cond: (ud.user_id = a.user_id)
         ->  Seq Scan on user_devices ud  (cost=0.00..480.75 rows=9202 width=16) (actual time=0.048..16.784 rows=9186 loops=1)
               Filter: push_notify_enabled
         ->  Hash  (cost=20572.32..20572.32 rows=801732 width=16) (actual time=1824.229..1824.229 rows=801765 loops=1)
               Buckets: 4096  Batches: 32  Memory Usage: 990kB
               ->  Seq Scan on alerts a  (cost=0.00..20572.32 rows=801732 width=16) (actual time=0.047..878.429 rows=801765 loops=1)
 Total runtime: 5255.427 ms
(9 rows)

==EDIT==

Adding the requested config. It's mostly the Ubuntu PG 9.1 defaults:

/etc/postgresql/9.1/main# cat postgresql.conf | grep -e "work_mem" -e "effective_cache" -e "shared_buff" -e "random_page_c"
shared_buffers = 24MB           # min 128kB
#work_mem = 1MB             # min 64kB
#maintenance_work_mem = 16MB        # min 1MB
#wal_buffers = -1           # min 32kB, -1 sets based on shared_buffers
#random_page_cost = 4.0         # same scale as above
#effective_cache_size = 128MB
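These settings are consistent with the `Batches: 32  Memory Usage: 990kB` line in the EXPLAIN ANALYZE output. With work_mem left at its 1MB default, the hash table built from roughly 800k alerts rows cannot fit in memory. The hash join therefore partitions its inputs into 32 batches and keeps all but the current one in temporary files. The simplified Python sketch below shows only the partitioning idea behind batching: route rows by the hash of the join key, then join one batch at a time. The function name and the in-memory lists standing in for temp files are assumptions for illustration, not PostgreSQL's hybrid hash join code.

```python
from collections import defaultdict


def batched_hash_join(build_rows, probe_rows, key, nbatch):
    """Equi-join two lists of dicts on `key`, one hash partition at a time."""
    # Partition phase: send every row of both inputs to a batch chosen from the
    # hash of the join key (PostgreSQL spills non-current batches to temp files;
    # here plain lists stand in for those files).
    build_batches = defaultdict(list)
    probe_batches = defaultdict(list)
    for row in build_rows:
        build_batches[hash(row[key]) % nbatch].append(row)
    for row in probe_rows:
        probe_batches[hash(row[key]) % nbatch].append(row)

    # Join phase: only one batch's hash table is held in memory at a time.
    # Equal keys always hash to the same batch, so no match is lost.
    result = []
    for b in range(nbatch):
        table = defaultdict(list)
        for row in build_batches[b]:
            table[row[key]].append(row)
        for row in probe_batches[b]:
            for match in table.get(row[key], []):
                result.append((match, row))
    return result


alerts = [{"user_id": i % 7} for i in range(50)]
devices = [{"user_id": u} for u in range(10)]
# Same matches whether everything fits in one batch or is split into 32.
print(len(batched_hash_join(alerts, devices, "user_id", 1)))   # 50
print(len(batched_hash_join(alerts, devices, "user_id", 32)))  # 50
```

The result is identical for any batch count. What changes is how much of the build side must be held in memory at once, and that is the quantity work_mem limits.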

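As said in the comments, the real problem is the full scan of the alerts table. Logically, for a given user ID, any and all records in alerts may match that user ID.

You have one condition that could limit the scan: push_notify_enabled. Rows where it is false are not needed. But there is no index on this column, so a full scan of alerts is still the fastest way to join the two tables.

If your Postgres version supports it, try a bitmap index on push_notify_enabled. A btree index on a two-valued column is obviously not much use.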
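For context, PostgreSQL has no on-disk bitmap index type. What it does have is the Bitmap Index Scan plan node, which builds a bitmap of matching row locations on the fly from an ordinary index such as a btree. The idea behind a bitmap index on a low-cardinality column such as push_notify_enabled can be sketched in Python with one integer bitset per distinct value. The helper names, and the use of list positions as row ids, are hypothetical and for illustration only.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int bitset of the row positions holding it."""
    bitmaps = {}
    for row_id, value in enumerate(values):
        # Set bit `row_id` in the bitmap for this value.
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps


def rows_in(bitmap):
    """Decode a bitset back into a sorted list of row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


push_notify_enabled = [True, False, True, True, False]
index = build_bitmap_index(push_notify_enabled)
print(rows_in(index[True]))   # [0, 2, 3]
print(rows_in(index[False]))  # [1, 4]
```

Bitmaps for different conditions can be combined with `&` or `|`. Even so, a bitmap only saves work when the condition it encodes is selective. Here, according to the plans, most user_devices rows (about 9,200 of roughly 12,000) have push_notify_enabled set.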

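To speed up the query, you have to limit the number of rows scanned in alerts, that is, add a condition on some indexed column of alerts. If the index is selective enough, an index scan can replace the full scan.

For example, if it makes sense, you could filter by target_id or by some date-related column.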
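A sorted list can stand in for a btree index on a date column such as added_on to model the difference between a full scan and a selective index scan. bisect locates the start and end of the requested range, so only the matching entries are touched, whereas a sequential scan must examine every row. This is a hypothetical Python toy: integers stand in for timestamps and the function names are made up. It is not how PostgreSQL lays out btree pages.

```python
from bisect import bisect_left


def seq_scan(rows, lo, hi):
    """Examine every row; return (matches, rows_examined)."""
    matches = [r for r in rows if lo <= r["added_on"] < hi]
    return matches, len(rows)


def index_range_scan(index, rows, lo, hi):
    """Use a sorted (added_on, row_position) list as a stand-in for a btree."""
    start = bisect_left(index, (lo,))  # first entry with added_on >= lo
    end = bisect_left(index, (hi,))    # first entry with added_on >= hi
    matches = [rows[pos] for _, pos in index[start:end]]
    return matches, end - start


rows = [{"id": i, "added_on": (i * 37) % 1000} for i in range(1000)]
index = sorted((r["added_on"], pos) for pos, r in enumerate(rows))

full, examined_full = seq_scan(rows, 100, 110)
idx, examined_idx = index_range_scan(index, rows, 100, 110)
print(len(full), examined_full)  # 10 1000
print(len(idx), examined_idx)    # 10 10
```

The sketch leaves out what makes index scans expensive in practice: each match found through the index may need a random read of a different heap page.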

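If you have 900k alerts that are all active and can be shared arbitrarily between users, you have no choice. Adding RAM so that the alerts table always stays in cache may help. Adding hardware is often the simplest and cheapest solution.

AFAICT you are only interested in alerts associated with push notifications. If users with push notifications never share alerts with users without push notifications, you could effectively split alerts by this condition.

If you had bitmap indexes, you could move the push_notify_enabled column into alerts. Otherwise, you could try physically splitting the table on that column. If the number of alerts with push notifications is significantly lower than the total number of alerts, a much smaller part of alerts would have to be scanned for the join.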
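The answer's assumption is that users with push notifications never share alerts with users without them. Under that assumption, the split amounts to partitioning alerts once by whether the owning user has notifications enabled, after which the count only needs to read the enabled partition. The Python sketch below uses hypothetical in-memory lists in place of two physical tables.

```python
def split_alerts(alerts, push_enabled_user_ids):
    """Partition alerts into (enabled, disabled) by the owning user's setting."""
    enabled, disabled = [], []
    for alert in alerts:
        if alert["user_id"] in push_enabled_user_ids:
            enabled.append(alert)
        else:
            disabled.append(alert)
    return enabled, disabled


def count_users_with_alerts(enabled_partition):
    """COUNT(DISTINCT user_id), reading only the enabled partition."""
    return len({alert["user_id"] for alert in enabled_partition})


alerts = [{"user_id": u} for u in (1, 1, 2, 3, 3, 3, 4)]
enabled, disabled = split_alerts(alerts, push_enabled_user_ids={1, 3})
print(len(enabled), len(disabled))       # 5 2
print(count_users_with_alerts(enabled))  # 2
```

The saving grows with the size of the disabled partition. The split also has to be kept up to date whenever a user's push setting changes.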

Replace the index with a partial index:

DROP INDEX user_id_push_notification;
CREATE INDEX user_id_push_notification ON user_devices (user_id)
 WHERE push_notify_enabled = true;
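A partial index stores entries only for rows that satisfy its WHERE predicate. This one therefore holds only the user_devices rows with push_notify_enabled = true, and the planner can consider it for queries whose WHERE clause implies that predicate. A Python dictionary makes the difference in contents concrete. The dict-of-lists layout and the sample rows are illustrative assumptions.

```python
from collections import defaultdict


def build_index(rows, key, predicate=None):
    """Map key -> row positions; with a predicate, behave like a partial index."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        # A partial index simply skips rows that fail its WHERE predicate.
        if predicate is None or predicate(row):
            index[row[key]].append(pos)
    return index


user_devices = [
    {"user_id": 1, "push_notify_enabled": True},
    {"user_id": 1, "push_notify_enabled": False},
    {"user_id": 2, "push_notify_enabled": False},
    {"user_id": 3, "push_notify_enabled": True},
]
full = build_index(user_devices, "user_id")
partial = build_index(user_devices, "user_id", lambda r: r["push_notify_enabled"])
print(sum(map(len, full.values())), sum(map(len, partial.values())))  # 4 2
print(sorted(partial))  # [1, 3]
```

According to the plans, push_notify_enabled is true for most rows here (about 9,200 of roughly 12,000), so the partial index is only modestly smaller than a full one.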

...and set random_page_cost to a lower value:

SET random_page_cost = 1.1;

This gives me an index scan using push_notification on user_devices ud, in under 300 ms. YMMV.

The seqscan on alerts seems more or less unavoidable, since you expect 800K/900K, roughly 89% of the rows. IMHO an index scan would only pay off if the rows were very wide.
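PostgreSQL estimates the cost of a plain sequential scan as relpages × seq_page_cost + reltuples × cpu_tuple_cost, plus cpu_operator_cost per row for each filter clause. With the defaults seq_page_cost = 1.0 and cpu_tuple_cost = 0.01, the plan's `Seq Scan on alerts a (cost=0.00..20572.32 rows=801732 ...)` works out to exactly 12555 pages. The Python below reproduces that number. It then compares the result with a deliberately crude worst case for an index scan: one random page read at random_page_cost per fetched row. That model ignores caching, index pages and physical correlation, all of which the real planner takes into account. The function names are made up; the point is only why a scan returning most of a table loses to a seq scan.

```python
SEQ_PAGE_COST = 1.0    # PostgreSQL default
CPU_TUPLE_COST = 0.01  # PostgreSQL default


def seq_scan_cost(pages, tuples):
    """Planner cost of an unfiltered sequential scan."""
    return pages * SEQ_PAGE_COST + tuples * CPU_TUPLE_COST


def pages_from_plan(total_cost, tuples):
    """Invert seq_scan_cost to recover the page count from an EXPLAIN line."""
    return (total_cost - tuples * CPU_TUPLE_COST) / SEQ_PAGE_COST


def crude_index_scan_cost(fetched_rows, random_page_cost):
    """Worst case: one uncached random heap page per fetched row (not the real model)."""
    return fetched_rows * random_page_cost


def break_even_fraction(pages, tuples, random_page_cost):
    """Fraction of the table above which the crude index scan costs more."""
    return seq_scan_cost(pages, tuples) / crude_index_scan_cost(tuples, random_page_cost)


tuples = 801732                                   # rows=801732 on Seq Scan on alerts a
pages = round(pages_from_plan(20572.32, tuples))  # cost=0.00..20572.32
print(pages)                                              # 12555
print(f"{break_even_fraction(pages, tuples, 4.0):.2%}")  # 0.64%
print(f"{break_even_fraction(pages, tuples, 1.1):.2%}")  # 2.33%
```

Even with random_page_cost lowered to 1.1, the break-even point in this crude model is a few percent of the table. That is far below the large majority of alerts rows this query needs.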

UPDATE: adding the users table to the query seems to force a triple index scan, but it takes about the same time:

explain  ANALYZE
select COUNT(DISTINCT a.user_id)
from alerts a
join user_devices ud on a.user_id = ud.user_id
join users us on a.user_id = us.id
WHERE ud.push_notify_enabled = true;

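PostgreSQL needs to visit almost every row in the alerts table, so a seq scan is going to be the fastest thing. If you were on 9.2, it might actually do an index-only scan on the userid index.

Does it make any difference if you move the where condition into the join condition, as in inner join user_devices ud on a.user_id = ud.user_id and ud.push_notify_enabled? The = true part is not needed.

Btw, will try moving it and see if it helps. AFK now, will check in a few hours. Thanks for the comments.

@Clodoaldo: tried it, didn't help, see edit.

Please add the output of explain analyze, which will show the expected and observed row counts. Maybe your statistics are off, or missing.

Unfortunately, there seem to be no bitmap indexes available here.

Another question, possibly total BS, but just thinking out loud: what if I moved push_notify_enabled to the alerts table?

My bad: for some reason I mistook push_notify_enabled for a column in alerts. I'll edit my reply.

Try upgrading to 9.2 to give it a chance.

Gah, I need to learn more about Postgres, this is so frustrating.

@favoretti: Experience is what you get when you don't get what you wanted. And: good judgment comes from experience; experience comes from bad judgment.

9.1 or 9.2? It makes no difference to me, but I'm upgrading to 9.2.2.

I need to upgrade too. Am I doing something wrong? The query you proposed forces a triple seq scan for me.

Strange. I did change the uuids to integers and serials, though, for easier inserts with generate_series; it doesn't seem to support UUIDs.

I'd happily shoot the guy who used UUIDs in this database, but he's about 6000 miles too far away.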