How is the "rows" value in EXPLAIN ANALYZE (PostgreSQL) estimated for a Function Scan?

I examined part of the execution plan of a complex query and found the following:

postgres=# explain analyze                                                                                                                                                
select * from generate_series(
            (CURRENT_DATE)::timestamp without time zone,
            (CURRENT_DATE + '14 days'::interval),
            '1 day'::interval)
;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.01..10.01 rows=1000 width=8) (actual time=0.024..0.036 rows=15 loops=1)
 Planning Time: 0.031 ms
 Execution Time: 0.064 ms
(3 rows)

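It is understandable that PostgreSQL estimates rows for a table from that table's reltuples value.

But given that the generate_series call above actually produces 14 rows, where does rows=1000 come from in the case of a Function Scan?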

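According to the documentation:

For those interested in further details, estimation of the size of a table (before any WHERE clauses) is done in src/backend/optimizer/util/plancat.c. The generic logic for clause selectivities is in src/backend/optimizer/path/clausesel.c. The operator-specific selectivity functions are mostly found in src/backend/utils/adt/selfuncs.c.

This is the function that computes the estimate for functions: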

/*
 * function_selectivity
 *
 * Returns the selectivity of a specified boolean function clause.
 * This code executes registered procedures stored in the
 * pg_proc relation, by calling the function manager.
 *
 * See clause_selectivity() for the meaning of the additional parameters.
 */
Selectivity
function_selectivity(PlannerInfo *root,
                     Oid funcid,
                     List *args,
                     Oid inputcollid,
                     bool is_join,
                     int varRelid,
                     JoinType jointype,
                     SpecialJoinInfo *sjinfo)
{

It looks like this C function reads data from the pg_proc system catalog, where we have:

postgres=# select proname, prosupport, prorows 
           from pg_proc 
           where proname like '%generate%';
           proname            |          prosupport          | prorows 
------------------------------+------------------------------+---------
 generate_subscripts          | -                            |    1000
 generate_subscripts          | -                            |    1000
 generate_series              | generate_series_int4_support |    1000
 generate_series              | generate_series_int4_support |    1000
 generate_series_int4_support | -                            |       0
 generate_series              | generate_series_int8_support |    1000
 generate_series              | generate_series_int8_support |    1000
 generate_series_int8_support | -                            |       0
 generate_series              | -                            |    1000
 generate_series              | -                            |    1000
 generate_series              | -                            |    1000
 generate_series              | -                            |    1000
(12 rows)

It looks like the pg_proc.prorows column is where the estimate is retrieved from.
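Because generate_series is overloaded, the proname column alone does not show which row belongs to which argument types. A minimal sketch that prints each overload with its full signature, assuming a v12 or later server (where pg_proc.prosupport exists); the column alias is only illustrative:

```sql
-- Casting the function OID to regprocedure renders the name together
-- with its argument types, which tells the overloads apart.
SELECT p.oid::regprocedure AS signature,
       p.prosupport,
       p.prorows
FROM pg_proc AS p
WHERE p.proname = 'generate_series'
ORDER BY p.oid;
```

Rows whose prosupport is "-" have no planner support function, so only prorows is available for their estimate.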

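The generate_series functions that take integer arguments (int4 and int8) have a support function, which looks at the arguments and then tells the planner how many rows to expect. But those that handle timestamps have no such support function. Instead, the planner simply estimates that they return prorows rows, which defaults to 1000.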
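The difference can be seen by comparing the two plans side by side. This is only a sketch, assuming a server version like the one in the question (v12 or later, with no support function for the timestamp variant); the exact costs will vary:

```sql
-- Integer form: the support function inspects the constant arguments,
-- so the row estimate should follow them instead of prorows.
EXPLAIN SELECT * FROM generate_series(1, 15);

-- Timestamp form from the question: with no support function,
-- the planner falls back to pg_proc.prorows for the estimate.
EXPLAIN SELECT * FROM generate_series(
    CURRENT_DATE::timestamp without time zone,
    CURRENT_DATE + '14 days'::interval,
    '1 day'::interval);
```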

If you want, you can change this estimate:

alter function generate_series(timestamp without time zone, timestamp without time zone, interval) 
    rows 14;

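But this change will not survive pg_upgrade or a dump/reload.

This is version-specific, as support functions were only implemented in v12. Before that, even the forms taking integers were always planned as 1000 rows, or whatever the function's prorows was set to.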
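To try the effect of a changed estimate without committing to it, the ALTER FUNCTION can be wrapped in a transaction and rolled back. This is a sketch, assuming you are allowed to alter the built-in function (which normally requires superuser rights); the value 15 is just the actual row count from the plan in the question:

```sql
BEGIN;

-- Override the planner's row estimate for the timestamp variant only.
ALTER FUNCTION generate_series(timestamp without time zone,
                               timestamp without time zone,
                               interval)
    ROWS 15;

-- Inspect the estimate the planner now uses inside this transaction.
EXPLAIN SELECT * FROM generate_series(
    CURRENT_DATE::timestamp without time zone,
    CURRENT_DATE + '14 days'::interval,
    '1 day'::interval);

-- Undo the catalog change; use COMMIT instead to keep it.
ROLLBACK;
```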

As an obvious workaround, you can trick the planner by using a LIMIT in a subquery (see the query at the end of this page).

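What's the point? This is a very clear example of premature optimization. In this case the result comes back before your brain registers that you pressed run.

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.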

Donald Knuth, "Computer Programming as an Art", 1974

The problem seems to be at least as big now as it was then.

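You should add that the value 1000 comes from the default of the ROWS parameter of the CREATE FUNCTION command.

Yes, that's possible, but there is also some initial data in pg_proc.dat that is used during bootstrap, e.g. { oid => '1066', descr => 'non-persistent series generator', proname => 'generate_series', prorows => '1000', prosupport => 'generate_series_int4_support', proretset => 't', prorettype => 'int4', prosrc => 'generate_series_step_int4' }, and there are some support functions, which I think were created for this purpose. They appear to have been introduced in Postgres 12. I have no experience with them.

Note: it generates 15 rows (fencepost error).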

-- Workaround: the LIMIT caps the planner's row estimate for the subquery.
select * FROM (
        select * from generate_series(
            (CURRENT_DATE)::timestamp without time zone,
            (CURRENT_DATE + '14 days'::interval),
            '1 day'::interval)
        LIMIT 15
        ) xxx
    ;