Python: Reducing the execution time of inserting records in PostgreSQL with multiple inner joins

I have a use case where I must insert records into the DATAVANT_COVID_MATCH table DAILY while joining 3 other tables.

SELECT version() = ('PostgreSQL 12.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit',)
So my question is: how can I reduce the execution time? Any comments or suggestions are welcome. Thank you.

Below is the code that currently runs every day.

INDEX:
INDEX created for every partition of MORTALITY_INDEX table

example:
CREATE INDEX mortality_index_1941_dod_idx
    ON datavant_stg_o.mortality_index_1940 USING btree
    (dod ASC NULLS LAST)
    TABLESPACE pg_default;
CREATE INDEX mortality_index_1941_1945_dod_idx
    ON datavant_stg_o.mortality_index_1941_1945 USING btree
    (dod ASC NULLS LAST)
    TABLESPACE pg_default;
etc...
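
Since the same index is repeated for every partition, the statements could be generated rather than written by hand. The following is only a sketch: the helper names partition_suffixes and dod_index_ddl are hypothetical, the partition ranges (1940, then five-year ranges from 1941_1945 to 2026_2030) are assumed from the partition names that appear in the EXPLAIN output in this post, and each index is named after its own partition.

```python
def partition_suffixes(first_year=1941, last_year=2030, step=5):
    """Return the assumed partition suffixes: "1940", "1941_1945", ..., "2026_2030"."""
    suffixes = ["1940"]  # the first partition covers a single year
    for start in range(first_year, last_year, step):
        suffixes.append(f"{start}_{start + step - 1}")
    return suffixes


def dod_index_ddl(suffix, schema="datavant_stg_o"):
    """Build the CREATE INDEX statement on dod for one partition."""
    return (
        f"CREATE INDEX mortality_index_{suffix}_dod_idx\n"
        f"    ON {schema}.mortality_index_{suffix} USING btree\n"
        f"    (dod ASC NULLS LAST)\n"
        f"    TABLESPACE pg_default;"
    )


if __name__ == "__main__":
    # Print the DDL for every partition; nothing is executed against a database.
    for suffix in partition_suffixes():
        print(dod_index_ddl(suffix))
```

The script only prints the statements, so they can be reviewed before being run.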
EDIT: As requested, EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) for the SELECT


                    INSERT INTO DATAVANT_O.DATAVANT_COVID_MATCH_{}
                    SELECT
                    CUST_LAST_NM,
                    CUST_FRST_NM,
                    CIGNA_DOB,
                    CIGNA_ZIP,
                    DATAVANT_DOD,
                    DATAVANT_DOB,
                    DEATH_VERIFICATION,
                    DATA_SOURCE,
                    INDIV_ENTPR_ID
                    FROM
                    (
                        SELECT
                        CR.PATNT_LAST_NM AS CUST_LAST_NM,
                        CR.PATNT_FRST_NM AS CUST_FRST_NM,
                        CRD.CUST_BRTH_DT AS CIGNA_DOB,
                        CR.PATNT_POSTL_CD AS CIGNA_ZIP,
                        MI.DOD AS DATAVANT_DOD,
                        MI.DOB AS DATAVANT_DOB,
                        MI.DEATH_VERIFICATION,
                        MI.DATA_SOURCE,
                        CRD.INDIV_ENTPR_ID,
                        ROW_NUMBER () OVER (PARTITION BY CRD.INDIV_ENTPR_ID ORDER BY CRD.INDIV_ENTPR_ID DESC)
                        FROM DATAVANT_O.COVID_PATNT_REGISTRY_DEID CRD
                        INNER JOIN DATAVANT_STG_O.MORTALITY_INDEX_{} MI ON
                        CRD.TOKEN_1 = MI.TOKEN_1 AND
                        CRD.TOKEN_2 = MI.TOKEN_2 AND
                        CRD.TOKEN_4

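If this is a once-a-day thing, why do you care how long it takes?

Please consider reading: (especially the part about being minimal)

I can see one micro-optimization: you don't need to reconnect to the database every time. Do it outside the loop rather than inside it.

Hey @TimRoberts, good question. What if the daily run takes more than a day? Do you see my problem now?

@TimRoberts The reason I reconnect every time is that if one of the inserts breaks, I would lose all the matched data for every interval, not just for the insert that broke.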
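
The two concerns in the comments above (connect once, but do not lose completed intervals when one insert fails) are not in conflict. Here is a minimal sketch, assuming a DB-API connection such as the one returned by a single psycopg2.connect(...) call made before the loop. The function name run_intervals and the {suffix} placeholder are hypothetical; the query in the question uses bare {} placeholders.

```python
import logging

log = logging.getLogger(__name__)


def run_intervals(conn, sql_template, intervals):
    """Run one INSERT per interval on a single, already open connection.

    Each interval is committed on its own, so a failure rolls back only
    that interval and the loop carries on with the next one.
    Returns the list of intervals that failed.
    """
    failed = []
    for interval in intervals:
        cur = conn.cursor()
        try:
            # Assumption: suffixes come from a trusted, hard-coded list,
            # so plain string formatting of the table name is acceptable.
            cur.execute(sql_template.format(suffix=interval))
            conn.commit()  # keep this interval's rows even if a later one fails
        except Exception:
            conn.rollback()  # discard only the interval that failed
            log.exception("insert failed for interval %s", interval)
            failed.append(interval)
        finally:
            cur.close()
    return failed
```

Committing after each interval gives the same isolation between inserts as reconnecting does, without paying the connection setup cost on every iteration. The returned list can be used to retry only the failed intervals.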

"Subquery Scan on x  (cost=28188731.46..28188731.66 rows=1 width=49) (actual time=38.348..38.348 rows=0 loops=1)"
"  Output: x.cust_last_nm, x.cust_frst_nm, x.cigna_dob, x.cigna_zip, x.datavant_dod, x.datavant_dob, x.death_verification, x.data_source, x.indiv_entpr_id"
"  Filter: (x.row_number = 1)"
"  Buffers: shared hit=141 read=3"
"  ->  WindowAgg  (cost=28188731.46..28188731.58 rows=6 width=57) (actual time=38.346..38.346 rows=0 loops=1)"
"        Output: cr.patnt_last_nm, cr.patnt_frst_nm, crd.cust_brth_dt, cr.patnt_postl_cd, mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source, crd.indiv_entpr_id, row_number() OVER (?)"
"        Buffers: shared hit=141 read=3"
"        ->  Sort  (cost=28188731.46..28188731.48 rows=6 width=49) (actual time=38.338..38.338 rows=0 loops=1)"
"              Output: crd.indiv_entpr_id, cr.patnt_last_nm, cr.patnt_frst_nm, crd.cust_brth_dt, cr.patnt_postl_cd, mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source"
"              Sort Key: crd.indiv_entpr_id DESC"
"              Sort Method: quicksort  Memory: 25kB"
"              Buffers: shared hit=141 read=3"
"              ->  Nested Loop  (cost=1018.80..28188731.39 rows=6 width=49) (actual time=38.291..38.291 rows=0 loops=1)"
"                    Output: crd.indiv_entpr_id, cr.patnt_last_nm, cr.patnt_frst_nm, crd.cust_brth_dt, cr.patnt_postl_cd, mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source"
"                    Join Filter: ((crd.indiv_entpr_id)::text = (cr.indiv_entpr_id)::text)"
"                    Buffers: shared hit=138 read=3"
"                    ->  Gather  (cost=1018.80..27906096.67 rows=1 width=29) (actual time=38.290..39.672 rows=0 loops=1)"
"                          Output: crd.cust_brth_dt, crd.indiv_entpr_id, mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source"
"                          Workers Planned: 2"
"                          Workers Launched: 2"
"                          Buffers: shared hit=138 read=3"
"                          ->  Hash Join  (cost=18.80..27905096.57 rows=1 width=29) (actual time=9.141..9.143 rows=0 loops=3)"
"                                Output: crd.cust_brth_dt, crd.indiv_entpr_id, mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source"
"                                Hash Cond: (((mi_13.token_1)::text = (crd.token_1)::text) AND ((mi_13.token_2)::text = (crd.token_2)::text) AND ((mi_13.token_4)::text = (crd.token_4)::text))"
"                                Buffers: shared hit=138 read=3"
"                                Worker 0: actual time=9.014..9.017 rows=0 loops=1"
"                                  Buffers: shared hit=69 read=1"
"                                Worker 1: actual time=11.521..11.523 rows=0 loops=1"
"                                  Buffers: shared hit=69 read=1"
"                                ->  Parallel Append  (cost=0.00..11089723.14 rows=91512134 width=148) (actual time=8.920..8.920 rows=1 loops=3)"
"                                      Buffers: shared read=3"
"                                      Worker 0: actual time=8.689..8.689 rows=1 loops=1"
"                                        Buffers: shared read=1"
"                                      Worker 1: actual time=11.242..11.242 rows=1 loops=1"
"                                        Buffers: shared read=1"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_2001_2005 mi_13  (cost=0.00..1071803.02 rows=9009302 width=148) (actual time=11.240..11.240 rows=1 loops=1)"
"                                            Output: mi_13.dod, mi_13.dob, mi_13.death_verification, mi_13.data_source, mi_13.token_1, mi_13.token_2, mi_13.token_4"
"                                            Buffers: shared read=1"
"                                            Worker 1: actual time=11.240..11.240 rows=1 loops=1"
"                                              Buffers: shared read=1"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1996_2000 mi_12  (cost=0.00..1025583.10 rows=8684310 width=146) (actual time=8.687..8.687 rows=1 loops=1)"
"                                            Output: mi_12.dod, mi_12.dob, mi_12.death_verification, mi_12.data_source, mi_12.token_1, mi_12.token_2, mi_12.token_4"
"                                            Buffers: shared read=1"
"                                            Worker 0: actual time=8.687..8.687 rows=1 loops=1"
"                                              Buffers: shared read=1"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1991_1995 mi_11  (cost=0.00..946087.22 rows=8131622 width=145) (never executed)"
"                                            Output: mi_11.dod, mi_11.dob, mi_11.death_verification, mi_11.data_source, mi_11.token_1, mi_11.token_2, mi_11.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_2016_2020 mi_16  (cost=0.00..941021.10 rows=8897010 width=150) (never executed)"
"                                            Output: mi_16.dod, mi_16.dob, mi_16.death_verification, mi_16.data_source, mi_16.token_1, mi_16.token_2, mi_16.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_2011_2015 mi_15  (cost=0.00..912659.89 rows=8274389 width=149) (never executed)"
"                                            Output: mi_15.dod, mi_15.dob, mi_15.death_verification, mi_15.data_source, mi_15.token_1, mi_15.token_2, mi_15.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1986_1990 mi_10  (cost=0.00..868442.25 rows=7467025 width=145) (never executed)"
"                                            Output: mi_10.dod, mi_10.dob, mi_10.death_verification, mi_10.data_source, mi_10.token_1, mi_10.token_2, mi_10.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1981_1985 mi_9  (cost=0.00..855944.89 rows=7200489 width=149) (never executed)"
"                                            Output: mi_9.dod, mi_9.dob, mi_9.death_verification, mi_9.data_source, mi_9.token_1, mi_9.token_2, mi_9.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1976_1980 mi_8  (cost=0.00..834044.82 rows=7014082 width=149) (never executed)"
"                                            Output: mi_8.dod, mi_8.dob, mi_8.death_verification, mi_8.data_source, mi_8.token_1, mi_8.token_2, mi_8.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_2006_2010 mi_14  (cost=0.00..826553.35 rows=7065335 width=149) (never executed)"
"                                            Output: mi_14.dod, mi_14.dob, mi_14.death_verification, mi_14.data_source, mi_14.token_1, mi_14.token_2, mi_14.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1971_1975 mi_7  (cost=0.00..772189.73 rows=6492773 width=148) (never executed)"
"                                            Output: mi_7.dod, mi_7.dob, mi_7.death_verification, mi_7.data_source, mi_7.token_1, mi_7.token_2, mi_7.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1966_1970 mi_6  (cost=0.00..645641.55 rows=5434455 width=148) (never executed)"
"                                            Output: mi_6.dod, mi_6.dob, mi_6.death_verification, mi_6.data_source, mi_6.token_1, mi_6.token_2, mi_6.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1961_1965 mi_5  (cost=0.00..372921.92 rows=3139692 width=148) (never executed)"
"                                            Output: mi_5.dod, mi_5.dob, mi_5.death_verification, mi_5.data_source, mi_5.token_1, mi_5.token_2, mi_5.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1956_1960 mi_4  (cost=0.00..204172.40 rows=1717040 width=149) (never executed)"
"                                            Output: mi_4.dod, mi_4.dob, mi_4.death_verification, mi_4.data_source, mi_4.token_1, mi_4.token_2, mi_4.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1951_1955 mi_3  (cost=0.00..152593.93 rows=1282493 width=149) (never executed)"
"                                            Output: mi_3.dod, mi_3.dob, mi_3.death_verification, mi_3.data_source, mi_3.token_1, mi_3.token_2, mi_3.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1946_1950 mi_2  (cost=0.00..99094.05 rows=832705 width=149) (never executed)"
"                                            Output: mi_2.dod, mi_2.dob, mi_2.death_verification, mi_2.data_source, mi_2.token_1, mi_2.token_2, mi_2.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1941_1945 mi_1  (cost=0.00..80018.56 rows=672956 width=149) (never executed)"
"                                            Output: mi_1.dod, mi_1.dob, mi_1.death_verification, mi_1.data_source, mi_1.token_1, mi_1.token_2, mi_1.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_1940 mi  (cost=0.00..23379.22 rows=196422 width=149) (never executed)"
"                                            Output: mi.dod, mi.dob, mi.death_verification, mi.data_source, mi.token_1, mi.token_2, mi.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_2021_2025 mi_17  (cost=0.00..10.47 rows=47 width=398) (never executed)"
"                                            Output: mi_17.dod, mi_17.dob, mi_17.death_verification, mi_17.data_source, mi_17.token_1, mi_17.token_2, mi_17.token_4"
"                                      ->  Parallel Seq Scan on datavant_stg_o.mortality_index_2026_2030 mi_18  (cost=0.00..1.01 rows=1 width=398) (actual time=6.825..6.825 rows=1 loops=1)"
"                                            Output: mi_18.dod, mi_18.dob, mi_18.death_verification, mi_18.data_source, mi_18.token_1, mi_18.token_2, mi_18.token_4"
"                                            Buffers: shared read=1"
"                                ->  Hash  (cost=13.20..13.20 rows=320 width=145) (actual time=0.010..0.011 rows=0 loops=3)"
"                                      Output: crd.cust_brth_dt, crd.indiv_entpr_id, crd.token_1, crd.token_2, crd.token_4"
"                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB"
"                                      Worker 0: actual time=0.015..0.015 rows=0 loops=1"
"                                      Worker 1: actual time=0.013..0.013 rows=0 loops=1"
"                                      ->  Seq Scan on datavant_o.covid_patnt_registry_deid crd  (cost=0.00..13.20 rows=320 width=145) (actual time=0.010..0.010 rows=0 loops=3)"
"                                            Output: crd.cust_brth_dt, crd.indiv_entpr_id, crd.token_1, crd.token_2, crd.token_4"
"                                            Worker 0: actual time=0.015..0.015 rows=0 loops=1"
"                                            Worker 1: actual time=0.012..0.012 rows=0 loops=1"
"                    ->  Seq Scan on datavant_o.covid_patnt_registry cr  (cost=0.00..213480.43 rows=5532343 width=29) (never executed)"
"                          Output: cr.covid_patnt_regstry_sv_key, cr.cret_ts, cr.indiv_entpr_id, cr.patnt_frst_nm, cr.patnt_last_nm, cr.patnt_brth_dt, cr.patnt_gendr_cd, cr.patnt_st_cd, cr.patnt_postl_cd, cr.patnt_dth_dt, cr.covid_idfd_frm_clm_ind, cr.covid_idfd_frm_lab_ind, cr.frst_diag_dt, cr.hosp_ind, cr.frst_covid_admsn_clm_event_key, cr.frst_covid_admsn_refined_clm_event_key, cr.frst_covid_icu_admsn_refined_clm_event_key, cr.frst_fllwup_clm_ln_key, cr.frst_fllwup_clm_svc_beg_dt, cr.subscrbr_indiv_entpr_id, cr.pre_covid_clncl_case_key, cr.post_covid_clncl_case_key, cr.prim_covid_diag_cd, cr.prim_covid_diag_dt, cr.sec_covid_diag_cd, cr.sec_covid_diag_dt, cr.frst_vacnn_dt, cr.sec_vacnn_dt, cr.vacn_manfctrer_nm, cr.load_ctl_key, cr.ingest_timestamp, cr.incr_ingest_timestamp"
"Planning Time: 4.229 ms"
"Execution Time: 40.007 ms"