《專場13.1-PostgreSQL之SQL優化小技巧-傅強.pdf》由會員分享,可在線閱讀,更多相關《專場13.1-PostgreSQL之SQL優化小技巧-傅強.pdf(30頁珍藏版)》請在三個皮匠報告上搜索。
1、PostgreSQL之SQL優化小技巧 傅強 美創科技DBA表的掃描方式01表的連接方式02SQL改寫03數據庫配置04架構設計05CONTENTS目 錄相關工具06表的掃描方式掃描方式簡稱掃描方式簡稱掃描方式說明掃描方式說明Seq Scan順序掃描整個對象Parallel Seq Scan采用并行方式順序掃描整個對象Index Scan采用離散讀的方式,利用索引訪問某個對象Index Only Scan僅通過索引,不訪問表快速訪問某個對象Bitmap Index Scan通過多個索引掃描后形成位圖找到符合條件的數據Bitmap Heap Scan往往跟隨bitmap index scan,使
2、用該掃描生成的位圖訪問對象在等值或范圍查詢、排序及分組查詢時使用索引字段,以盡量避免表掃描postgres=#explain analyze select*from tbl_index where a=557858;QUERY PLAN -Seq Scan on tbl_index (cost=0.00.18870.00 rows=2 width=21)(actual time=0.036.671.703 rows=1 loops=1)Filter:(a=557858)Rows Removed by Filter:999999 Planning Time:0.344 ms Execution
3、Time:671.862 ms(5 rows)postgres=#explain analyze select*from tbl_index where a=557858;QUERY PLAN -Index Scan using tbl_index_a on tbl_index (cost=0.42.12.46 rows=2 width=21)(actual time=0.145.0.147 rows=1 loops=1)Index Cond:(a=557858)Planning Time:0.367 ms Execution Time:0.235 ms(4 rows)減少使用導致索引掃描失效
4、的SQL語句書寫方式(函數,表達式等)postgres=#explain analyze select*from tbl_index where a:varchar=557858;QUERY PLAN -Seq Scan on tbl_index (cost=0.00.23870.00 rows=5000 width=21)(actual time=0.056.769.264 rows=1 loops=1)Filter:(a):character varying):text=557858:text)Rows Removed by Filter:999999 Planning Time:0.65
5、8 ms Execution Time:769.335 ms(5 rows)表的掃描方式索引類型不匹配表的掃描方式collate不一致數據類型不一致表中數據量少時索引字段在表中占比較高postgres=#explain analyze select*from test where id 10000;QUERY PLAN -Seq Scan on test (cost=0.00.169248.60 rows=10539 width=8)(actual time=0.053.2305.860 rows=9999 loops=1)Filter:(id 10000)Rows Removed by Fi
6、lter:9990001 Planning Time:1.076 ms Execution Time:2307.000 ms(5 rows)表的掃描方式postgres=#explain analyze select*from test where id Parallel Seq Scan on test (cost=0.00.75498.15 rows=2635 width=8)(actual time=627.179.787.198 rows=2000 loops=5)Filter:(id 10000)Rows Removed by Filter:1998000 Planning Time
7、:0.193 ms Execution Time:845.272 ms(8 rows)set max_parallel_workers_per_gather=8;表的掃描方式索引類型(按結構劃分)索引類型(按結構劃分)B-treeB-tree索引:索引:適合所有的數據類型,支持排序,支持=BETWEEN、IN、IS NULL、is not null等條件。HashHash索引:索引:只能處理等值查詢,不寫入xlog,數據庫崩潰后丟失,不建議使用。GiSTGiST索引:索引:常用于幾何、范圍、空間等類型(包含、相交、距離排序等)。GINGIN索引:索引:常用于多值類型,例如數組、全文檢索、分詞、
8、模糊查詢。BRINBRIN索引:索引:針對數據塊級別的索引,索引占用空間非常小,使用與btree索引類似,適用于序列、ctime等于數據物理存儲相關性很強的字段的范圍查詢。SP-GiSTSP-GiST索引:索引:適合使用KD樹,四叉樹,基數樹算法的場景。索引類型(按用途劃分)索引類型(按用途劃分)部分索引:部分索引:只需要查詢表中部分數據的需求,建部分索引能夠縮小索引體積,性能更優。查詢必須添加部分索引中的篩選條件。函數索引:函數索引:基于某個函數或者表達式的值創建的索引。多列索引:多列索引:針對多個字段創建索引,原則:選擇度高的字段放到前面性能會更好。唯一索引:唯一索引:字段不允許相同值,主
9、鍵會自動創建一個唯一索引,空值不等于空值。覆蓋索引:覆蓋索引:只需要訪問索引的數據就能獲得需要的結果,而不需要再次訪問表中的數據。表的掃描方式計算postgres=#explain analyze select*from tbl_index where a+1=100;QUERY PLAN -Seq Scan on tbl_index (cost=0.00.21370.00 rows=5000 width=21)(actual time=293.704.293.725 rows=0 loops=1)Filter:(a+1)=100)Rows Removed by Filter:1000000
10、Planning Time:0.282 ms Execution Time:293.856 ms(5 rows)postgres=#create index tbl_index_a_1 on tbl_index using btree(a+1);postgres=#explain analyze select*from tbl_index where a+1=100;QUERY PLAN -Bitmap Heap Scan on tbl_index (cost=91.17.6441.55 rows=5000 width=21)(actual time=0.130.0.131 rows=0 lo
11、ops=1)Recheck Cond:(a+1)=100)-Bitmap Index Scan on tbl_index_a_1 (cost=0.00.89.92 rows=5000 width=0)(actual time=0.119.0.119 rows=0 loops=1)Index Cond:(a+1)=100)Planning Time:0.586 ms Execution Time:0.219 ms(6 rows)添加索引表的掃描方式類型轉換postgres=#create index tbl_index_a on tbl_index using btree(a:varchar);
12、postgres=#explain analyze select*from tbl_index where a:varchar=557858;QUERY PLAN -Bitmap Heap Scan on tbl_index (cost=91.17.6454.05 rows=5000 width=21)(actual time=0.107.0.109 rows=1 loops=1)Recheck Cond:(a):character varying):text=557858:text)Heap Blocks:exact=1 -Bitmap Index Scan on tbl_index_a (
13、cost=0.00.89.92 rows=5000 width=0)(actual time=0.086.0.087 rows=1 loops=1)Index Cond:(a):character varying):text=557858:text)Planning Time:0.587 ms Execution Time:0.153 ms(7 rows)添加索引表的掃描方式表的連接方式類別類別Nested LoopHash JoinMerge Join使用條件小表作為驅動表被驅動表有索引小表用于構造hash桶Hash Join不依賴于索引小表作為驅動表適用于很大的表Join優點當有高選擇性索
14、引時,效率比較高。當缺乏索引或者索引條件模糊時,Hash Join比Nested Loop有效。通常比Merge Join快。當缺乏索引或者索引條件模糊時,Merge Join比Nested Loop有效。缺點返回的結果集大,效率低。需要大量內存。所有表都需要排序。postgres=#explain select t1.*from test1 t1,test2 t2 where t1.id=t2.id;QUERY PLAN -Hash Join (cost=28.50.19278.50 rows=1000 width=15)Hash Cond:(t2.id=t1.id)-Seq Scan on
15、 test2 t2 (cost=0.00.15490.00 rows=1000000 width=4)-Hash (cost=16.00.16.00 rows=1000 width=15)-Seq Scan on test1 t1 (cost=0.00.16.00 rows=1000 width=15)(5 rows)表的連接方式postgres=#create index idx_test1 on test1(id);postgres=#create index idx_test2 on test2(id);postgres=#explain select t1.*from test1 t1
16、,test2 t2 where t1.id=t2.id;QUERY PLAN -Merge Join (cost=1.21.86.05 rows=1000 width=15)Merge Cond:(t1.id=t2.id)-Index Scan using idx_test1 on test1 t1 (cost=0.28.44.27 rows=1000 width=15)-Index Only Scan using idx_test2 on test2 t2 (cost=0.42.25980.42 rows=1000000 width=4)(4 rows)添加索引UPDATE中包含子查詢pos
17、tgres=#explain(analyze,verbose,timing,buffers)update t1 set info=(select info from t2 where t1.id=t2.id)where t1.id Seq Scan on public.t1 (cost=0.00.175135.00 rows=9998 width=38)(actual time=0.412.2928.959 rows=9998 loops=1)Output:(SubPlan 1),t1.ctid Filter:(t1.id Seq Scan on public.t2 (cost=0.00.17
18、.50 rows=1 width=32)(actual time=0.264.0.277 rows=0 loops=9998)Output:t2.info Filter:(t1.id=t2.id)Rows Removed by Filter:1000 Buffers:shared hit=49990 Planning:Buffers:shared hit=19 Planning Time:0.761 ms Execution Time:3160.871 ms(17 rows)SQL改寫SQL改寫postgres=#explain(analyze,verbose,timing,buffers)u
19、pdate t1 set info=t2.info from t2 where t1.id=t2.id and t1.id Hash Join (cost=27.50.288.99 rows=1000 width=44)(actual time=0.480.4.998 rows=1000 loops=1)Output:t2.info,t1.ctid,t2.ctid Hash Cond:(t1.id=t2.id)Buffers:shared hit=94 -Seq Scan on public.t1 (cost=0.00.214.00 rows=9998 width=10)(actual tim
20、e=0.093.2.726 rows=9998 loops=1)Output:t1.ctid,t1.id Filter:(t1.id Hash (cost=15.00.15.00 rows=1000 width=42)(actual time=0.370.0.371 rows=1000 loops=1)Output:t2.info,t2.ctid,t2.id Buckets:1024 Batches:1 Memory Usage:51kB Buffers:shared hit=5 -Seq Scan on public.t2 (cost=0.00.15.00 rows=1000 width=4
21、2)(actual time=0.015.0.175 rows=1000 loops=1)Output:t2.info,t2.ctid,t2.id Buffers:shared hit=5 Planning:Buffers:shared hit=44 Planning Time:0.833 ms Execution Time:9.926 ms(22 rows)將SQL修改為update.set.from.where.標量子查詢postgres=#explain analyze select t1.id,postgres-#(select t2.num from t2 where t2.id=t
22、1.id)as numpostgres-#from t1;QUERY PLAN -Seq Scan on t1 (cost=0.00.18515489.00 rows=1000000 width=36)(actual time=0.685.127992.907 rows=1000000 loops=1)SubPlan 1 -Seq Scan on t2 (cost=0.00.18.50 rows=1 width=11)(actual time=0.122.0.122 rows=0 loops=1000000)Filter:(id=t1.id)Rows Removed by Filter:100
23、0 Planning Time:2.558 ms Execution Time:128286.385 ms(7 rows)SQL改寫postgres=#explain analyze select t1.id,t2.numpostgres-#from t1postgres-#left join t2 on(t2.id=t1.id);QUERY PLAN -Hash Left Join (cost=28.50.19277.50 rows=1000000 width=15)(actual time=1.948.659.592 rows=1000000 loops=1)Hash Cond:(t1.i
24、d=t2.id)-Seq Scan on t1 (cost=0.00.15489.00 rows=1000000 width=4)(actual time=0.026.221.160 rows=1000000 loops=1)-Hash (cost=16.00.16.00 rows=1000 width=15)(actual time=1.513.1.514 rows=1000 loops=1)Buckets:1024 Batches:1 Memory Usage:56kB -Seq Scan on t2 (cost=0.00.16.00 rows=1000 width=15)(actual
25、time=0.044.0.423 rows=1000 loops=1)Planning Time:2.462 ms Execution Time:753.455 ms(8 rows)改寫成外連接視圖合并postgres=#explain analyze select v.id,v1.numpostgres-#from(select t1.id from t1,t2 where t1.id=t2.id)v,(select t3.id,t3.num from t3 where id Hash Join (cost=3.25.1802.67 rows=1 width=19)(actual time=
26、0.265.26.296 rows=99 loops=1)Hash Cond:(t3.id=t1.id)-Seq Scan on t3 (cost=0.00.1799.00 rows=109 width=15)(actual time=0.086.25.853 rows=99 loops=1)Filter:(id Hash (cost=2.00.2.00 rows=100 width=4)(actual time=0.147.0.152 rows=100 loops=1)Buckets:1024 Batches:1 Memory Usage:12kB -Seq Scan on t1 (cost
27、=0.00.2.00 rows=100 width=4)(actual time=0.037.0.079 rows=100 loops=1)-Seq Scan on t2 (cost=0.00.16.00 rows=1000 width=4)(actual time=0.015.0.237 rows=1000 loops=99)Planning Time:0.566 ms Execution Time:71.636 ms(14 rows)SQL改寫加上group by后,子查詢被固化,視圖沒有發生合并postgres=#explain analyze select v.id,v1.numpos
28、tgres-#from(select t1.id from t1,t2 where t1.id=t2.id group by t1.id)v,(select t3.id,t3.num from t3 where id Seq Scan on t3 (cost=0.00.1799.00 rows=109 width=15)(actual time=0.044.20.824 rows=99 loops=1)Filter:(id Hash (cost=26.25.26.25 rows=100 width=4)(actual time=0.839.0.848 rows=100 loops=1)Buck
29、ets:1024 Batches:1 Memory Usage:12kB -HashAggregate (cost=24.25.25.25 rows=100 width=4)(actual time=0.750.0.797 rows=100 loops=1)Group Key:t1.id Batches:1 Memory Usage:24kB -Hash Join (cost=3.25.24.00 rows=100 width=4)(actual time=0.162.0.691 rows=100 loops=1)Hash Cond:(t2.id=t1.id)-Seq Scan on t2 (
30、cost=0.00.16.00 rows=1000 width=4)(actual time=0.024.0.246 rows=1000 loops=1)-Hash (cost=2.00.2.00 rows=100 width=4)(actual time=0.091.0.094 rows=100 loops=1)Buckets:1024 Batches:1 Memory Usage:12kB -Seq Scan on t1 (cost=0.00.2.00 rows=100 width=4)(actual time=0.023.0.050 rows=100 loops=1)Planning T
31、ime:0.637 ms Execution Time:21.918 ms(18 rows)SQL改寫CTE用法SQL改寫非簡單子查詢提取為CTE,包含聚合函數、窗口函數、集合運算。將查詢看成一棵樹,樹的主干是那些被參數篩選的表,通常將主干提取為CTE,其他子查詢或者表需與主干關聯,以減少結果集。分頁應用層面優化。盡量減少使用SELECT*窗口函數row_number=1 row_number exists in數據庫配置更準確的成本預估調整成本因子擴大統計信息采樣范圍,重新搜集統計信息,多列統計信息調整函數易變性,使用預估更準的函數VOLATILESTABLEIMMUTABLE及時收集統計信
32、息手動執行analyze當表數量=geqo_threshold(默認12),用遺傳算法進行最優路徑的篩選工作。提升子查詢關聯等級和指定表連接順序from_collapse_limit=1join_collapse_limit=1GEQO數據庫配置pg_hint_plan插件執行計劃節點開關postgres=#select name,setting from pg_settings where name like enable_%;name|setting-+-enable_async_append|on enable_bitmapscan|on enable_gathermerge|on en
33、able_hashagg|on enable_hashjoin|on enable_incremental_sort|on enable_indexonlyscan|on enable_indexscan|on enable_material|on enable_memoize|on enable_mergejoin|on enable_nestloop|on enable_parallel_append|on enable_parallel_hash|on enable_partition_pruning|on enable_partitionwise_aggregate|off enabl
34、e_partitionwise_join|off enable_seqscan|on enable_sort|on enable_tidscan|on(20 rows)數據庫配置數據庫配置maintenance_work_memautovacuum_*等系列參數shared_bufferswork_memdynamic_shared_memory_typehuge_page架構設計硬件升級連接池讀寫分離緩存分布式架構設計表定義 定期歸檔歷史數據 使用分區表 什么時候使用分區表I.表數據量是否足夠大II.表是否有合適的分區字段III.表內數據是否具有生命周期IV.查詢語句中是否含有分區字段pg_
35、stat_activity:當前有哪些活動會話,每個會話正在執行什么SQL以及狀態。postgres=#select*from pg_stat_activity where datid is not null;-RECORD 1-+-datid|13892datname|postgrespid|31504leader_pid|usesysid|10usename|postgresapplication_name|psqlclient_addr|client_hostname|client_port|-1backend_start|2022-10-13 15:27:25.333653+08xac
36、t_start|2022-10-13 17:52:20.861828+08query_start|2022-10-13 17:52:20.861828+08state_change|2022-10-13 17:52:20.861834+08wait_event_type|wait_event|state|activebackend_xid|backend_xmin|1854query_id|query|select*from pg_stat_activity where datid is not null;backend_type|client backend相關工具pg_stat_statments:記錄每個SQL執行次數,平均/最大/最小執行時間。shared_preload_libraries=pg_stat_statementstrack_io_timing=ontrack_activity_query_size=2048pg_stat_statements.max=10000 pg_stat_statements.track=all pg_stat_statements.track_utility=off pg_stat_statements.save=on 相關工具歡迎關注美創科技新運維新數據公眾號