
PostgreSQL 13 – lucky number in the world of Postgres

Many people suffer from an irrational fear of the number 13, known formally as triskaidecaphobia. In the case of the latest PostgreSQL release, which bears this number, such concerns are unfounded: almost everyone will find changes that improve everyday work with Postgres.
So let's try to dispel the apparent unluckiness of the 13th PostgreSQL release. The first conspicuous thing is the sheer amount of work put into this edition. There is probably no subsystem with significant production use that this release does not refine in some way.
Each of the last few major releases brought sweeping changes to the PostgreSQL world, with big new features such as declarative partitioning, native logical replication, and JSON support. This release is different: instead of headline features, we get improvements to existing functionality. Which does not mean there is nothing to get excited about!
Smaller indexes
The first improvement, useful to virtually all Postgres users, is a reduction in the size of standard B-tree indexes. This is the most commonly used index type, since it is the one created by default when you execute CREATE INDEX. A smaller index is welcome in itself, and along with the reduced size you can also expect faster access to the information stored in it. The size reduction comes mainly from changes in how duplicate values are handled (deduplication), which is described in the B-tree implementation notes in the PostgreSQL documentation.
What does this look like in practice? Using the "toy" example below, we can show how a change between major versions can affect the occupancy of production disks.
-- sample table
CREATE TABLE index_test(id int PRIMARY KEY, value int);
CREATE INDEX ON index_test (value);

-- inserting data
INSERT INTO index_test SELECT i, i % 12345 FROM generate_series(1,1000000) i;
For version 12, the index size is as follows (in version 12, before deduplication, an index over a million integers comes out essentially the same size whether or not the values are duplicated):
-- results for version 12
postgres=# select pg_size_pretty(pg_relation_size('index_test_pkey'));
 pg_size_pretty
----------------
 21 MB
(1 row)
However, for version 13 we see a nearly 60% reduction in size:
postgres=# select pg_size_pretty(pg_relation_size('index_test_value_idx'));
 pg_size_pretty
----------------
 8664 kB
(1 row)
What does the performance comparison look like? It is worth noting that constructing correct benchmarks of this kind is a non-trivial task: performance depends on a huge number of factors whose relationships are not necessarily orthogonal or even linear. Treat our results below as anecdata rather than hard data. ;)
We used the following query:
EXPLAIN ANALYZE SELECT count(*) FROM index_test WHERE value = 42;
Below is the query plan for version 12:
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=297.39..297.40 rows=1 width=8) (actual time=0.433..0.434 rows=1 loops=1)
   ->  Bitmap Heap Scan on index_test  (cost=5.05..297.19 rows=81 width=0) (actual time=0.118..0.394 rows=81 loops=1)
         Recheck Cond: (value = 42)
         Heap Blocks: exact=81
         ->  Bitmap Index Scan on index_test_value_idx  (cost=0.00..5.03 rows=81 width=0) (actual time=0.036..0.036 rows=81 loops=1)
               Index Cond: (value = 42)
 Planning Time: 0.490 ms
 Execution Time: 0.509 ms
And for version 13:
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6.04..6.05 rows=1 width=8) (actual time=0.059..0.060 rows=1 loops=1)
   ->  Index Only Scan using index_test_value_idx on index_test  (cost=0.42..5.84 rows=81 width=0) (actual time=0.046..0.052 rows=81 loops=1)
         Index Cond: (value = 42)
         Heap Fetches: 0
 Planning Time: 0.237 ms
 Execution Time: 0.082 ms
As you can see, even such basic database elements as indexes can benefit greatly from refinements made by the open-source community. If you upgrade from a previous version, you will need to rebuild your indexes to take advantage of this change. Fortunately, the previous release, PostgreSQL 12, added REINDEX CONCURRENTLY, so you don't have to schedule a maintenance window for a blocking REINDEX.
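For the table from the example above, such a rebuild after the upgrade could look as follows (a sketch using the index name from our example; CONCURRENTLY rebuilds the index without taking a lock that blocks writes):

```sql
-- rebuild a single index without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY index_test_value_idx;

-- or rebuild all indexes on the table at once
REINDEX TABLE CONCURRENTLY index_test;
```

Note that a concurrent rebuild takes longer and uses more resources than a plain REINDEX, since it builds a new copy of the index alongside the old one before swapping them.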
New parameters to facilitate monitoring
Monitoring the operation of the database is one of the basic administrative activities. An invaluable source of information is the system logs, which let you assess the health of the database and show where to direct your attention in order to improve its performance. One of the tools for log analysis is LogAnalyzer.
To make the most of the analyzer's capabilities, you would have to log every event in the database. In practice this is obviously not possible for production systems, because the highest level of detail is very taxing and reduces performance. Until version 12, administrators had to rely on the log_min_duration_statement parameter, which logs a query when its execution time exceeds the threshold value. This helped identify individual long-running queries, but it obviously captures only a fragment of reality and gives no view of the database system as a whole.
Version 12 introduced the log_transaction_sample_rate parameter, which lets you specify what fraction of transactions will have their statements logged. This mitigates the performance penalty while still giving an overview of the state of the entire database. PostgreSQL 13 develops this idea further and allows finer control over logging thanks to new configuration parameters:
log_min_duration_sample – sets a duration threshold above which statements become candidates for logging, analogous to log_min_duration_statement, but subject to sampling
log_statement_sample_rate – controls what fraction of those candidate statements is actually logged.
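As an illustration, a postgresql.conf fragment combining the old and new parameters might look like this (the thresholds and rates here are arbitrary example values, not recommendations):

```
# always log statements slower than 1 second
log_min_duration_statement = 1000

# additionally, log a 10% sample of statements slower than 100 ms (new in 13)
log_min_duration_sample = 100
log_statement_sample_rate = 0.10

# log all statements of 1% of transactions (added in 12)
log_transaction_sample_rate = 0.01
```

For the sampling to have any effect, log_min_duration_sample must be set lower than log_min_duration_statement; statements exceeding the latter are always logged regardless of sampling.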
Choosing the right values for each application using Postgres goes far beyond the scope of this article. It is worth mentioning, however, that in all likelihood only practical tests will let you find the optimal settings.
Summary
Congratulations to the developers on another good step forward, and we encourage users to upgrade. PostgreSQL version 13 was officially released on September 24, 2020. Detailed information about the release can be found at: https://www.postgresql.org/docs/13/release-13.html