Monitoring the health of PostgreSQL
If you compare a database administrator to a warrior, his sword is undoubtedly the system shell, while his shield is the monitoring system, together with the knowledge and experience related to recording database parameters. In this article I will explain how to assess the condition of a database using the parameters that the PostgreSQL engine itself makes available to us.
We have already written about monitoring a PostgreSQL database "from the outside", using standard tools available to every sysadmin. We also recently presented a tool for monitoring activity in the database. This time we will focus on the data that PostgreSQL itself provides for the administrator's needs.
PostgreSQL system views
Postgres offers administrators a great deal of information in views whose names start with pg_stat_. Using them, we can extract a lot of information about the current state of the database system, and by analysing them we can easily identify, for example, missing indexes or potentially wrong settings, such as a work_mem value that is too low. Below are some of the most important views from the administrator's point of view; a query listing all of them follows the table.
View | Description |
---|---|
pg_stat_activity | information about server processes and their current activity, one row per process |
pg_stat_database | statistics at the database level, one row per database |
pg_stat_user_tables | statistics at the table level, one row per table |
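These are only the most commonly consulted ones. If you want to see the full list of such views available in your installation, you can ask the system catalog itself; a minimal sketch (the LIKE pattern only reflects the naming convention described above):

SELECT schemaname, viewname
FROM pg_catalog.pg_views
WHERE viewname LIKE 'pg_stat%'
ORDER BY viewname;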
View pg_stat_database
Let's put the pg_stat_database view under the microscope. Its structure looks like this:
postgres=# \d pg_stat_database
View "pg_catalog.pg_stat_database"
     Column     |           Type           | Modifiers
----------------+--------------------------+-----------
 datid          | oid                      |
 datname        | name                     |
 numbackends    | integer                  |
 xact_commit    | bigint                   |
 xact_rollback  | bigint                   |
 blks_read      | bigint                   |
 blks_hit       | bigint                   |
 tup_returned   | bigint                   |
 tup_fetched    | bigint                   |
 tup_inserted   | bigint                   |
 tup_updated    | bigint                   |
 tup_deleted    | bigint                   |
 conflicts      | bigint                   |
 temp_files     | bigint                   |
 temp_bytes     | bigint                   |
 deadlocks      | bigint                   |
 blk_read_time  | double precision         |
 blk_write_time | double precision         |
 stats_reset    | timestamp with time zone |
Here we can find parameters such as:
- numbackends – the number of open connections to the database
- xact_commit and xact_rollback – allowing us to determine the ratio of committed to rolled-back transactions
- blks_hit and blks_read – determining how effectively queries operate on data already held in the cache (see the example query after this list)
- tup_* – thanks to which we can determine the general data access pattern in the database (reads versus writes).
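As a quick illustration, the ratios described above can be computed directly from the view. This is only a sketch – the rounding, the NULLIF guards and the template filter are my own additions, not part of any standard report:

SELECT datname,
       numbackends,
       round(100.0 * xact_commit / NULLIF(xact_commit + xact_rollback, 0), 2) AS commit_pct,
       round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname NOT LIKE 'template%';  -- skip template databases (illustrative filter)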
At the end we also have blk_read_time and blk_write_time. They answer the question of how much time is needed to read from and write to disk on a given system. By default they will be zero, because setting the track_io_timing parameter to anything other than off could introduce significant overhead during operation. Fortunately, the PostgreSQL developers ship a tool with the server that lets you measure the cost of the timing calls required to collect these statistics – pg_test_timing. When PostgreSQL is installed on so-called "bare metal", i.e. without virtualization, enabling this parameter basically does not require testing. However, when the database runs in a virtual machine, it is worth checking what overhead enabling this option may cause. An example of the result of such a test is shown below:
$ pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 18.16 nsec
Histogram of timing durations:
  < usec   % of total      count
       1     98.20300  162242046
       2      1.79523    2965915
       4      0.00053        871
       8      0.00003         51
      16      0.00067       1105
      32      0.00054        899
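If the overhead turns out to be acceptable, the parameter can be switched on without editing the configuration file by hand; a minimal sketch (requires superuser rights and takes effect after a configuration reload):

ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();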
View pg_stat_user_tables
Going deeper, at the next level of detail we can analyze the contents of the pg_stat_user_tables view. Let's see what data we can read from it:
postgres=# \d pg_stat_user_tables
View "pg_catalog.pg_stat_user_tables"
       Column        |           Type           | Modifiers
---------------------+--------------------------+-----------
 relid               | oid                      |
 schemaname          | name                     |
 relname             | name                     |
 seq_scan            | bigint                   |
 seq_tup_read        | bigint                   |
 idx_scan            | bigint                   |
 idx_tup_fetch       | bigint                   |
 n_tup_ins           | bigint                   |
 n_tup_upd           | bigint                   |
 n_tup_del           | bigint                   |
 n_tup_hot_upd       | bigint                   |
 n_live_tup          | bigint                   |
 n_dead_tup          | bigint                   |
 n_mod_since_analyze | bigint                   |
 last_vacuum         | timestamp with time zone |
 last_autovacuum     | timestamp with time zone |
 last_analyze        | timestamp with time zone |
 last_autoanalyze    | timestamp with time zone |
 vacuum_count        | bigint                   |
 autovacuum_count    | bigint                   |
 analyze_count       | bigint                   |
 autoanalyze_count   | bigint                   |
Here you can find the information most important from a performance point of view, such as the number of queries that had to use a sequential scan (seq_scan) and how many rows were read during those scans (seq_tup_read). We can also read how many times an index associated with a given table has been used (idx_scan).
Correct analysis of this data is one of the most important DBA skills. During our PostgreSQL administration training we try to provide as many examples of using the collected data as possible. One of the applications presented is searching for potentially missing indexes. For this purpose, we can use the following query:
SELECT schemaname || '.' || relname AS table,
       seq_scan,
       seq_tup_read,
       seq_tup_read / seq_scan AS avg_tup,
       idx_scan,
       idx_scan / seq_scan AS ratio
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 25;
The result of this query will be a list of tables for which seq_tup_read has the highest values, which indicates that the most common operation on them is a sequential scan. It is worth recalling that a sequential scan is not bad in itself; it appears, for example, when performing backups. However, if it appears too often, performance will definitely suffer.
Analyzing PostgreSQL logs
Logs are also an irreplaceable mine of information about the state of the system. However, proper configuration requires action from the DBA, because the information collected by default may not be sufficient to draw conclusions about how the system operates.
Collecting and analyzing logs is a broad topic. In the past we wrote about one of the logging standards – syslog, which allows you to centralize logging from many different sources. PostgreSQL provides considerable flexibility in logging, so it can be "connected" to solutions already existing in the organization.
The most important configuration parameters related to logging in PostgreSQL are:
- log_destination – indicates the destination to which the logs are to be sent; the default value is stderr
- logging_collector – option that allows Postgres itself to collect the data into files; set to off by default
- log_directory – directory in which the log files will be stored
- log_filename – file name which, combined with the log_truncate_on_rotation = on option, allows you to create a log retention "policy"; a popular setting is postgresql_%a.log, which adds the day of the week to the file name, thus creating a 7-day data retention window (see the example snippet after this list).
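Putting the above parameters together, a minimal illustrative fragment of postgresql.conf could look like this (the values are examples to adapt, not recommendations; note that changing logging_collector requires a server restart):

logging_collector = on
log_destination = 'stderr'
log_directory = 'log'
log_filename = 'postgresql_%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d          # rotate daily so the %a pattern gives a 7-day window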
You can read more about them in the official documentation or use a settings search engine, which makes it easy to find and customize the postgresql.conf configuration file to your own needs. Here, however, I would like to draw attention to one more setting that is particularly important when logging to a file, namely log_line_prefix, whose default value depends on the version.
Default setting | PostgreSQL version |
---|---|
log_line_prefix = '%m [%p] ' | 12, 11, 10 |
log_line_prefix = '' | 9.6 and earlier |
The change was made because many administrators simply used the default settings, following the philosophy that the creators know best, which in this case often made it impossible to determine the cause of a problem once it had occurred.
The current default records the event time to the nearest millisecond (%m) and the ID of the process (%p) to which the event can be attributed.
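With this prefix, a single log line looks roughly like the example below (the timestamp, PID and message are made up for illustration):

2023-05-12 10:41:07.123 CEST [23087] LOG:  checkpoint starting: time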
From the DBA's point of view, the parameters that define what is actually to be logged are also important. We can log information about checkpoints (log_checkpoints), database connections (log_connections, log_disconnections) or long-lasting locks on tables (log_lock_waits, deadlock_timeout). The level of detail of the log is also determined by the log_statement parameter, which allows you to specify whether additional information (beyond error information) is to be added to the logs; an example configuration fragment is shown after the list. The values at our disposal are:
- ddl – adds information about all CREATE and ALTER commands
- mod – extends this with information about data modifications
- all – all statements will be added to the log.
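For illustration only, the logging options mentioned in this section could be enabled with a fragment like the one below; treat the values as assumptions to adjust, not as universal advice:

log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_statement = 'ddl'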
To learn about the operation of an application using the database, we are often forced to increase the detail of the log, paying for it with a decrease in performance. Fortunately, we can reduce this cost by limiting ourselves to slower queries.
Logging slow queries in PostgreSQL
To filter fast queries out of the log and focus only on the problematic ones, we can use the log_min_duration_statement parameter. Its unit is milliseconds, so if we wanted to collect information about queries lasting more than 0.5 seconds, the parameter would look like this:
log_min_duration_statement = 500
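Once enabled, queries exceeding the threshold appear in the log together with their execution time. An illustrative entry, using the log_line_prefix discussed earlier (the timestamp, PID, duration and query are made up), could look like this:

2023-05-12 10:44:31.518 CEST [23087] LOG:  duration: 739.311 ms  statement: SELECT count(*) FROM orders WHERE status = 'new';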
Summary
Monitoring the health of a database is a topic with no end. One could also mention proper hygiene during the use of the database and the standard maintenance work related to it. For those interested, I recommend our PostgreSQL administration training.