Monitoring the health of PostgreSQL

If we compare the database administrator to a warrior, the system shell is undoubtedly his sword, while his shield is the monitoring system, together with the knowledge and experience needed to interpret recorded database parameters. In this article I will explain how to assess the condition of a database using the parameters that the PostgreSQL engine itself presents to us.

We have already written about monitoring the PostgreSQL database "from the outside", using standard tools available to every sysadmin. We also recently presented a tool for monitoring activity in the database. This time we will focus on the data that PostgreSQL itself provides for the administrator's needs.

PostgreSQL system views

Postgres offers administrators a wealth of information in views whose names start with pg_stat_. Using them, we can extract a lot of information about the current state of the database system. By analyzing them, we can easily identify, for example, missing indexes or potentially wrong settings, such as a work_mem value that is too low. Below are some of the most important views from the administrator's point of view.

View                  Description
pg_stat_activity      information related to current activity, one row per server process
pg_stat_database      database-level statistics, one row per database
pg_stat_user_tables   table-level statistics, one row per table
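
As a quick illustration, the first of these views can be queried directly. A minimal sketch (the column names assume PostgreSQL 9.2 or later, where the old current_query column was split into state and query):

```sql
-- List current sessions: who is connected to which database
-- and what they are doing. Idle sessions are filtered out
-- to keep the output readable.
SELECT pid, usename, datname, state, query
FROM pg_stat_activity
WHERE state IS DISTINCT FROM 'idle';
```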

View pg_stat_database

Let's take the pg_stat_database view to the workshop. Its structure looks like this:

postgres=# \d pg_stat_database
          View "pg_catalog.pg_stat_database"
     Column     |           Type           | Modifiers 
----------------+--------------------------+-----------
 datid          | oid                      | 
 datname        | name                     | 
 numbackends    | integer                  | 
 xact_commit    | bigint                   | 
 xact_rollback  | bigint                   | 
 blks_read      | bigint                   | 
 blks_hit       | bigint                   | 
 tup_returned   | bigint                   | 
 tup_fetched    | bigint                   | 
 tup_inserted   | bigint                   | 
 tup_updated    | bigint                   | 
 tup_deleted    | bigint                   | 
 conflicts      | bigint                   | 
 temp_files     | bigint                   | 
 temp_bytes     | bigint                   | 
 deadlocks      | bigint                   | 
 blk_read_time  | double precision         | 
 blk_write_time | double precision         | 
 stats_reset    | timestamp with time zone |

Here we can find parameters such as:

  • numbackends, which tells us the number of open connections to the database
  • xact_commit and xact_rollback, which allow us to determine the ratio of committed to rolled-back transactions
  • blks_hit and blks_read, which determine how effectively queries operate on data already in the cache
  • tup_*, thanks to which we can determine the general data-access pattern in the database (read-heavy versus write-heavy).
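
Building on these columns, the ratios mentioned above can be computed directly in SQL. This is only a sketch based on the columns listed; NULLIF guards against division by zero on freshly reset statistics:

```sql
-- Per-database commit ratio and cache hit ratio from pg_stat_database.
SELECT datname,
       numbackends,
       round(100.0 * xact_commit
             / NULLIF(xact_commit + xact_rollback, 0), 2) AS commit_pct,
       round(100.0 * blks_hit
             / NULLIF(blks_hit + blks_read, 0), 2)        AS cache_hit_pct
FROM pg_stat_database
WHERE datname NOT LIKE 'template%';
```

As a rule of thumb, a cache hit ratio that stays well below 99% on an OLTP workload is often a hint that shared_buffers (or the machine's RAM) is too small for the working set.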

At the end we also have blk_read_time and blk_write_time. They answer the question of how much time is needed to read from and write to disk on a given system. By default they will remain at zero, because setting the track_io_timing parameter to anything other than off could cause significant overhead during operation. Fortunately, the creators of PostgreSQL ship a tool with the server for determining the impact of the timing operations required when collecting these statistics – pg_test_timing. When PostgreSQL is installed on so-called "bare metal", i.e. without virtualization, enabling this parameter basically does not require testing. However, when the database runs in a virtual machine, it is worth finding out what overhead enabling this option may cause. An example result of such a test is shown below:

$ pg_test_timing 
Testing timing overhead for 3 seconds.
Per loop time including overhead: 18.16 nsec
Histogram of timing durations:
< usec   % of total      count
     1     98.20300  162242046
     2      1.79523    2965915
     4      0.00053        871
     8      0.00003         51
    16      0.00067       1105
    32      0.00054        899
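
If the measured overhead turns out to be acceptable, the parameter can be switched on without a server restart. A sketch (requires superuser privileges):

```sql
-- Enable I/O timing cluster-wide and reload the configuration.
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();

-- From now on blk_read_time and blk_write_time (in milliseconds)
-- will accumulate in pg_stat_database.
SELECT datname, blk_read_time, blk_write_time
FROM pg_stat_database;
```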

View pg_stat_user_tables

Going deeper, at the next level of detail we can analyze the contents of the pg_stat_user_tables view. Let's see what data we can read from it:

postgres=# \d pg_stat_user_tables
           View "pg_catalog.pg_stat_user_tables"
       Column        |           Type           | Modifiers 
---------------------+--------------------------+-----------
 relid               | oid                      | 
 schemaname          | name                     | 
 relname             | name                     | 
 seq_scan            | bigint                   | 
 seq_tup_read        | bigint                   | 
 idx_scan            | bigint                   | 
 idx_tup_fetch       | bigint                   | 
 n_tup_ins           | bigint                   | 
 n_tup_upd           | bigint                   | 
 n_tup_del           | bigint                   | 
 n_tup_hot_upd       | bigint                   | 
 n_live_tup          | bigint                   | 
 n_dead_tup          | bigint                   | 
 n_mod_since_analyze | bigint                   | 
 last_vacuum         | timestamp with time zone | 
 last_autovacuum     | timestamp with time zone | 
 last_analyze        | timestamp with time zone | 
 last_autoanalyze    | timestamp with time zone | 
 vacuum_count        | bigint                   | 
 autovacuum_count    | bigint                   | 
 analyze_count       | bigint                   | 
 autoanalyze_count   | bigint                   |

Here you can find some of the most important information from a performance point of view, such as the number of sequential scans performed on a table (seq_scan) and how many rows were read during those scans (seq_tup_read). We can also read how many times the indexes associated with a given table have been used (idx_scan).

Correct analysis of this data is one of the most important DBA skills. During our PostgreSQL administration training, we try to provide as many examples of the use of the collected data as possible. One of the presented applications is searching for tables that may be missing an index. For this purpose, we can use the following query:

SELECT
    schemaname || '.' || relname AS table,
    seq_scan,
    seq_tup_read,
    seq_tup_read / seq_scan AS avg_tup,
    idx_scan,
    -- cast to numeric to avoid integer truncation; coalesce handles
    -- tables that have no indexes at all (idx_scan IS NULL)
    round(coalesce(idx_scan, 0)::numeric / seq_scan, 2) AS ratio
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 25;

The result of this query will be a list of tables for which seq_tup_read has the highest values, which indicates that the most common operation on them is a sequential scan. It is worth recalling that a sequential scan is not bad in itself. It appears, for example, when performing backups. However, if it appears too often, performance will definitely suffer.

Analyzing PostgreSQL logs

Logs are also an irreplaceable mine of information about the state of the system. However, proper configuration requires action from the DBA, because the information collected by default may not be sufficient to draw conclusions about the operation of the system.

Collecting and analyzing logs is a broad topic. In the past we wrote about one of the logging standards – syslog – which allows you to centralize logs from many different systems. PostgreSQL provides considerable flexibility in logging, so it can be "connected" to solutions already existing in the organization.

The most important configuration parameters related to logging in PostgreSQL are:

  • log_destination – indicates the destination to which the logs are to be sent. The default value is stderr
  • logging_collector – an option that allows Postgres itself to collect the data into files. Off by default
  • log_directory – the directory in which log files will be deposited
  • log_filename – the file name, which in combination with the log_truncate_on_rotation = on option allows you to create a retention "policy" – a popular setting is postgresql_%a.log, which adds the day of the week to the file name, thus creating a 7-day data storage window.
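
Putting the parameters above together, a file-based logging setup might look like this in postgresql.conf (an illustrative sketch, not a universal recommendation):

```
logging_collector = on
log_directory = 'log'                 # relative to the data directory
log_filename = 'postgresql_%a.log'    # %a = abbreviated day of the week
log_truncate_on_rotation = on         # overwrite last week's file
log_rotation_age = 1d                 # rotate daily -> 7-day window
```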

You can read more about them in the official documentation, or use a settings search engine, which makes it easy to find parameters and customize the postgresql.conf configuration file to your own needs. Here, however, I would like to draw attention to one more setting, which is particularly important when logging to a file: log_line_prefix, whose default value differs depending on the version.

Default setting                 PostgreSQL version
log_line_prefix = '%m [%p] '    10, 11, 12
log_line_prefix = ''            9.6 and earlier

This change came about because many administrators simply used the default settings, following the philosophy that the creators know best – which in this case often made it impossible to determine the cause of a problem once it had occurred.

The current default records the event time to the nearest millisecond (%m) and the process ID (%p) to which the event can be attributed.
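
If more context is needed, the prefix can be extended with further escapes; for example (a common pattern, not the only sensible one):

```
# time, PID, then - for session processes only (%q) - user@database
# and the application name
log_line_prefix = '%m [%p] %q%u@%d app=%a '
```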

From the DBA's point of view, parameters that allow you to define what is actually to be logged are also important. We have information about checkpoints (log_checkpoints), database connections (log_connections, log_disconnections) and long-lasting locks on tables (log_lock_waits, deadlock_timeout). The level of detail of the log is also determined by the log_statement parameter, which allows you to specify whether additional information (beyond error information) is to be added to the logs. The following values are at our disposal:

  • ddl – adds information about all DDL commands, such as CREATE or ALTER
  • mod – extends ddl with statements that modify data
  • all – all statements will be added to the log.
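
For example, to record every schema change while keeping log volume moderate, one might set:

```
log_statement = 'ddl'
```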

To learn about the operation of an application using the database, we are often forced to increase the detail of the log, paying for it with a decrease in performance. Fortunately, we can reduce this cost by limiting ourselves to slower queries.

Logging slow queries in PostgreSQL

To filter fast queries out of the log, which allows us to focus only on the problematic ones, we can use the log_min_duration_statement parameter. Its unit is milliseconds. Therefore, if we wanted to collect information about queries that lasted more than 0.5 seconds, the parameter would take the following form:

log_min_duration_statement = 500
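
A quick way to verify the setting is to run a deliberately slow query and check the log. Assuming the configuration above is active:

```sql
-- Sleeps for about 1 second, i.e. above the 500 ms threshold, so the
-- statement should appear in the server log together with its duration.
SELECT pg_sleep(1);
```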

Summary

Monitoring the health of a database is a never-ending topic. One could also mention proper hygiene during the use of the database and the standard maintenance work associated with it. For those interested, I recommend our PostgreSQL administration training.

Authors

The blog articles are written by members of the EuroLinux team. We owe 80% of the content to our developers; the rest is prepared by the sales or marketing department. We make every effort to ensure that the content is sound in both substance and language, but we are not infallible. If you see anything that needs to be corrected or clarified, we'd love to hear from you.