MySQL Consulting
https://minervadb.com/index.php/tag/postgresql/
Committed to Building Optimal, Scalable, Highly Available, Fault-Tolerant, Reliable and Secured WebScale Database Infrastructure Operations
Mon, 24 Aug 2020 09:21:47 +0000

Troubleshooting PostgreSQL Performance from Slow Queries
https://minervadb.com/index.php/2020/08/23/troubleshooting-postgresql-performance-from-slow-queries/
Sun, 23 Aug 2020 11:05:33 +0000

PostgreSQL Performance Troubleshooting with Slow Queries

Introduction  

If you are doing detailed performance diagnostics / forensics, we strongly recommend that you understand the data access paths of the underlying queries, the cost of query execution, wait events / locks and the system resources used by PostgreSQL infrastructure operations. The MinervaDB Performance Engineering Team measures performance by “Response Time”, so finding slow queries in PostgreSQL is the most appropriate point to start this blog. PostgreSQL Server is highly configurable when it comes to collecting details on query performance: the slow query log, auditing execution plans with auto_explain, and querying pg_stat_statements.

Using PostgreSQL slow query log to troubleshoot the performance

Step 1 – Open the postgresql.conf file in your favorite text editor (on Ubuntu, postgresql.conf is located under /etc/postgresql/) and update the configuration parameter log_min_duration_statement. By default the slow query log is not active. To enable the slow query log globally, change postgresql.conf:

log_min_duration_statement = 2000

With the above configuration (the value is in milliseconds), PostgreSQL will log queries that take longer than 2 seconds.
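To make the threshold semantics concrete, here is a minimal Python sketch of the decision made for each completed statement. The function name should_log_statement is hypothetical and not a PostgreSQL API. It assumes the documented behaviour of log_min_duration_statement: -1 disables duration logging, 0 logs every statement, and a positive value logs statements that ran for at least that many milliseconds.

```python
def should_log_statement(duration_ms: float, log_min_duration_statement: int) -> bool:
    """Hypothetical helper mirroring the documented log_min_duration_statement rule."""
    if log_min_duration_statement < 0:
        # -1 disables duration-based statement logging.
        return False
    if log_min_duration_statement == 0:
        # 0 logs every completed statement.
        return True
    # Any positive value is a threshold in milliseconds.
    return duration_ms >= log_min_duration_statement


if __name__ == "__main__":
    for duration in (150.0, 2000.0, 5501.8):
        print(duration, should_log_statement(duration, 2000))
```

In practice this means a threshold of 2000 keeps short OLTP statements out of the log while still capturing the multi-second queries you want to investigate.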

Step 2 – A “reload” (by simply calling the SQL function below) is sufficient. There is no need to restart the PostgreSQL server, and don’t worry, it won’t interrupt any active connections:

postgres=# SELECT pg_reload_conf();
 pg_reload_conf 
----------------
 t 
(1 row)

Note: Enabling the slow query log for the whole instance in postgresql.conf is often too heavy for the PostgreSQL infrastructure. It is therefore more sensible to change the setting only for a selected database or user:

postgres=# ALTER DATABASE minervadb SET log_min_duration_statement = 2000;
ALTER DATABASE

To complete the detailed performance forensics / diagnostics of high-latency queries you can use auto_explain. The example below makes PostgreSQL send the execution plan of queries exceeding a certain threshold to the log file:

postgres=# LOAD 'auto_explain';
LOAD
postgres=# SET auto_explain.log_analyze TO on;
SET
postgres=# SET auto_explain.log_min_duration TO 2000;
SET

You can also enable auto explain in postgresql.conf with the settings below:

session_preload_libraries = 'auto_explain'

Note: Do not forget to call pg_reload_conf() after changing postgresql.conf. The setting applies to sessions started after the reload.

More examples of PostgreSQL auto_explain are shown below:

postgres=# CREATE TABLE minervdb_bench AS
postgres-# SELECT * FROM generate_series(1, 10000000) AS id;
SELECT 10000000

postgres=# CREATE INDEX idx_id ON minervdb_bench(id);
CREATE INDEX
postgres=# ANALYZE;
ANALYZE

postgres=# LOAD 'auto_explain';
LOAD
postgres=# SET auto_explain.log_analyze TO on;
SET
postgres=# SET auto_explain.log_min_duration TO 200;
SET

postgres=# explain SELECT * FROM minervdb_bench  WHERE id < 5000;
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Index Only Scan using idx_id on minervdb_bench  (cost=0.43..159.25 rows=4732 width=4)
   Index Cond: (id < 5000)
(2 rows)

postgres=# explain SELECT * FROM minervdb_bench  WHERE id < 200000;
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Index Only Scan using idx_id on minervdb_bench  (cost=0.43..6550.25 rows=198961 width=4)
   Index Cond: (id < 200000)
(2 rows)

postgres=# explain SELECT count(*) FROM minervdb_bench GROUP BY id % 2;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 GroupAggregate  (cost=1605360.71..1805360.25 rows=9999977 width=12)
   Group Key: ((id % 2))
   ->  Sort  (cost=1605360.71..1630360.65 rows=9999977 width=4)
         Sort Key: ((id % 2))
         ->  Seq Scan on minervdb_bench  (cost=0.00..169247.71 rows=9999977 width=4)
 JIT:
   Functions: 6
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(8 rows)

Using pg_stat_statements

We can use pg_stat_statements to group identical PostgreSQL queries and rank them by latency. To enable pg_stat_statements, add the following lines to postgresql.conf and restart the PostgreSQL server:

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'

pg_stat_statements.max = 10000
pg_stat_statements.track = all

Run “CREATE EXTENSION pg_stat_statements” in your database so that PostgreSQL creates the view for you:

postgres=# CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION

postgres=# \d pg_stat_statements
                    View "public.pg_stat_statements"
       Column        |       Type       | Collation | Nullable | Default 
---------------------+------------------+-----------+----------+---------
 userid              | oid              |           |          | 
 dbid                | oid              |           |          | 
 queryid             | bigint           |           |          | 
 query               | text             |           |          | 
 calls               | bigint           |           |          | 
 total_time          | double precision |           |          | 
 min_time            | double precision |           |          | 
 max_time            | double precision |           |          | 
 mean_time           | double precision |           |          | 
 stddev_time         | double precision |           |          | 
 rows                | bigint           |           |          | 
 shared_blks_hit     | bigint           |           |          | 
 shared_blks_read    | bigint           |           |          | 
 shared_blks_dirtied | bigint           |           |          | 
 shared_blks_written | bigint           |           |          | 
 local_blks_hit      | bigint           |           |          | 
 local_blks_read     | bigint           |           |          | 
 local_blks_dirtied  | bigint           |           |          | 
 local_blks_written  | bigint           |           |          | 
 temp_blks_read      | bigint           |           |          | 
 temp_blks_written   | bigint           |           |          | 
 blk_read_time       | double precision |           |          | 
 blk_write_time      | double precision |           |          | 

postgres=#

pg_stat_statements view columns explained (Source: https://www.postgresql.org/docs/12/pgstatstatements.html)

 Name                | Type             | References      | Description
---------------------+------------------+-----------------+-------------
 userid              | oid              | pg_authid.oid   | OID of user who executed the statement
 dbid                | oid              | pg_database.oid | OID of database in which the statement was executed
 queryid             | bigint           |                 | Internal hash code, computed from the statement’s parse tree
 query               | text             |                 | Text of a representative statement
 calls               | bigint           |                 | Number of times executed
 total_time          | double precision |                 | Total time spent in the statement, in milliseconds
 min_time            | double precision |                 | Minimum time spent in the statement, in milliseconds
 max_time            | double precision |                 | Maximum time spent in the statement, in milliseconds
 mean_time           | double precision |                 | Mean time spent in the statement, in milliseconds
 stddev_time         | double precision |                 | Population standard deviation of time spent in the statement, in milliseconds
 rows                | bigint           |                 | Total number of rows retrieved or affected by the statement
 shared_blks_hit     | bigint           |                 | Total number of shared block cache hits by the statement
 shared_blks_read    | bigint           |                 | Total number of shared blocks read by the statement
 shared_blks_dirtied | bigint           |                 | Total number of shared blocks dirtied by the statement
 shared_blks_written | bigint           |                 | Total number of shared blocks written by the statement
 local_blks_hit      | bigint           |                 | Total number of local block cache hits by the statement
 local_blks_read     | bigint           |                 | Total number of local blocks read by the statement
 local_blks_dirtied  | bigint           |                 | Total number of local blocks dirtied by the statement
 local_blks_written  | bigint           |                 | Total number of local blocks written by the statement
 temp_blks_read      | bigint           |                 | Total number of temp blocks read by the statement
 temp_blks_written   | bigint           |                 | Total number of temp blocks written by the statement
 blk_read_time       | double precision |                 | Total time the statement spent reading blocks, in milliseconds (if track_io_timing is enabled, otherwise zero)
 blk_write_time      | double precision |                 | Total time the statement spent writing blocks, in milliseconds (if track_io_timing is enabled, otherwise zero)

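You can list queries by latency / response time in PostgreSQL by querying pg_stat_statements (the column names below are those of PostgreSQL 12; from PostgreSQL 13 onwards the timing columns are named total_exec_time, min_exec_time, max_exec_time, mean_exec_time and stddev_exec_time):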

postgres=# \x
Expanded display is on.

select query,calls,total_time,min_time,max_time,mean_time,stddev_time,rows from pg_stat_statements order by mean_time desc;

-[ RECORD 1 ]------------------------------------------------------------
query       | SELECT count(*) FROM minervdb_bench GROUP BY id % $1
calls       | 6
total_time  | 33010.533078
min_time    | 4197.876021
max_time    | 6485.33594
mean_time   | 5501.755512999999
stddev_time | 826.3716429081501
rows        | 72
-[ RECORD 2 ]------------------------------------------------------------
query       | CREATE INDEX idx_id ON minervdb_bench(id)
calls       | 1
total_time  | 4560.808456
min_time    | 4560.808456
max_time    | 4560.808456
mean_time   | 4560.808456
stddev_time | 0
rows        | 0
-[ RECORD 3 ]------------------------------------------------------------
query       | ANALYZE
calls       | 1
total_time  | 441.725223
min_time    | 441.725223
max_time    | 441.725223
mean_time   | 441.725223
stddev_time | 0
rows        | 0
-[ RECORD 4 ]------------------------------------------------------------
query       | SELECT a.attname, +
            |   pg_catalog.format_type(a.atttypid, a.atttypmod), +
            |   (SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid, $1) for $2) +
            |    FROM pg_catalog.pg_attrdef d +
            |    WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef), +
            |   a.attnotnull, +
            |   (SELECT c.collname FROM pg_catalog.pg_collation c, pg_catalog.pg_type t +
            |    WHERE c.oid = a.attcollation AND t.oid = a.atttypid AND a.attcollation <> t.typcollation) AS attcollation, +
            |   a.attidentity, +
            |   a.attgenerated +
            | FROM pg_catalog.pg_attribute a +
            | WHERE a.attrelid = $3 AND a.attnum > $4 AND NOT a.attisdropped +
            | ORDER BY a.attnum
calls       | 4
total_time  | 1.053107
min_time    | 0.081565
max_time    | 0.721785
mean_time   | 0.26327675000000006
stddev_time | 0.2658756938743884
rows        | 86
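
The ranking above can also be reproduced outside the database, for example when you export the view for offline analysis. The Python sketch below is only an illustration: the StatementStats class and slowest_first function are hypothetical names, the three rows are copied from the sample output, and it assumes that mean_time corresponds to total_time divided by calls.

```python
from dataclasses import dataclass


@dataclass
class StatementStats:
    """Hypothetical container for a few pg_stat_statements columns."""
    query: str
    calls: int
    total_time: float  # milliseconds, as reported by the view

    @property
    def mean_time(self) -> float:
        # Assumption: mean_time in the view corresponds to total_time / calls.
        return self.total_time / self.calls


# Values copied from the sample output above.
SAMPLE = [
    StatementStats("ANALYZE", 1, 441.725223),
    StatementStats("SELECT count(*) FROM minervdb_bench GROUP BY id % $1", 6, 33010.533078),
    StatementStats("CREATE INDEX idx_id ON minervdb_bench(id)", 1, 4560.808456),
]


def slowest_first(rows):
    """Order statements the way ORDER BY mean_time DESC does."""
    return sorted(rows, key=lambda row: row.mean_time, reverse=True)


if __name__ == "__main__":
    for row in slowest_first(SAMPLE):
        print(f"{row.mean_time:12.3f} ms  calls={row.calls}  {row.query}")
```

Sorting by total_time instead of mean_time would highlight statements that are individually fast but called very often, which is a different and equally useful view of the workload.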

If you already know that the epicenter of the bottleneck is a particular query or event / time window, you can reset the statistics just before that query / event, so that only the problematic components show up in the PostgreSQL performance data. You can do that by calling pg_stat_statements_reset() as shown below:

postgres=# SELECT pg_stat_statements_reset();

Conclusion

Performance tuning is the process of optimizing PostgreSQL performance by streamlining the execution of multiple SQL statements. In other words, performance tuning simplifies the process of accessing and altering the information contained in the database, with the intention of improving query response times and database application operations.

 

The post Troubleshooting PostgreSQL Performance from Slow Queries appeared first on The WebScale Database Infrastructure Operations Experts.

MinervaDB Webinar: PostgreSQL Internals and Performance Optimization
https://minervadb.com/index.php/2020/07/10/minervadb-webinar-postgresql-internals-and-performance-optimization/
Fri, 10 Jul 2020 01:12:04 +0000

MinervaDB Webinar: PostgreSQL Internals and Performance Optimization

Our founder and Principal, Shiv Iyer, presented a webinar (July 09, 2020) on PostgreSQL Internals and Performance Optimization. Shiv is a longtime Open Source Database Systems Operations expert with core expertise in performance optimization, capacity planning / sizing, architecture / internals, transaction processing engineering, horizontal scalability & partitioning, storage optimization, distributed database systems and data compression algorithms. The core objective of this webinar was to talk about PostgreSQL internals, troubleshooting PostgreSQL query performance, index optimization, partitioning, PostgreSQL configuration parameters and best practices. We strongly believe that understanding PostgreSQL architecture and internals is very important for troubleshooting PostgreSQL performance proactively and efficiently. You can download the PDF copy of the webinar here. If you want the recorded video of the webinar, please contact support@minervadb.com.

Contact MinervaDB for Enterprise-Class PostgreSQL Consulting and 24*7 Consultative Support

The post MinervaDB Webinar: PostgreSQL Internals and Performance Optimization appeared first on The WebScale Database Infrastructure Operations Experts.

PostgreSQL Internals and Performance Troubleshooting Webinar
https://minervadb.com/index.php/2020/07/02/postgresql-internals-and-performance-troubleshooting-webinar/
Thu, 02 Jul 2020 07:41:28 +0000

PostgreSQL Internals and Performance Troubleshooting Webinar from Shiv Iyer

MinervaDB provides full-stack PostgreSQL consulting, support and managed Remote DBA Services for several customers globally, addressing performance, scalability, high availability and database reliability engineering. Our prospective customers and fellow DBAs are often curious about how we at MinervaDB troubleshoot PostgreSQL performance, so we decided to share our approach through a webinar on “PostgreSQL Internals and Performance Troubleshooting”. This webinar is hosted by Shiv Iyer (Founder and Principal of MinervaDB), a longtime Open Source Database Systems Operations expert with core expertise in performance optimization, capacity planning / sizing, architecture / internals, transaction processing engineering, horizontal scalability & partitioning, storage optimization, distributed database systems and data compression algorithms. You can register for the webinar here

The post PostgreSQL Internals and Performance Troubleshooting Webinar appeared first on The WebScale Database Infrastructure Operations Experts.

When PostgreSQL vacuum won’t remove dead rows from a Table
https://minervadb.com/index.php/2020/04/29/when-postgresql-vacuum-wont-remove-dead-rows-from-a-table/
Wed, 29 Apr 2020 20:43:07 +0000

When PostgreSQL vacuum won’t remove dead rows from a Table ?

What is VACUUM in PostgreSQL ?

In PostgreSQL, whenever rows in a table are deleted, the existing row or tuple is only marked as dead (it is not physically removed). During an update, PostgreSQL marks the corresponding existing tuple as dead and inserts a new tuple, so in PostgreSQL an UPDATE operation = DELETE + INSERT. These dead tuples consume storage unnecessarily and eventually you have a bloated PostgreSQL database, which is a serious issue for a PostgreSQL DBA to solve. VACUUM reclaims the storage occupied by dead tuples. Please note that plain VACUUM generally does not give the reclaimed storage space back to the operating system (the exception is empty pages at the end of a table, which it can truncate); the space is instead defragmented within the same database page and kept for reuse by future data inserts into the same table. Does the pain stop here ? No, it doesn’t. Bloat seriously affects PostgreSQL query performance. In PostgreSQL, tables and indexes are stored as an array of fixed-size pages (usually 8KB in size). Whenever a query requests rows, the PostgreSQL instance loads these pages into memory, and dead rows cause expensive additional disk I/O during data loading.

How to monitor whether autovacuum has processed bloated tables ?

If you suspect bloated tables in your PostgreSQL infrastructure, the first thing to check is whether vacuum has processed those tables. We use the following script to collect data on the last processed vacuum:

SELECT schemaname, relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup
    / (n_live_tup
       * current_setting('autovacuum_vacuum_scale_factor')::float8
          + current_setting('autovacuum_vacuum_threshold')::float8)
     DESC
LIMIT 10;

But sometimes you can see that vacuum ran recently and still didn’t free the dead tuples:

 schemaname |   relname    | n_live_tup | n_dead_tup |   last_autovacuum
------------+--------------+------------+------------+---------------------
 ad_ops_dB  | ad_clicks    |      96100 |      96100 | 2020-04-18 16:33:47
 pg_catalog | pg_attribute |         11 |        259 |
 pg_catalog | pg_amop      |        193 |         81 |
 pg_catalog | pg_class     |         61 |         29 |
 pg_catalog | pg_type      |         39 |         14 |
 pg_catalog | pg_index     |          8 |         21 |
 pg_catalog | pg_depend    |       7349 |        962 |
 pg_catalog | pg_trigger   |          6 |         37 |
 pg_catalog | pg_proc      |        469 |         87 |
 pg_catalog | pg_shdepend  |         18 |         11 |
(10 rows)
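
The ORDER BY expression in the script divides the dead tuples by the point at which autovacuum would be triggered, so a value above 1 means the table is already past that point. The Python sketch below recomputes that ratio for the first three rows of the output. It is only an illustration: vacuum_pressure and rank are hypothetical names, and the scale factor 0.2 and threshold 50 are the stock PostgreSQL defaults, assumed here because the post does not show the server’s actual settings.

```python
# Assumed defaults; on a real server read them with SHOW or current_setting().
SCALE_FACTOR = 0.2  # autovacuum_vacuum_scale_factor
THRESHOLD = 50      # autovacuum_vacuum_threshold


def vacuum_pressure(n_live_tup, n_dead_tup,
                    scale_factor=SCALE_FACTOR, threshold=THRESHOLD):
    """Dead tuples divided by the autovacuum trigger point (the ORDER BY expression)."""
    return n_dead_tup / (n_live_tup * scale_factor + threshold)


# (relname, n_live_tup, n_dead_tup) taken from the first rows of the output above.
ROWS = [
    ("pg_amop", 193, 81),
    ("ad_clicks", 96100, 96100),
    ("pg_attribute", 11, 259),
]


def rank(rows):
    """Most vacuum-starved tables first."""
    return sorted(rows, key=lambda r: vacuum_pressure(r[1], r[2]), reverse=True)


if __name__ == "__main__":
    for relname, live, dead in rank(ROWS):
        print(f"{relname:14s} {vacuum_pressure(live, dead):6.2f}")
```

Note that the script uses n_live_tup as its estimate of the table size, so the ratio is an approximation of what autovacuum itself evaluates.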

So it is evident that PostgreSQL vacuum sometimes does not remove dead rows.

When PostgreSQL Vacuum won’t remove the dead rows ?

Transactions in PostgreSQL are identified by an xid (transaction or “xact” ID). PostgreSQL assigns an xid to a transaction only when it starts modifying data, because only from that point on do other processes need to track its changes. Read-only transactions are therefore not assigned an xid.

We have copied below the PostgreSQL data structure (from proc.h) that handles transactions:

typedef struct PGXACT
{
    TransactionId xid;   /* id of top-level transaction currently being
                          * executed by this proc, if running and XID
                          * is assigned; else InvalidTransactionId */

    TransactionId xmin;  /* minimal running XID as it was when we were
                          * starting our xact, excluding LAZY VACUUM:
                          * vacuum must not remove tuples deleted by
                          * xid >= xmin ! */

    ...
} PGXACT;

PostgreSQL vacuum removes only the dead rows that are not in use anymore. A tuple is considered no longer needed when the transaction ID of the deleting transaction is older than the oldest transaction that is still active in the PostgreSQL database. The vacuum processes calculate the boundary of the data they need to retain (the xmin horizon) by taking the minimum of the xmin values of all active transactions.
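
The rule can be sketched in a few lines of Python. This is a simplified illustration and not PostgreSQL’s implementation: the function names are hypothetical, XIDs are treated as plain integers (real XIDs wrap around and are compared modulo 2^32), and commit status is ignored. Besides backends, the sketch also accepts the xmin values of replication slots and prepared transactions, which are covered in the sections that follow.

```python
from typing import Iterable, Optional


def xmin_horizon(backend_xmins: Iterable[Optional[int]],
                 slot_xmins: Iterable[Optional[int]] = (),
                 prepared_xids: Iterable[int] = (),
                 next_xid: int = 0) -> int:
    """Oldest XID that something may still need; None (SQL NULL) entries are ignored."""
    candidates = [x for x in (*backend_xmins, *slot_xmins, *prepared_xids)
                  if x is not None]
    # With nothing holding the horizon back, everything before next_xid is removable.
    return min(candidates, default=next_xid)


def is_removable(deleting_xid: int, horizon: int) -> bool:
    """Vacuum must not remove tuples deleted by xid >= xmin (see the struct comment)."""
    return deleting_xid < horizon


if __name__ == "__main__":
    horizon = xmin_horizon([1200, None, 1500], slot_xmins=[900], next_xid=2000)
    print("horizon:", horizon)
    print("tuple deleted by xid 850 removable:", is_removable(850, horizon))
    print("tuple deleted by xid 1300 removable:", is_removable(1300, horizon))
```

A single old entry, such as the slot at 900 in the usage example, is enough to pin the horizon for the whole instance, which is why the three situations below matter.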

The following three situations can hold back the xmin horizon in a PostgreSQL infrastructure:

1. Long running transactions

You can find the details of long-running queries and their respective xmin values with the query copied below:

SELECT pid, datname, usename, state, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

P.S. If you think those transactions are no longer required, use pg_terminate_backend() to terminate the PostgreSQL sessions that are blocking the vacuum processes.

2. Abandoned replication slots

In PostgreSQL, a replication slot is a data structure that prevents PostgreSQL from deleting data that is still required by a standby server to catch up with the primary database instance. If replication to a standby server is delayed, or the standby PostgreSQL instance is down for a long time, the replication slot will prevent vacuum from deleting old records / rows. To monitor replication slots and their respective xmin values, use the query below:

SELECT slot_name, slot_type, database, xmin
FROM pg_replication_slots
ORDER BY age(xmin) DESC;

P.S. – To drop replication slots that are no longer needed, please use the function pg_drop_replication_slot().

3. Orphaned prepared transactions

In a two-phase commit PostgreSQL database infrastructure, a distributed transaction is first prepared using the PREPARE TRANSACTION statement and later committed with the COMMIT PREPARED statement. A prepared transaction that is never committed or rolled back (an orphan) keeps holding back the xmin horizon. To monitor all prepared transactions and their respective xmin values, run the query below:

SELECT gid, prepared, owner, database, transaction AS xmin
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;

P.S. – We recommend the ROLLBACK PREPARED SQL statement to remove orphaned prepared transactions.

Conclusion

PostgreSQL autovacuum addresses table bloat efficiently, but there are situations where vacuum doesn’t work as expected, so we strongly recommend regularly checking how vacuum is processing the bloated tables.

The post When PostgreSQL vacuum won’t remove dead rows from a Table  appeared first on The WebScale Database Infrastructure Operations Experts.

How PostgreSQL Autovacuum works ?
https://minervadb.com/index.php/2020/04/27/how-postgresql-autovacuum-works/
Mon, 27 Apr 2020 12:21:47 +0000

Understanding PostgreSQL Autovacuum for Performance and Reliability – Troubleshooting PostgreSQL Performance

What is VACUUM in PostgreSQL ?

In PostgreSQL, whenever rows in a table are deleted, the existing row or tuple is only marked as dead (it is not physically removed). During an update, PostgreSQL marks the corresponding existing tuple as dead and inserts a new tuple, so in PostgreSQL an UPDATE operation = DELETE + INSERT. These dead tuples consume storage unnecessarily and eventually you have a bloated PostgreSQL database, which is a serious issue for a PostgreSQL DBA to solve. VACUUM reclaims the storage occupied by dead tuples. Please note that plain VACUUM generally does not give the reclaimed storage space back to the operating system (the exception is empty pages at the end of a table, which it can truncate); the space is instead defragmented within the same database page and kept for reuse by future data inserts into the same table. Does the pain stop here ? No, it doesn’t. Bloat seriously affects PostgreSQL query performance. In PostgreSQL, tables and indexes are stored as an array of fixed-size pages (usually 8KB in size). Whenever a query requests rows, the PostgreSQL instance loads these pages into memory, and dead rows cause expensive additional disk I/O during data loading.

Blame the bloat on PostgreSQL Multi-Version Concurrency Control (MVCC). MVCC in PostgreSQL is committed to keeping each transaction isolated and durable (ACID compliance in transaction management), so that readers never block writers and vice versa. Every transaction (a single insert, update or delete, as well as a group of statements explicitly wrapped together via BEGIN – COMMIT) in PostgreSQL is identified by a transaction ID called XID. When a transaction starts modifying data, PostgreSQL takes the next XID and assigns it to that transaction. PostgreSQL also stores transaction information on every row in the system, which is used to determine whether a row is visible to a given transaction or not. Because different transactions have visibility of different sets of rows, PostgreSQL needs to keep potentially obsolete records. This is why an UPDATE actually creates a new row and why a DELETE doesn’t really remove the row: it merely marks it as deleted and sets the XID values appropriately. As transactions complete, there will be rows in the database that cannot possibly be visible to any future transaction. These are called dead rows (technically, bloated records in PostgreSQL).
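
A toy model may help to see how dead rows accumulate. The Python sketch below is a deliberate simplification, not how PostgreSQL stores tuples: TupleVersion and ToyTable are hypothetical names, every transaction is assumed to commit, and snapshots and visibility checks are left out. It only shows that UPDATE and DELETE leave the old row versions in place with the deleting XID recorded on them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TupleVersion:
    value: str
    xmin: int                   # XID of the transaction that created this version
    xmax: Optional[int] = None  # XID that deleted or superseded it, if any


class ToyTable:
    """Append-only list of row versions; nothing is ever removed here."""

    def __init__(self) -> None:
        self.versions: List[TupleVersion] = []

    def insert(self, xid: int, value: str) -> None:
        self.versions.append(TupleVersion(value, xmin=xid))

    def delete(self, xid: int, value: str) -> None:
        for version in self.versions:
            if version.value == value and version.xmax is None:
                version.xmax = xid  # marked dead, not physically removed

    def update(self, xid: int, old: str, new: str) -> None:
        # UPDATE = DELETE of the old version + INSERT of a new one.
        self.delete(xid, old)
        self.insert(xid, new)

    def dead_versions(self) -> List[TupleVersion]:
        return [v for v in self.versions if v.xmax is not None]


if __name__ == "__main__":
    table = ToyTable()
    table.insert(100, "a")
    table.update(101, "a", "b")
    table.delete(102, "b")
    print(len(table.versions), "versions stored,", len(table.dead_versions()), "dead")
```

In this model only a separate cleanup pass could shrink the list again, which is the job VACUUM performs in PostgreSQL.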

How PostgreSQL database handles bloating for optimal performance and storage efficiency ?

In the past, PostgreSQL DBAs (pre PostgreSQL 8) used to reclaim the storage from dead tuples by running the VACUUM command manually. This was a daunting task, because DBAs needed to balance the resource utilization of vacuuming against the current transaction volume / load to plan when to run it, and also when to abort it. The PostgreSQL “autovacuum” feature has made DBA life much easier when it comes to managing database bloat and vacuum.

How Autovacuum works in PostgreSQL ? 

Autovacuum is one of the background utility processes. It vacuums a table automatically when the actual number of dead tuples in that table, caused by updates and deletes, exceeds an effective threshold. The frequency of this process is controlled by the PostgreSQL configuration parameter autovacuum_naptime (default is 1 minute): on each cycle the autovacuum launcher attempts to start a new worker process, and the number of workers that can run at the same time is limited by the configuration parameter autovacuum_max_workers (default 3). The worker searches for tables where PostgreSQL’s statistics records indicate that a large enough number of rows have changed relative to the table size. The formula is:

## The formula applied by the autovacuum process to identify tables that are bloated and need vacuuming:

[estimated rows invalidated] ≥ autovacuum_vacuum_scale_factor * [estimated table size] + autovacuum_vacuum_threshold
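
To make the formula concrete, here is a minimal Python sketch that evaluates it for a small and a large table. It assumes that "estimated table size" means the estimated number of rows in the table, and the function names are hypothetical; the defaults (scale factor 0.2, threshold 50) are the ones quoted in this article.

```python
def vacuum_trigger_point(table_rows, scale_factor=0.2, base_threshold=50):
    """Number of invalidated rows at which the formula marks a table for VACUUM.

    Assumption: "estimated table size" is read as the estimated row count.
    """
    return scale_factor * table_rows + base_threshold


def needs_vacuum(dead_rows, table_rows, scale_factor=0.2, base_threshold=50):
    """Apply the inequality from the formula above."""
    return dead_rows >= vacuum_trigger_point(table_rows, scale_factor, base_threshold)


# With the defaults, a 10,000-row table qualifies at roughly 2,050 dead rows ...
small_table = vacuum_trigger_point(10_000)

# ... but a 100-million-row table has to accumulate about 20 million dead rows.
large_table = vacuum_trigger_point(100_000_000)

# Lowering the scale factor to 0.01 brings that down to about 1 million.
large_table_tuned = vacuum_trigger_point(100_000_000, scale_factor=0.01)

print(small_table, large_table, large_table_tuned)
```

The comparison shows why the default scale factor suits small tables but lets large tables collect a great deal of bloat before autovacuum reacts.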

This is what happens inside PostgreSQL during an autovacuum run:

The worker processes remove dead tuples and compact pages, and all of this consumes significant disk I/O throughput. Autovacuum therefore keeps a running total of the estimated I/O cost of its work; when the total reaches autovacuum_vacuum_cost_limit, the worker pauses for the interval set by the configuration parameter autovacuum_vacuum_cost_delay (default 2 milliseconds since PostgreSQL 12, 20 milliseconds in earlier releases) before continuing. Vacuuming is resource-hungry and time-consuming because each worker has to remove the index entries that point to dead rows before it can compact the pages. The list of dead rows being processed is held in memory bounded by maintenance_work_mem. If PostgreSQL is deployed on infrastructure with limited RAM, this parameter will be set conservatively, so a worker can process only a limited number of dead rows per pass and vacuum falls behind.
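
The cost-based pausing described above can be sketched as a small simulation. This is a simplified model, not PostgreSQL's implementation: it assumes a fixed pause per limit hit, a cost limit of 200 (the default of vacuum_cost_limit, which applies when autovacuum_vacuum_cost_limit is -1), and a cost of 20 per dirtied page (the default of vacuum_cost_page_dirty). The function name is hypothetical.

```python
def simulate_vacuum_throttle(page_costs, cost_limit=200, cost_delay_ms=2.0):
    """Return (pauses, total_pause_ms) for a sequence of per-page I/O costs.

    Simplified model: accumulate cost, pause once the balance reaches the
    limit, then start counting again from zero.
    """
    balance = 0
    pauses = 0
    for cost in page_costs:
        balance += cost
        if balance >= cost_limit:
            pauses += 1
            balance = 0
    return pauses, pauses * cost_delay_ms


# Assumption: the worker dirties 10,000 pages at a cost of 20 each.
dirty_pages = [20] * 10_000

pauses_2ms, sleep_2ms = simulate_vacuum_throttle(dirty_pages, cost_delay_ms=2.0)
pauses_20ms, sleep_20ms = simulate_vacuum_throttle(dirty_pages, cost_delay_ms=20.0)

print(pauses_2ms, sleep_2ms)    # pauses and total pause time with a 2 ms delay
print(pauses_20ms, sleep_20ms)  # same workload with a 20 ms delay
```

In this model the worker pauses once every ten dirtied pages, so the delay setting directly scales how long the same amount of cleanup takes.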

How to configure autovacuum parameters?

The default autovacuum settings work well for PostgreSQL databases of a few GB, but they are not recommended for larger PostgreSQL infrastructure: as data and transaction volume grow, vacuum falls behind. Once vacuum has fallen behind, it directly affects query execution plans and performance, and DBAs are often tempted to run autovacuum less frequently or not at all, which only makes the bloat worse. The following matrix recommends settings for larger PostgreSQL database instances:

| PostgreSQL autovacuum configuration parameter | How to tune it for performance and reliability |
|---|---|
| autovacuum (boolean) | Decides whether the PostgreSQL server runs the autovacuum launcher daemon process. Technically you can never fully disable autovacuum: even when this parameter is off, the system launches autovacuum processes when necessary to prevent transaction ID wraparound. P.S. - You must enable track_counts for autovacuum to work. |
| log_autovacuum_min_duration (integer) | Enable this parameter to track autovacuum activity. Setting it to 0 logs every autovacuum action; a positive value logs only actions that run at least that many milliseconds; -1 disables logging. |
| autovacuum_max_workers (integer) | Specifies the maximum number of autovacuum processes (other than the autovacuum launcher) that may be running at any one time. The default is 3; we recommend 6 to 8 for PostgreSQL performance. |
| autovacuum_naptime (integer) | Specifies the minimum delay between autovacuum runs on any given database. The delay is measured in seconds, and the default is one minute (1min). We recommend leaving this parameter untouched, even when you have very large PostgreSQL tables with DELETEs and UPDATEs. |
| autovacuum_vacuum_threshold (integer) | Specifies the minimum number of updated or deleted tuples needed to trigger a VACUUM in any one table. The default is 50 tuples. This value can be set larger when the tables are small. |
| autovacuum_analyze_threshold (integer) | Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples. This value can be set larger when the tables are small. |
| autovacuum_vacuum_scale_factor (floating point) | Specifies a fraction of the table size to add to autovacuum_vacuum_threshold when deciding whether to trigger a VACUUM. The default is 0.2 (20% of table size). For larger PostgreSQL tables we recommend smaller values (0.01). |
| autovacuum_analyze_scale_factor (floating point) | Specifies a fraction of the table size to add to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.1 (10% of table size). For larger PostgreSQL tables we recommend smaller values (0.01). |
| autovacuum_freeze_max_age (integer) | Specifies the maximum age (in transactions) that a table's pg_class.relfrozenxid field can attain before a VACUUM operation is forced to prevent transaction ID wraparound within the table. P.S. - PostgreSQL will launch autovacuum processes to prevent wraparound even when autovacuum is otherwise disabled. |
| autovacuum_multixact_freeze_max_age (integer) | Specifies the maximum age (in multixacts) that a table's pg_class.relminmxid field can attain before a VACUUM operation is forced to prevent multixact ID wraparound within the table. P.S. - PostgreSQL will launch autovacuum processes to prevent wraparound even when autovacuum is otherwise disabled. |
| autovacuum_vacuum_cost_delay (integer) | Specifies the cost delay value used in automatic VACUUM operations. If -1 is specified, the regular vacuum_cost_delay value is used. The default is 2 milliseconds since PostgreSQL 12 and 20 milliseconds in earlier releases. P.S. - The default value works even for very large PostgreSQL database infrastructure. |
| autovacuum_vacuum_cost_limit (integer) | Specifies the cost limit value used in automatic VACUUM operations. If -1 is specified (which is the default), the regular vacuum_cost_limit value is used. P.S. - The default value works even for very large PostgreSQL database infrastructure. |


How PostgreSQL 11 made adding new table columns with default values faster? https://minervadb.com/index.php/2020/02/22/how-postgresql-11-made-adding-new-table-columns-with-default-values-faster/ Sat, 22 Feb 2020 07:23:38 +0000

Adding new table columns with default values faster in PostgreSQL 11

Before PostgreSQL 11, adding a new table column with a non-null default value resulted in a rewrite of the entire table. This works fine for smaller data sets, but it becomes complicated and expensive on high-volume databases because the rewrite holds an ACCESS EXCLUSIVE lock on the table (the same mode LOCK TABLE uses when no mode is specified explicitly). ACCESS EXCLUSIVE conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE) and guarantees that the holder of the lock is the only transaction accessing the table in any way. Every other operation is blocked until the lock is released; even simple SELECT statements have to wait. This is unacceptable for a table that is accessed 24*7 and is a serious performance bottleneck.

PostgreSQL 11 addresses this problem by storing the default value in the system catalog. Rows that existed at the time the change was made are not touched; when such a row is fetched, the value stored in the catalog is supplied for the new column. New rows, and new versions of existing rows, are written with the default value in place. ALTER TABLE still takes the lock, but only for the brief catalog update rather than for a full table rewrite, which makes adding new table columns with default values much faster.

To conclude, the default value does not have to be a static expression. It can be any non-volatile expression, such as CURRENT_TIMESTAMP, but volatile expressions such as random(), currval() and timeofday() still result in a table rewrite.
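
The mechanism can be illustrated with a toy model in Python. This is only a sketch of the idea, not how PostgreSQL stores data: the class `ToyTable` and all of its attributes are hypothetical, rows are plain dictionaries, and "the catalog" is a dictionary of default values.

```python
import random


class ToyTable:
    """Toy model of a table whose constant column defaults live in a catalog."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.catalog_defaults = {}   # column -> default supplied for older rows
        self.volatile_defaults = {}  # column -> function evaluated per row
        self.rows = []               # stored rows hold only what was written
        self.rewrites = 0            # how many times every row was rewritten

    def add_column(self, name, default, volatile=False):
        self.columns.append(name)
        if volatile:
            # A volatile default must be evaluated once per row,
            # so every stored row is rewritten.
            for row in self.rows:
                row[name] = default()
            self.rewrites += 1
            self.volatile_defaults[name] = default
        else:
            # Constant default: record it in the catalog, touch no rows.
            self.catalog_defaults[name] = default

    def insert(self, **values):
        row = {}
        for col in self.columns:
            if col in values:
                row[col] = values[col]
            elif col in self.volatile_defaults:
                row[col] = self.volatile_defaults[col]()
            else:
                row[col] = self.catalog_defaults.get(col)
        self.rows.append(row)

    def select(self):
        # Columns missing from an older stored row come from the catalog.
        return [
            {col: row[col] if col in row else self.catalog_defaults.get(col)
             for col in self.columns}
            for row in self.rows
        ]


orders = ToyTable(["id"])
orders.insert(id=1)
orders.insert(id=2)

orders.add_column("status", "active")    # constant default: no rewrite
orders.insert(id=3)                      # new row stores the default itself

stored_old_row = dict(orders.rows[0])    # still only {"id": 1} on "disk"
visible_rows = orders.select()           # but readers see status="active"
rewrites_after_constant = orders.rewrites

orders.add_column("token", random.random, volatile=True)  # forces a rewrite
```

After the constant default is added, the stored form of the old rows is unchanged while queries already see the new column; only the volatile default forces every row to be rewritten.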

Top Three Most Compelling New Features From PostgreSQL 12 https://minervadb.com/index.php/2020/02/19/postgresql-12-new-features/ Wed, 19 Feb 2020 20:12:59 +0000

Why upgrade to PostgreSQL 12?

PostgreSQL 12 provides significant performance and maintenance enhancements to its indexing system and to partitioning. PostgreSQL 12 introduces the ability to run queries over JSON documents using JSON path expressions defined in the SQL/JSON standard. Such queries may utilize the existing indexing mechanisms for documents stored in the JSONB format to efficiently retrieve data. PostgreSQL 12 extends its support of ICU collations by allowing users to define "nondeterministic collations" that can, for example, allow case-insensitive or accent-insensitive comparisons. PostgreSQL 12 enhancements include notable improvements to query performance, particularly over larger data sets, and overall space utilization. This release provides application developers with new capabilities such as SQL/JSON path expression support, optimizations for how common table expression (WITH) queries are executed, and generated columns.

The following are the top three most interesting features introduced in PostgreSQL 12:

1. Much better indexing for performance and optimal space management in PostgreSQL 12 – Why do we worry so much about indexing in database systems? A large data set does not fit in main memory. As the number of keys grows, more reads go to disk, and disk access time is far higher than main-memory access time. We use B-tree indexes to reduce the number of disk accesses. A B-tree is a data structure that keeps the keys in each node in ascending order; each node holds an array of entries, and each entry carries a reference to a child node. Indexes bloat as well: dead tuples leave behind index entries that take up extra storage on disk, and we spend a significant amount of time reclaiming that storage. PostgreSQL 12 brings much better B-tree indexing, which can reduce space utilization by up to 40% and improves overall query performance, so both WRITEs and READs get faster. PostgreSQL 12 also introduces the ability to rebuild indexes without blocking writes to an index via the REINDEX CONCURRENTLY command, allowing users to avoid downtime scenarios for lengthy index rebuilds.

2. ALTER TABLE ATTACH PARTITION without blocking queries – In PostgreSQL, every lock has a queue. If transaction T2 tries to acquire a lock that is already held by transaction T1 at a conflicting lock level, then T2 waits in the lock queue. Now something interesting happens: if another transaction T3 comes in, it has to check for conflicts not only with T1 but also with T2 and any other transaction in the lock queue. So even if your DDL command can run very quickly, it might sit in a queue for a long time waiting for queries to finish, and queries that start after it are blocked behind it. PostgreSQL 12 lets ALTER TABLE ... ATTACH PARTITION run with a weaker lock on the partitioned table, so attaching a partition no longer blocks concurrent queries. PostgreSQL supports partitioning, which means logically splitting one large table into several pieces. Partitioning improves query performance: it substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily used parts of the indexes fit in memory. Up to PostgreSQL 11, an INSERT into a partitioned table locked every partition of that table, regardless of whether the partition received a new record. With a large number of partitions and a high volume of data operations, this could become a serious bottleneck. Starting with PostgreSQL 12, inserting a row locks only the partition that receives it. This results in much better performance at higher partition counts, especially when inserting just one row at a time.

3. JSON Path support in PostgreSQL 12 – The JSON data type was introduced in PostgreSQL 9.2, and PostgreSQL's commitment to JSON data management has increased significantly since then. The major addition came in PostgreSQL 9.4 with the JSONB data type, an advanced version of the JSON data type that stores JSON data in a binary format. The SQL:2016 standard then introduced JSON and various ways to query JSON values. PostgreSQL 12 supports JSON Path, implemented as the jsonpath data type, which is the binary representation of a parsed SQL/JSON path expression. The main task of the path language is to specify the parts (the projection) of JSON data to be retrieved by the path engine for the SQL/JSON query functions. With it, PostgreSQL 12 can run queries over JSON documents using JSON path expressions defined in the SQL/JSON standard, and such queries may utilize the existing indexing mechanisms for documents stored in the JSONB format to retrieve data efficiently.
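
The lock queue behaviour described in item 2 can be sketched as a small Python model. It is an illustration under simplifying assumptions, not PostgreSQL's lock manager: only two lock modes are modelled, and the class `LockQueue` and its methods are hypothetical names.

```python
# Conflict table for the two lock modes used here
# (a small subset of PostgreSQL's lock modes).
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ACCESS EXCLUSIVE": {"ACCESS SHARE", "ACCESS EXCLUSIVE"},
}


class LockQueue:
    """Toy lock on one table: current holders plus a first-come waiter queue."""

    def __init__(self):
        self.holders = []  # (transaction, mode) pairs holding the lock
        self.waiters = []  # (transaction, mode) pairs waiting, in arrival order

    def request(self, txn, mode):
        # A new request is checked against the holders AND everyone queued.
        ahead = self.holders + self.waiters
        if any(other_mode in CONFLICTS[mode] for _, other_mode in ahead):
            self.waiters.append((txn, mode))
            return "waiting"
        self.holders.append((txn, mode))
        return "granted"

    def release(self, txn):
        self.holders = [(t, m) for t, m in self.holders if t != txn]
        # Grant waiters in order and stop at the first one that still conflicts.
        while self.waiters:
            _, mode = self.waiters[0]
            if any(held in CONFLICTS[mode] for _, held in self.holders):
                break
            self.holders.append(self.waiters.pop(0))


table_lock = LockQueue()
t1 = table_lock.request("T1", "ACCESS SHARE")      # long-running SELECT
t2 = table_lock.request("T2", "ACCESS EXCLUSIVE")  # quick DDL, conflicts with T1
t3 = table_lock.request("T3", "ACCESS SHARE")      # SELECT, compatible with T1

print(t1, t2, t3)
```

T3 is compatible with the lock T1 holds, yet it still waits because T2 is queued ahead of it, which is exactly how one short DDL statement ends up blocking ordinary queries.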
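
To show what "specifying the parts of JSON data to be retrieved" means in item 3, here is a toy path evaluator in Python. It is not PostgreSQL's jsonpath engine: the function `toy_path_query` is hypothetical and understands only `$`, `.key` and `[*]`. In PostgreSQL 12 itself, a path of this shape would be passed to a function such as jsonb_path_query.

```python
import re


def toy_path_query(document, path):
    """Evaluate a tiny subset of path syntax: $ (root), .key and [*]."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    # Each step is either a member access (.key) or an array wildcard ([*]).
    steps = re.findall(r"\.(\w+)|(\[\*\])", path[1:])
    current = [document]
    for key, wildcard in steps:
        selected = []
        for item in current:
            if wildcard and isinstance(item, list):
                selected.extend(item)            # unwrap every array element
            elif key and isinstance(item, dict) and key in item:
                selected.append(item[key])       # follow the named member
        current = selected
    return current


order = {
    "order": {
        "customer": "ACME",
        "items": [
            {"sku": "A1", "qty": 2},
            {"sku": "B7", "qty": 1},
        ],
    }
}

skus = toy_path_query(order, "$.order.items[*].sku")
customer = toy_path_query(order, "$.order.customer")
missing = toy_path_query(order, "$.order.discount")

print(skus, customer, missing)
```

The path names the projection (every sku inside every item) and the evaluator walks the document to collect it; a member that does not exist simply yields no result in this sketch.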

