How PostgreSQL Autovacuum works ?

Understanding PostgreSQL Autovacuum for Performance and Reliability – Troubleshooting PostgreSQL Performance


What is VACUUM in PostgreSQL ?

In PostgreSQL, whenever rows in a table deleted, The existing row or tuple is marked as dead ( will not be physically removed) and during an update, it marks corresponding exiting tuple as dead and inserts a new tuple so in PostgreSQL UPDATE operations = DELETE + INSERT. These dead tuples consumes unnecessary storage and eventually you have a bloated PostgreSQL database, This is a serious issue to solve for a PostgreSQL DBA. VACUUM reclaims in the storage occupied by dead tuples, Please keep this noted that reclaimed storage space is never given back to the resident operating system rather they are just defragmented within the same database page and so storage for reuse to future data inserts in the same table. Does the pain stops here ? No, It doesn’t. Bloat seriously affect the PostgreSQL query performance, In PostgreSQL tables and indexes are stored as array of fixed-size pages ( usually 8KB in size). Whenever a query requests for rows, the PostgreSQL instance loads these pages into the memory and dead rows causes expensive disk I/O during data loading.

Blame it on PostgreSQL Multi-Version Concurrency Control (MVCC) for bloating.  Because, Multi-Version Concurrency Control (MVCC) in PostgreSQL is committed to maintain each transaction isolated and durable ( ACID compliance in transaction management), readers never block writers and vice versa. Every transaction ( such as an insert, update or delete, as well as explicitely wrapping a group of statements together via BEGIN – COMMIT.) in PostgreSQL is identified by a transaction ID called XID. When a transaction starts, Postgres increments an XID and assigns it to the current transaction. PostgresSQL also stores transaction information on every row in the system, which is used to determine whether a row is visible to the transaction or not. Because different transactions will have visibility to a different set of rows, PostgreSQL needs to maintain potentially obsolete records. This is why an UPDATE actually creates a new row and why DELETE doesn’t really remove the row: it merely marks it as deleted and sets the XID values appropriately. As transactions complete, there will be rows in the database that cannot possibly be visible to any future transactions. These are called dead rows ( technically bloated records in PostgreSQL).

How PostgreSQL database handles bloating for optimal performance and storage efficiency ?

In the past PostgreSQL DBAs ( pre PostgreSQL 8 ) used to reclaim the storage from dead tuples by running VACUUM command manually, This was most daunting task to do because DBAs need to to balance the resource utilization for vacuuming against the current transaction volume / load to plan when to do it, and also when to abort it.  PostgreSQL “autovacuum” feature simplified DBA life much better on managing database bloating and vacuum.

How Autovacuum works in PostgreSQL ? 

Autovacuum is one of the background utility processes that starts automatically when the actual number of dead tuples in a table exceeds an effective threshold, due to updates and deletes, The frequency of this process is controlled by PostgreSQL configuration parameter autovacuum_naptime (default is 1 minute) and autovacuum attempts to start a new worker process every time when vacuuming process begins, this completely depends on the value of configuration parameter autovacuum_max_workers (default 3). The worker searches for tables where PostgreSQL’s statistics records indicate a large enough number of rows have changed relative to the table size. The formula is:

## The formula which is applied by autovacuum process to identify tables which are bloated and need immediate attention for vacuuming: 

[estimated rows invalidated] ≥ autovacuum_vacuum_scale_factor * [estimated table size] + autovacuum_vacuum_threshold

This is what happens internally with-in PostgreSQL during autovacuum process:

The worker threads generated will start removing dead tuples and compacting pages aggressively but these entire activities consumes intense disk I/O throughput, Autovacuum records these I/O credits and when it exceeds autovacuum_vacuum_cost_limit then autovacuum pauses all workers for few milliseconds depending on the value of configuration parameter autovacuum_vacuum_cost_delay (default is 20 milliseconds). The vacuuming as mentioned above is an resource hogging and time consuming activity because every vacuum worker scan through individual dead rows to remove index entries pointed on those rows before compaction of pages, If you have deployed PostgreSQL on a limited memory / RAM infrastructure then maintenance_work_mem configuration parameter will be very conservative and this leaves worker thread to process only limited dead rows during each attempt making vacuum fall behind.

How to configure Autovacuum parameters ?

The default autovacuum works great for few GBs PostgreSQL Database and is definitely not recommended for larger PostgreSQL infrastructure as with increasing data / transactions volume the vacuum will fall behind. Once vacuum has fallen behind, It will directly impact query execution plan and performance, This will direct DBAs to either run autovacuum less frequently or not at all. The following matrix recommends optimal sizing of configuration parameters for larger PostgreSQL database instances:

PostgreSQL Autovacuum configuration parameterHow to tune PostgreSQL Autovacuum configuration parameter for Performance and Reliability
autovacuum (boolean)This configuration parameter decides whether your PostgreSQL server should run the autovacuum launcher daemon process. Technically you can never disable autovacuum because even when this parameter is disabled, the system will launch autovacuum processes if necessary to prevent transaction ID wraparound.

P.S.- You have to enable track_counts for autovacuum to work.
log_autovacuum_min_duration (integer)To track autovacuum activity you have to enable this parameter
autovacuum_max_workers (integer)The parameter specifies the maximum number of autovacuum processes (other than the autovacuum launcher) that may be running at any one time. The default is three and we recommend 6 to 8 for PostgreSQL performance
autovacuum_naptime (integer)This parameter specifies the minimum delay between autovacuum runs on any given database. The delay is measured in seconds, and the default is one minute (1min), We recommend to leave this parameter untouched even when you have very large PostgreSQL tables with DELETEs and UPDATEs.
autovacuum_vacuum_threshold (integer)This parameter specifies the minimum number of updated or deleted tuples needed to trigger a VACUUM in any one table. The default is 50 tuples. This parameter could be larger when you have smaller tables.
autovacuum_analyze_threshold (integer)Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples. This parameter could be larger when you have smaller tables.
autovacuum_vacuum_scale_factor (floating point)This parameter specifies a fraction of the table size to add to autovacuum_vacuum_threshold when deciding whether to trigger a VACUUM. The default is 0.2 (20% of table size). If you have larger PostgreSQL tables we recommend smaller values (0.01)
autovacuum_analyze_scale_factor (floating point)This parameter specifies a fraction of the table size to add to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.1 (10% of table size). If you have larger PostgreSQL tables we recommend smaller values (0.01).
autovacuum_freeze_max_age (integer)This parameter specifies the maximum age (in transactions) that a table's pg_class.relfrozenxid field can attain before a VACUUM operation is forced to prevent transaction ID wraparound within the table.

P.S. - PostgreSQL will launch autovacuum processes to prevent wraparound even when autovacuum is otherwise disabled.
autovacuum_multixact_freeze_max_age (integer)This parameter specifies the maximum age (in multixacts) that a table's pg_class.relminmxid field can attain before a VACUUM operation is forced to prevent multixact ID wraparound within the table.

P.S. - PostgreSQL will launch autovacuum processes to prevent wraparound even when autovacuum is otherwise disabled.
autovacuum_vacuum_cost_delay (integer)This parameter specifies the cost delay value that will be used in automatic VACUUM operations. If -1 is specified, the regular vacuum_cost_delay value will be used. The default value is 20 milliseconds.

P.S. - The default value works even when you have very large PostgreSQL database infrastructure.
autovacuum_vacuum_cost_limit (integer)Specifies the cost limit value that will be used in automatic VACUUM operations. If -1 is specified (which is the default), the regular vacuum_cost_limit value will be used. The default value is 20 milliseconds.

P.S. - The default value works even when you have very large PostgreSQL database infrastructure.

Interesting links for extra reading 

☛ MinervaDB contacts – Sales & General Inquiries

Business FunctionContact
☎ CONTACT GLOBAL SALES (24*7)📞 (844) 588-7287 (USA)
📞 (415) 212-6625 (USA)
📞 (778) 770-5251 (Canada)
☎ TOLL FREE PHONE (24*7)📞 (844) 588-7287
🚩 MINERVADB FAX+1 (209) 314-2364
📨 MinervaDB Email - General / Sales / Consultingcontact@minervadb.com
📨 MinervaDB Email - Support support@minervadb.com
📨 MinervaDB Email -Remote DBAremotedba@minervadb.com
📨 Shiv Iyer Email - Founder and Principal shiv@minervadb.com
🏠 CORPORATE ADDRESS: CALIFORNIAMinervaDB Inc.,
340 S LEMON AVE #9718
WALNUT 91789 CA, US
🏠 CORPORATE ADDRESS: DELAWAREMinervaDB Inc.,
PO Box 2093 PHILADELPHIA PIKE #3339
CLAYMONT, DE 19703
🏠 CORPORATE ADDRESS: HOUSTON MinervaDB Inc., 1321 Upland Dr. PMB 19322, Houston,
TX 77043, US

About MinervaDB Corporation 88 Articles
Independent and vendor neutral consulting, support, remote DBA services and training for MySQL, MariaDB, Percona Server, PostgreSQL and ClickHouse with core expertize in performance, scalability and high availability . We are an virtual corporation, all of us work from home on multiple timezones and stay connected via Email, Skype, Google Hangouts, Phone and IRC supporting over 250 customers worldwide

1 Trackbacks & Pingbacks

  1. /home1/minerho3/public_html/wp-includes/comment-template.php on line 677
    " class=""> Enterprise-class Consulting, 24*7 Support and Remote DBA Services for MySQL, MariaDB, PostgreSQL and ClickHouse

Comments are closed.

UA-155183614-1