Published on

Postgres Vacuuming Internals

Authors

When you perform UPDATE/DELETE operations, postgres doesn't discard old rows immediately. It keeps older version of data i.e. dead tuples. There are various reasons for keeping old data

  • Transaction Isolation
  • Multi version concurrency control
  • Cleaning right away leads to Increased resource usages
  • Rollback Support

Over the time, these dead tuples accumulates and uses disk space. After some time very old dead tuples are not useful at all so it requires cleanup. That's where vacuuming is triggered to clean up dead tuples.

Vacuuming is a process of cleaning unused rows or stale data in databases. Vacuuming is similar to garbage collection in Java. We will talk mostly in context of Postgres Vacuuming but fundamental idea remains same.

Vacuuming Triggers

PostgreSQL's VACUUM command has to process each table on a regular basis for several reasons:

  • To recover or reuse disk space occupied by updated or deleted rows.
  • To update data statistics used by the PostgreSQL query planner.
  • To update the visibility map, which speeds up index-only scans.
  • To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound. First is simple and easy to grasp as you want to claim disk storage back occuiped by deleted rows. Postgres Query Planner uses data statistics to find best plan to execute a given query. Now imagine if these statistics are taking into account deleted rows then it slows down query planner's job and may pick suboptimal query plan.

Vacuuming Types

There are two types of Vacuuming

  • Standard Autovacuum - It is run by autovacuum daemon periodically.
  • Vacuum Full - It is a manual operation but it vacuum everything.

Performance Considerations

Vacuuming requires locks in order to free up some space on relations. Depending upon type of vacuuming being performed, higher level locking is required.

Monitoring

Tuning

Interesting Notes

  • When Standard AutoVacuum is performed, space claimed by dead tuples gets free but it is not available to other processes. It will only free up space for other processes when we perform Full Vacuum.
  • If your database has Update/Delete heavy workloads, keep in mind VACUUMING.

Recommendations

  • It is not ideal to perform Automatic Full Vacuum Scan during business hours.
  • It is not ideal to turn off Standard Vacuum.
  • Use vacuuming defaults in most cases.