in Databases

Locking when altering a table in PostgreSQL vs. MySQL

MySQL

An alter table command in MySQL is very expensive. It creates a new version of the table based on what the table should look like with the alter DDL, proceeds to copy all of the records over to the new table, and then will swap the new table with the original table and delete the old table. There is a global write lock on the database for the duration of this alter table. If the table is small, this usually happens fast enough that you wouldn’t notice. As the table grows to larger sizes you need to consider how long the table will be locked, and also calculate out how it will affect your replication.

For this reason, you typically want to get all of your alter commands bunched together and execute it at the same time to prevent multiple series of long locks from occurring. Many methods have been developed to combat this locking issue in MySQL.

Master Master Setup

If you have a master master setup, you can turn binary logging off for your session and update the passive node, and then failover and update the original master (this requires backward safe changes that don’t do things like set default values).

Online Alter

You could perform an online alter, which essentially means you do the following:

  1. Create the new table the way you want it

  2. Create triggers on the original table for insert, update, and delete that will modify the data appropriately in the new table in addition to the original

  3. Copy rows in batches to the new table until everything is in sync

At this point you can rename the tables in a shell game fashion inside of a transaction and then delete the original table and triggers.

pt-online-schema-change

You could use Percona’s pt-online-schema-change which essentially does the online alter for you. PostgreSQL handles alter statements differently for adding and deleting columns.

PostgreSQL

PostgreSQL only allows you to add columns to the end of the table. There is no way to add one to the beginning or middle of a field set after the table is originally created. When you run an alter table statement, PostgreSQL will update the table structure and there is no locking unless you do something that needs to alter existing rows. For example, you could do the following:

ALTER TABLE users ADD COLUMN hat_size TEXT DEFAULT 'L';

This obviously is setting a default for every row. The schema will be changed, and every row will be touched while the default data is written. Locking will occur until this is complete, similar to what would happen in MySQL. Alternatively, you could do this:

BEGIN;
ALTER TABLE users ADD COLUMN hat_size TEXT;
ALTER TABLE users ALTER COLUMN hat_size SET DEFAULT 'L';
COMMIT;

The difference for this is that you are allowing nulls initially, and you are setting all future records to the default value. You could follow up with an update in batches if you wanted to change values that are null to the default value and then minimal row level locking would occur. New in PostgreSQL 9.4, you should be able to do the following actions without locking as well:

  • SET STATISTICS
  • DROP TRIGGER/CONSTRAINT
  • DROP a foreign key from a referencing table

PostgreSQL supports building indexes without locking out writes as well. This method is invoked by specifying the CONCURRENTLY option of CREATE INDEX. When this option is used, PostgreSQL must perform two scans of the table, and in addition it must wait for all existing transactions that could potentially use the index to terminate. Thus this method requires more total work than a standard index build and takes significantly longer to complete. However, since it allows normal operations to continue while the index is built, this method is useful for adding new indexes in a production environment. Of course, the extra CPU and I/O load imposed by the index creation might slow other operations.

With the exception of the first approach I mentioned (where you upgrade idle or slave nodes first without writing to the binlog), the online alter method utilized in MySQL could work for PostgreSQL as well. Please note, I am utilizing Streaming Replication in PostgreSQL and other replication types like slony or bucardo may behave differently or have different limitations. PostgreSQL versions are now introducing Bi-Directional Replication (BDR) and may alleviate some of the issues with updating the idle replication nodes.  In the future, I’ll post a blog where I experiment with that. You can read more about this in detail on the blogs that educated me.

Links:

Write a Comment

Comment