Logical decoding text of two-phase commit in PostgreSQL 14

Update: July 2, 2023

Logical decoding text of two-phase commit in PostgreSQL 14

The Fujitsu OSS team and the PostgreSQL open source community have cooperated to add the function of decrypting the two-phase commit in logical replication in PG14. Let’s take a look at what this feature is?

background

Two-phase commit is a mechanism in which transactions are committed in two phases. Usually used in distributed databases to ensure consistency. The two phases of the transaction are the PREPARE phase and the COMMIT/ROLLBACK phase. The commands submitted in two phases in PG are:

PREPARE TRANSACTION

COMMIT PREPARED

ROLLBACK PREPARED

PG already supports two-phase commit in version 8.0, and version 10.0 supports logical replication. However, two-phase commit has never been supported in logical replication. The PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED commands have been supported in a single instance, but when these commands need to be logically copied to the standby machine, they no longer maintain the original meaning. The PREPARE TRANSACTION command is treated as a NOP and is not decoded at all. The COMMIT PREPARED command is regarded as COMMIT, and the ROLLBACK PREPARED command is regarded as ABORT.

What is a two-phase commit

Two-phase commit is an atomic commit protocol that helps maintain consistency between distributed databases. Ordinary commits that provide atomicity within the database are not sufficient to provide consistency for cross-database transactions. To illustrate this problem, let’s give an example:

1) John has 300$ in Bank A

2) Mark has 100$ in Bank B

3) John wants to transfer 100$ to Mark

During the transaction, you need to withdraw 100$ from Bank A to Bank B. At the end of the transaction, there should be 200$. If any transaction fails at any time during the transfer process, the account status should be restored to the state before the transfer started. The transaction may fail for various reasons. If any interruption occurs before the transaction commits, the transaction will be rolled back. In our example, if an interruption occurs when John’s account is deducted, then John’s account of the interruption port should not be reduced. This is how a simple commit maintains consistency within the database.

But we consider such a situation, that is, the transaction that deducts 100$ from John’s account is successful in one submission, but the transaction that adds 100$ to Mark’s account in Bank B fails and is rolled back. After this operation is over, although John’s account has been debited, Mark will not receive the amount. 100$ disappeared. When dealing with distributed transactions, a simple commit may fail.

Step-by-step execution of distributed transactions

For two-phase commit, one of the databases acts as the coordinator of distributed transactions.

Stage 1

A database begins to apply transactions, and then prepare. It sends prepared transactions to other databases in the form of prepare messages. The second database gets the Prepare message, and then prepares the transaction. Prepare involves changes in the transaction, but does not commit. The dirty data is written to disk for persistence. Once all databases have prepared the transaction, and all information about the transaction is stored on disk, the prepare phase is complete.

Stage 2

Next, the arbiter starts the commit phase. If the second database fails to prepare the transaction for some reason, the arbiter starts the rollback phase. Therefore, depending on whether the prepare is successful, the transaction is either committed or rolled back. Interruption in the final commit phase can be recovered, because the required prepare transaction has been written to disk and can be reapplied.

Two-phase commit is not related to single-instance databases, but it is relevant when data is replicated across multiple database instances.

It is very important to support two-phase commit in logical replication.

Functional Overview

Before the PG14 version, logical replication transactions were decoded and replicated only after the transaction was committed. This is to avoid that the replication transaction may eventually be aborted.

Decode transaction on commit

The logical replication of PG14 supports PREPARE TRANSACTION, COMMIT PREPARED and ROOLBACK PREPARED commands. When the PREPARE TRANSACTION command is decoded, the transaction is decoded and copied. PREPARE TRANSACTION starts transaction replay and decoding just like COMMIT in WAL SENDER.

Decoding transaction during prepare

We also defined a new plug-in callback to allow logical decoding plug-ins to support two-phase submission.

Callback

describe

filter_prepare_cb

Allow the plugin to filter transactions that do not need to be decoded when preparing according to the GID used in the PREPARE TRANSACTION command

begin_prepare_cb

Start of Prepare transaction

prepare_cb

Called when the PREPARE TRANSACTION command is decoded

commit_prepared_cb

Called when the COMMIT PREPARED command is decoded

rollback_prepared_cb

Called when the ROLLBACK PREPARED command is decoded

Plug-in modification

test_decoding

The plug-in is a logical decoding output plug-in, as an example to help users develop their own logical decoding plug-in. test_decoding receives WAL through a logical decoding mechanism and decodes it into a textual representation of the operation performed.

It has been modified to be able to use the new two-stage callback function and decode transaction during prepare

APIs modification

pg_create_logical_replication_slot()

The API adds a new option to specify whether the slot supports two-phase commit. The output plugin can use a replication slot with a two-phase option to support two-phase commit.

pg_create_logical_replication_slot(slot_name name, plugin name [, temporary boolean, two_phase boolean ] )

Case study

Let’s see how to detect the decoded output of the two-phase commit transaction:

1) Create a replication slot

Use test_decoding as the output plug-in, and pass in true so that the slot supports two-stage submission and decoding.

postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);

slot_name    |   lsn

-----------------+-----------

regression_slot | 0/16B1970

(1 row)

2) Create a table

postgres=# CREATE TABLE data(id serial primary key, data text);

CREATE TABLE

3) Detect the decoded output content of prepare transaction and commit transaction

postgres=# BEGIN;

postgres=*# INSERT INTO data(data) VALUES('5');

postgres=*# PREPARE TRANSACTION 'test_prepared1';

postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);

lsn    | xid |          data

-----------+-----+-----------------

0/1689DC0 | 529 | BEGIN 529

0/1689DC0 | 529 | table public.data: INSERT: id[integer]:3 data[text]:'5'

0/1689FC0 | 529 | PREPARE TRANSACTION 'test_prepared1', txid 529

(3 rows)

postgres=# COMMIT PREPARED 'test_prepared1';

postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);

lsn    | xid |       data

-----------+-----+------------------

0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529

(4 rows)

postgres=# select * from data;

id | data

----+------

1 |  5

(1 row)

future

PG14’s change to this function has a decoder-side infrastructure that allows two-stage submission of decoding during prepare. We also modified the test_decoding plugin to take advantage of this infrastructure.

The next step is to implement the two-stage support to the largest logical decoding plug-in in PG-the pgoutput plug-in. This plug-in supports PUBLISHER/SUBSCRIBER mode of logical replication. It is the most widely used plug-in in logical replication. The Fujitsu OSS team is working with the open source community to add this feature to PG15.

For the two-phase transaction in the distributed database, PG also needs to support: the standby machine informs the host that PREPARE has failed and initiates a rollback. This feedback mechanism is not supported in PG and is one of the directions for future improvement.

  Links:

6mbi25l-120

rm500uz-2h