In the previous article (https://www.up-2date.com/post/transactional-messaging-polling-publisher), we saw how to implement the Polling Publisher pattern to solve the dual-write problem. In this one, I will show you a different solution to the same problem: transaction log tailing.
Transaction Log Tailing
Every committed update made by an application is recorded as an entry in the database’s transaction log. A transaction log miner reads these entries, converts each entry that corresponds to an inserted message into a message, and publishes it to the message broker. This approach works for messages written to an OUTBOX table in an RDBMS as well as for messages appended to records in a NoSQL database.
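To make the idea concrete, here is a minimal, in-memory sketch of a log miner. It is not tied to any real database: `LogEntry`, `TransactionLogMinerSketch`, and the `"INSERT"` operation name are hypothetical, and the `Consumer<String>` merely stands in for a message broker client. The sketch also assumes the miner remembers the last log position it handled, so entries are not published twice.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified view of one committed change in a transaction log.
final class LogEntry {
    final long position;
    final String table;
    final String operation;
    final String payload;

    LogEntry(long position, String table, String operation, String payload) {
        this.position = position;
        this.table = table;
        this.operation = operation;
        this.payload = payload;
    }
}

// Sketch of a log miner that tails the log from the last processed position.
class TransactionLogMinerSketch {
    private final String outboxTable;
    private final Consumer<String> broker; // stand-in for a message broker client
    private long lastPosition = -1;        // position of the last entry already handled

    TransactionLogMinerSketch(String outboxTable, Consumer<String> broker) {
        this.outboxTable = outboxTable;
        this.broker = broker;
    }

    // Processes entries newer than lastPosition and returns how many messages were published.
    int tail(List<LogEntry> log) {
        int published = 0;
        for (LogEntry entry : log) {
            if (entry.position <= lastPosition) {
                continue; // already processed in an earlier call
            }
            // Only inserts into the outbox table represent new messages.
            if (outboxTable.equals(entry.table) && "INSERT".equals(entry.operation)) {
                broker.accept(entry.payload);
                published++;
            }
            lastPosition = entry.position;
        }
        return published;
    }
}
```

A real miner would read the log through a database-specific API instead of a `List`, and would persist the last position so that it survives restarts.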
Implementing a transaction log miner requires some effort, since you have to write low-level code that calls database-specific APIs. Alternatively, you can use an existing open-source framework.
There are several such frameworks, but they are not interchangeable: they support different databases and cover different use cases.
In this article, we will be using Debezium, a popular open-source framework that is widely regarded as the de facto standard for change data capture (CDC).
Debezium is a distributed platform that turns your existing databases into event streams, so applications can see and respond immediately to each row-level change in the databases.
There are three ways to deploy Debezium:
Debezium using Apache Kafka Connect
Debezium server
Debezium embedded engine
Debezium using Apache Kafka Connect
Most commonly, you deploy Debezium using Apache Kafka Connect. Kafka Connect is a framework and runtime for implementing and operating:
Source connectors such as Debezium that send records into Kafka
Sink connectors that propagate records from Kafka topics to other systems
The following image shows the architecture of a change data capture pipeline based on Debezium:
As shown in the image, the Debezium connectors for MySQL and PostgreSQL are deployed to capture changes to these two types of databases. Each Debezium connector establishes a connection to its source database:
The MySQL connector uses a client library for accessing the binlog.
The PostgreSQL connector reads from a logical replication stream.
Kafka Connect operates as a separate service alongside the Kafka broker.
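Because Kafka Connect runs as its own service, connectors are usually registered by sending a JSON document to its REST API (`POST /connectors`). The sketch below only builds that request body; the connector name and the abbreviated configuration values are hypothetical, and a real MySQL connector needs more settings (credentials, server id, schema history) than shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConnectorRegistrationSketch {
    // Renders the JSON body expected by Kafka Connect's POST /connectors endpoint.
    // Assumes names and values contain no characters that need JSON escaping.
    static String requestBody(String connectorName, Map<String, String> config) {
        StringBuilder json = new StringBuilder();
        json.append("{\"name\":\"").append(connectorName).append("\",\"config\":{");
        boolean first = true;
        for (Map.Entry<String, String> entry : config.entrySet()) {
            if (!first) {
                json.append(',');
            }
            json.append('"').append(entry.getKey()).append("\":\"")
                .append(entry.getValue()).append('"');
            first = false;
        }
        return json.append("}}").toString();
    }

    // Hypothetical, deliberately incomplete example configuration for a MySQL source.
    static Map<String, String> exampleConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        config.put("database.hostname", "localhost");
        config.put("database.port", "3306");
        return config;
    }
}
```

The resulting string would be sent with the `Content-Type: application/json` header to the Kafka Connect host, which listens on port 8083 by default.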
Once change event records are in Apache Kafka, other connectors in the Kafka Connect ecosystem can stream them to other systems and databases, such as Elasticsearch, data warehouses and analytics systems, or caches such as Infinispan.
Another way to deploy Debezium is using the Debezium server. The Debezium server is a configurable, ready-to-use application that streams change events from a source database to a variety of messaging infrastructures. The following image shows the architecture of a change data capture pipeline that uses the Debezium server:
The Debezium server is configured to use one of the Debezium source connectors to capture changes from the source database. Change events can be serialized to different formats, such as JSON or Apache Avro, and are then sent to one of a variety of messaging infrastructures, such as Amazon Kinesis, Google Cloud Pub/Sub, or Apache Pulsar.
Yet another way to use the Debezium connectors is the embedded engine. In this case, Debezium does not run via Kafka Connect but as a library embedded in your custom Java application. This is useful either for consuming change events within the application itself, without deploying complete Kafka and Kafka Connect clusters, or for streaming changes to alternative messaging brokers such as Amazon Kinesis.
We will be using the embedded Debezium library in our delivery application, which you may remember from the previous article: https://github.com/giova333/transactional-messaging. First of all, we need to add the following dependencies:
Next, we need to provide the Debezium configuration:
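As a rough sketch of what such a configuration can contain, the class below fills a `java.util.Properties` object, which is the form the embedded engine accepts. It assumes a PostgreSQL source database and an outbox table named `public.outbox`; the connector name, host, credentials, database name, and offset file path are all placeholder values, not the ones used in the repository.

```java
import java.util.Properties;

class DebeziumConfigSketch {
    // Builds engine and connector properties for capturing inserts into an outbox table.
    static Properties outboxConnectorProperties() {
        Properties props = new Properties();
        // Engine-level settings.
        props.setProperty("name", "outbox-connector");
        props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        // Where the engine records how far it has read the log (a local file here).
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        props.setProperty("offset.flush.interval.ms", "60000");
        // Connection to the source database (placeholder values).
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "5432");
        props.setProperty("database.user", "postgres");
        props.setProperty("database.password", "postgres");
        props.setProperty("database.dbname", "delivery");
        // Logical decoding plugin used to read the PostgreSQL replication stream.
        props.setProperty("plugin.name", "pgoutput");
        // Logical name of the source; Debezium 1.x uses "database.server.name" instead.
        props.setProperty("topic.prefix", "delivery");
        // Capture changes from the outbox table only.
        props.setProperty("table.include.list", "public.outbox");
        return props;
    }
}
```

Restricting `table.include.list` to the outbox table keeps the engine from emitting events for every other table in the database.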
From the previous article you may remember OrderCreationService:
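For readers who have not seen the previous article, the following is a simplified stand-in for the idea behind such a service, not the repository's code. The `EventPublisher` interface shape, the `OrderCreated` event type, and the in-memory list that plays the role of the orders table are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical shape of the event publishing abstraction.
interface EventPublisher {
    void publish(String eventType, String payload);
}

// Sketch of a service that records a state change and its event together.
class OrderCreationServiceSketch {
    private final List<String> orders; // stand-in for the orders table
    private final EventPublisher eventPublisher;

    OrderCreationServiceSketch(List<String> orders, EventPublisher eventPublisher) {
        this.orders = orders;
        this.eventPublisher = eventPublisher;
    }

    // In a real application this method would run inside one database transaction,
    // so the order row and the outbox row commit or roll back together.
    void create(String orderId) {
        orders.add(orderId);
        eventPublisher.publish("OrderCreated", "{\"orderId\":\"" + orderId + "\"}");
    }
}
```

The service never talks to the message broker directly; it depends only on the publisher abstraction, which is what allows the publishing mechanism to be swapped.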
Finally, we need to implement EventPublisher based on embedded Debezium:
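The two responsibilities of such a publisher can be sketched without the Debezium library itself. In the hypothetical class below, an in-memory list stands in for the outbox table and a `BiConsumer` stands in for a Kafka producer; the column names `topic` and `payload` are assumptions. With the real embedded engine, `onChangeEvent` would be called from the handler registered through `DebeziumEngine`'s `notifying(...)`, after extracting the `op` and `after` fields from the change event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

class DebeziumEventPublisherSketch {
    private final List<Map<String, String>> outbox = new ArrayList<>(); // stand-in for the outbox table
    private final BiConsumer<String, String> kafka;                     // stand-in producer: (topic, value)

    DebeziumEventPublisherSketch(BiConsumer<String, String> kafka) {
        this.kafka = kafka;
    }

    // Step 1: persist the event as an outbox row instead of sending it directly.
    void publish(String topic, String payload) {
        Map<String, String> row = new HashMap<>();
        row.put("topic", topic);
        row.put("payload", payload);
        outbox.add(row);
    }

    // Step 2: react to a change event captured from the transaction log.
    // Returns true when the event was forwarded to the broker.
    boolean onChangeEvent(String op, Map<String, String> after) {
        // "c" is the operation code Debezium uses for inserted rows.
        if (!"c".equals(op) || after == null) {
            return false;
        }
        kafka.accept(after.get("topic"), after.get("payload"));
        return true;
    }

    List<Map<String, String>> outboxRows() {
        return outbox;
    }
}
```

Updates and deletes on the outbox table are ignored, since only newly inserted rows represent messages that still need to be sent.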
DebeziumEventPublisher.publish persists events in the outbox table
@DebeziumEventListener listens for changes in the transaction log and publishes messages to a Kafka topic
In this article, you learned how to use the transaction log tailing pattern with Debezium to solve the dual-write problem.