【Popular Science Article】 Understand SeaTunnel CDC data synchronization in 3 minutes

5 min readApr 15, 2024

CDC (Change Data Capture) is a technology used to track row-level changes in database events (insertions, updates, deletions) and notify other systems of these events in the order they occurred. In disaster recovery scenarios, CDC primarily facilitates data synchronization between primary and backup systems, enabling real-time data synchronization from the primary database to the backup database.

source ----------> CDC ----------> sink

Apache SeaTunnel CDC

Data synchronization in SeaTunnel CDC is divided into two types:

Snapshot Read: Reading historical data from a table.
Incremental Tracking: Reading incremental log changes from a table.

2.1 Lock-Free Snapshot Synchronization

The reason why we emphasize “lock-free” in the lock-free snapshot synchronization phase is that existing CDC platforms might lock tables during synchronization of historical data, such as Debezium. The snapshot read phase involves synchronizing historical database data. The basic overview of the process is as follows:

storage------------->splitEnumerator----------split---------->reader
                            ^                                   |
                            |                                   |
                            \-----------------report------------/

Split division: The splitEnumerator (split distributor) divides the table data into multiple splits based on specified fields (e.g., table id or unique key) and step size.
Parallel processing: Each split is assigned to different readers through a routing algorithm for parallel reading, with each reader occupying one connection.
Event feedback: After completing the reading of a split, each reader reports progress back to the splitEnumerator.
The splitEnumerator sends a split to the reader, with the metadata of the split as follows:

String              splitId         Routing id
TableId             tableId         Table id
SeatunnelRowType    splitKeyType    Type of field the split is based on
Object              splitStart      Start point of the split read
Object              splitEnd        End point of the split read

The reader, upon receiving the split information, generates the relevant SQL statements. Before this, it records the starting position of the database log log corresponding to the current split and reports back to the splitEnumerator after processing the current split. The report content is as follows:

String      splitId         Split id
Offset      highWatermark   Position of the split's corresponding log for subsequent verification

2.2 Incremental Synchronization

The incremental synchronization phase is based on the snapshot read phase. When changes occur in the source database, the changed data is synchronized to the backup database in real-time. This phase listens to the database’s log, such as MySQL’s bin log. Incremental tracking is typically single-threaded to avoid duplicating bin log pulls and reduce database load. Therefore, only one reader works during this phase, using only one connection.

data log------------->splitEnumerator----------split---------->reader
                            ^                                   |
                            |                                   |
                            \-----------------report------------/

Incremental synchronization combines all splits and tables from the snapshot phase, so there is only one split during the incremental phase. The split information for this phase is as follows:

String                              splitId
Offset                              startingOffset                  Smallest log start among all splits
Offset                              endingOffset                    Log end position, continuous if no end, like in incremental phase
List<TableId>                       tableIds
Map<TableId, Offset>                tableWatermarks                 Watermarks of all splits
List<CompletedSnapshotSplitInfo>    completedSnapshotSplitInfos     Details of splits read during the snapshot phase

The fields in CompletedSnapshotSplitInfo are as follows:

String              splitId
TableId             tableId
SeatunnelRowType    splitKeyType
Object              splitStart
Object              splitEnd
Offset              watermark       Corresponding to the highWatermark in the report

The split in the incremental phase includes all watermarks from the snapshot phase, selecting an appropriate position for incremental synchronization, which is the smallest watermark.

Exactly-once

Whether during the snapshot read or the incremental read, the database may also be undergoing changes. How do we ensure exactly-once?

3.1 Snapshot Read Phase

In the snapshot read phase, for example, if during the synchronization of a split, the data in that split changes, such as the operations shown below (inserting k3, updating k2, deleting k1), if no identifiers are used during the reading process, then the updates to this part of the data would be lost. SeaTunnel’s approach is:

Check the bin log position in the database before reading the split: low watermark.
Read split {start, end} data.
Record the high watermark afterward.
If high = low, it indicates that the data within that split did not change during the reading; if (high — low) > 0, it indicates that the data changed during processing. The following steps are taken: ① Establish an in-memory table cache with the split data read; ② Change from low watermark to high watermark; ③ Replay operations in order of primary key to the in-memory table.
Report high watermark.

insert k3      update k2      delete k1
                |               |               |
                v               v               v
 bin log --|---------------------------------------------------|-- log offset
      low watermark                                     high watermark

CDC read data: k1 k3 k4
                    | replay
                    v
Real data:      k2 k3' k4

Incremental Phase

Before starting the incremental phase, all splits from the previous step are verified, as data updates might occur between splits, such as inserting several records between split1 and split2 during the snapshot phase, which would be missed. Seatunnel’s approach for retrieving data between splits is:

Find the smallest watermark from all split reports as the start watermark, and start reading the log.
For each log entry read, check if it has been processed in any split from the completedSnapshotSplitInfos. If not, it indicates that it is data from between splits and should be corrected.
After filtering the table, the entries from completedSnapshotSplitInfos can be deleted, and the remaining tables continued.
Once all splits have been verified, the full incremental phase begins.

|------------filter split2-----------------|
          |----filter split1------|                  
data log -|-----------------------|------------------|----------------------------------|- log offset
        min watermark      split1 watermark    split2 watermark                    max watermark

Checkpoint Resume

How to achieve pause and resume? Distributed snapshot algorithm (Chandy-Lamport):

Suppose the system contains two processes, p1 and p2. Process p1’s state includes three variables X1, Y1, Z1, and p2 contains three variables X2, Y2, Z2, with the initial state as follows:

p1                                  p2
X1:0                                X2:4
Y1:0                                Y2:2
Z1:0                                Z2:3

At this point, p1 initiates a global snapshot recording, first recording its own process state, then sending marker information to p2. Before the marker information reaches p2, p2 sends message M to p1.

p1                                  p2
X1:0     -------marker------->      X2:4
Y1:0     <---------M----------      Y2:2
Z1:0                                Z2:3

After p2 receives the marker from p1, it records its own state, and then p1 receives the message M sent by p2 before. Since p1 has already taken a local snapshot, p1 only needs to record M. Thus, the final snapshot is as follows:

p1 M                                p2
X1:0                                X2:4
Y1:0                                Y2:2
Z1:0                                Z2:3

During the SeaTunnel CDC process, markers sent to all nodes like readers, splitEnumerators, and writers will save their own memory state.