Data accurate consistency technology practice based on Apache SeaTunnel
Introduction to Two-Phase Commit Protocol in Distributed Systems: A Deep Dive into Apache SeaTunnel’s Implementation
In distributed systems, ensuring data consistency is one of the crucial tasks. Data consistency refers to maintaining the accuracy and completeness of data across different nodes in a distributed system during updates. However, due to network latency, node failures, and other factors, data consistency in distributed systems becomes complex and challenging. To address this, the Two-Phase Commit (2PC) protocol is widely used to ensure data consistency in distributed systems. This article explains the workings of the Two-Phase Commit protocol and discusses its key strategies in distributed systems. It also explores how Apache SeaTunnel implements the Two-Phase Commit protocol to ensure data consistency.
Distributed Consistency
In distributed scenarios, multiple services simultaneously serve a process. For example, in an e-commerce order placement scenario, payment services process payments, inventory services deduct stock, and logistics services update shipping information. If any service fails or a request is lost due to network issues, the entire system may experience data inconsistency.
This scenario illustrates the challenge of distributed data consistency, fundamentally caused by distributed operations that prevent local transactions from guaranteeing atomicity.
There are two main solutions to distributed consistency: distributed transactions and business process adaptations to avoid distributed transactions. Given the universality of distributed transaction solutions, this article focuses on implementing distributed transactions.
Types of Distributed Transactions
Distributed transactions are broadly classified into rigid and flexible transactions:
- Rigid Transactions: Maintain strong consistency, natively support rollback/isolation, low concurrency, suitable for short transactions (e.g., XA protocol (2PC, JTA, JTS), 3PC).
- Flexible Transactions: Require business transformation, eventually consistent, implement compensation interfaces and resource locking interfaces, high concurrency, suitable for long transactions (e.g., TCC, Saga (state machine mode, AOP mode), local transaction messaging, message transactions).
This article primarily focuses on XA transactions.
XA Two-Phase Commit Protocol
The XA protocol, commonly known as the Two-Phase Commit Protocol (2PC), involves a coordinator and participants. It’s a strongly consistent design, introducing a coordinator role to manage the commit and rollback of each participant. The two phases include preparation (voting) and commitment.
First Phase (Preparation)
- The Coordinator node sends a Prepare commit request to all Participant nodes and waits for their responses.
- Upon receiving the Prepare request, each Participant executes the transaction-related updates, saving the results in local logs. If successful, the participant temporarily withholds the transaction and responds with a “completed” message to the Coordinator.
- Once the Coordinator receives all responses, the distributed transaction enters the second phase.
A failure by any participant leads the Coordinator to request a rollback, indicating a failed distributed transaction.
Second Phase (Commitment)
- The Coordinator decides whether to commit the transaction based on the responses.
- If all responses are affirmative, the Coordinator issues Commit requests to all participants, awaiting their confirmation.
- Participants update the operation results in the database upon receiving the Commit request and send confirmation back to the Coordinator.
- The Coordinator finalizes the decision to commit or rollback the transaction, notifying all participants.
Two-Phase Commit in Apache SeaTunnel
In Apache SeaTunnel, Exactly-once consistency is mainly achieved through the following methods:
For database sinks, the common approach is the Two-Phase Commit, as illustrated below:
Key classes involved include:
Specifically, JdbcExactlyOnceSinkWriter, which implements the SinkWriter interface, plays a crucial role in XA transaction implementation.
Conclusion
Implementing exact consistency in Apache SeaTunnel is a key objective. SeaTunnel uses various practices to ensure exact data consistency. It supports the Two-Phase Commit, offering flexibility in defining and executing multi-stage operations. This adaptability makes SeaTunnel suitable for a wide range of applications and consistency needs.
Additionally, SeaTunnel features robust failure recovery and fault-tolerance mechanisms, establishing heartbeat checks for node availability. When node failure or network interruption occurs, SeaTunnel automatically detects and performs fault-tolerance and recovery operations, ensuring system stability and data consistency.
Lastly, SeaTunnel offers customizable policies and scalability. Users can set specific data consistency levels, timeout mechanisms, and conflict resolution strategies. Plus, SeaTunnel supports horizontal scaling to meet the demands of large-scale distributed systems.
In summary, SeaTunnel’s innovative practices in achieving exact data consistency, including Two-Phase Commit, failure recovery, and customizable strategies, provide high-performance, reliable data consistency assurances. These approaches offer innovative solutions to data consistency challenges in distributed systems.
References: