Release of Version 2.3.6! Apache SeaTunnel Zeta Engine Introduces a New Architecture!

Aug 6, 2024

Apache SeaTunnel 2.3.6 has been officially released. This version brings the highly anticipated new Master/Worker architecture for SeaTunnel Zeta, introduces an event notification mechanism, supports dynamically compiled transforms, and adds other new features and capabilities. It also adds support for SeaTunnel's first vector database, Milvus, and includes bug fixes and documentation improvements. Feel free to give it a try!

📥 Download Version 2.3.6

📖 Release Notes

Key Updates

1. SeaTunnel Zeta Master/Worker New Architecture

The first key update is the new SeaTunnel Zeta Master/Worker architecture. Before this release, SeaTunnel Zeta made no distinction between Master and Worker roles: every node acted as both. SeaTunnel elects one of these nodes as the active Master, and the others serve as standby Masters. The cluster's distributed memory grid lets data be written into a HashMap that is distributed across all nodes in the cluster, with replicas. By comparison, tools like Flink store task status information in third-party systems such as ZooKeeper.

SeaTunnel Zeta, however, does not require third-party systems, because its internal distributed memory grid can store job status information. When a node process exits unexpectedly, the data in the memory grid is redistributed, so the job can still find its previous state when it recovers on another node.

This combined architecture has a potential issue. When the Master and Worker run together and the cluster load is high, an unexpected exit of the active Master process triggers fault-tolerant recovery on a new node. Because it was the Master process that exited, all tasks must be recovered again, which can put a heavy load on the Worker that shares a node with the new Master and cause the new Master process to exit unexpectedly as well.
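
To make the idea of a replicated, distributed map more concrete, here is a toy Python sketch. It is not how the Zeta memory grid is implemented; the class name, the two-copy placement rule, and the node names are all assumptions made for illustration. It only shows why job state survives the loss of one node when every entry has a replica elsewhere.

```python
import zlib


class ReplicatedMap:
    """Toy map that keeps each entry on `copies` different nodes (hypothetical)."""

    def __init__(self, node_names, copies=2):
        self.nodes = {name: {} for name in node_names}
        self.copies = copies

    def _owners(self, key):
        # Pick `copies` consecutive nodes, starting from a stable hash of the key.
        names = sorted(self.nodes)
        start = zlib.crc32(key.encode("utf-8")) % len(names)
        count = min(self.copies, len(names))
        return [names[(start + i) % len(names)] for i in range(count)]

    def put(self, key, value):
        for name in self._owners(key):
            self.nodes[name][key] = value

    def get(self, key):
        for store in self.nodes.values():
            if key in store:
                return store[key]
        raise KeyError(key)

    def remove_node(self, name):
        """Simulate an unexpected node exit, then redistribute the surviving data."""
        del self.nodes[name]
        surviving = {}
        for store in self.nodes.values():
            surviving.update(store)
            store.clear()
        for key, value in surviving.items():
            self.put(key, value)


grid = ReplicatedMap(["node-1", "node-2", "node-3"])
grid.put("job-42", "RUNNING")
grid.remove_node("node-2")       # one node exits unexpectedly
state = grid.get("job-42")       # the job state is still available
```

With two copies per entry, losing any single node leaves at least one copy, and redistribution brings the entry back to two copies on the remaining nodes.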

To solve this problem, we developed a new architecture that deploys Master and Worker separately. The Master node only stores data and schedules tasks, while the Worker node only executes tasks and provides resources. Node roles in a SeaTunnel cluster are therefore divided into Master, Worker, and master_and_worker, and users can choose among them according to their needs.
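
The division of responsibilities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three role names come from the paragraph above, but the `Role` enum, the helper functions, and the round-robin assignment are hypothetical and do not reflect the actual Zeta scheduler.

```python
from enum import Enum


class Role(Enum):
    MASTER = "master"
    WORKER = "worker"
    MASTER_AND_WORKER = "master_and_worker"


def can_schedule(role):
    """Nodes with a Master role keep job state and schedule tasks."""
    return role in (Role.MASTER, Role.MASTER_AND_WORKER)


def can_execute(role):
    """Nodes with a Worker role provide resources and execute tasks."""
    return role in (Role.WORKER, Role.MASTER_AND_WORKER)


def assign_tasks(nodes, tasks):
    """Spread tasks round-robin over nodes that may execute them (toy policy)."""
    executors = [name for name, role in nodes.items() if can_execute(role)]
    if not executors:
        raise ValueError("no node in the cluster can execute tasks")
    assignment = {name: [] for name in executors}
    for index, task in enumerate(tasks):
        assignment[executors[index % len(executors)]].append(task)
    return assignment


cluster = {"m1": Role.MASTER, "w1": Role.WORKER, "w2": Role.WORKER}
plan = assign_tasks(cluster, ["task-1", "task-2", "task-3"])
```

In this sketch the pure Master node `m1` never receives tasks, so a heavy task load cannot take it down.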

2. Support for Creating SeaTunnel Tasks Using SQL

The second key update is support for creating SeaTunnel tasks using SQL. Previously, SeaTunnel tasks could only be defined in HOCON configuration files. Version 2.3.6 also supports defining them in SQL: users create a Source table and a Sink table, and then synchronize data from the Source table to the Sink table with an INSERT INTO statement.
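
The overall shape of such a job (declare a source table, declare a sink table, then INSERT INTO ... SELECT) can be tried out with plain SQL. The sketch below uses Python's built-in sqlite3 module as a stand-in; it is not SeaTunnel's SQL syntax, and the table and column names are made up for illustration. Please refer to the SeaTunnel documentation for the real job syntax.

```python
import sqlite3

# In-memory SQLite database standing in for both ends of the pipeline.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_table (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sink_table   (id INTEGER PRIMARY KEY, name TEXT);

    INSERT INTO source_table (id, name) VALUES (1, 'alice'), (2, 'bob');

    -- The "synchronization" step: copy rows from source to sink.
    INSERT INTO sink_table SELECT id, name FROM source_table;
    """
)

rows = conn.execute("SELECT id, name FROM sink_table ORDER BY id").fetchall()
conn.close()
```

After the script runs, `rows` holds the two rows copied from `source_table` into `sink_table`.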

3. Zeta CDC Sync Releases Idle Readers

The third update lets Zeta CDC synchronization release idle readers. During the full synchronization phase, many readers run in parallel to speed up data reading and writing. During incremental synchronization, however, the binlog can only be read by a single thread, because the binlog is ordered and that order must not be disrupted. At this stage, the readers and writers created for the full phase no longer have any data flowing through them. In version 2.3.6, Apache SeaTunnel releases the resources they hold, including JDBC resources and memory, so that the job occupies less space and larger tasks can run.

If a single writer is too slow, you can set the number of writer threads (the writer-side parallelism), so that reading stays single-threaded while writing runs in parallel across multiple threads. Because binlog parsing is fast, a single job can reach a write speed of over thirty megabytes per second.
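
The "one ordered reader, several parallel writers" pattern can be sketched with Python threads and queues. This is an illustration only, not SeaTunnel code: the function name, the event format `(primary_key, value)`, and the choice to route events by a hash of the primary key (so changes to the same row stay in order) are assumptions.

```python
import queue
import threading
import zlib


def run_incremental_sync(events, writer_count=4):
    """Read events in order on one thread and apply them with several writers."""
    queues = [queue.Queue() for _ in range(writer_count)]
    sink = {}                      # stand-in for the target table
    sink_lock = threading.Lock()

    def writer(inbox):
        while True:
            event = inbox.get()
            if event is None:      # sentinel: no more events for this writer
                return
            key, value = event
            with sink_lock:
                sink[key] = value

    threads = [threading.Thread(target=writer, args=(q,)) for q in queues]
    for thread in threads:
        thread.start()

    # Single reader: walks the ordered log once. Events for the same key always
    # go to the same writer, so their relative order is preserved.
    for key, value in events:
        index = zlib.crc32(str(key).encode("utf-8")) % writer_count
        queues[index].put((key, value))

    for q in queues:
        q.put(None)
    for thread in threads:
        thread.join()
    return sink


change_log = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
final_state = run_incremental_sync(change_log, writer_count=3)
```

Each key ends up with its most recent value from the log, even though the writes happen on different threads.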

4. Support for Event Notification Mechanism

Version 2.3.6 adds support for an event notification mechanism. With the new event APIs, events generated within the Zeta engine, such as job success or failure, or DDL changes, can be sent to other systems via requests.
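
As a rough idea of what delivering such an event to another system might look like, here is a Python sketch that serializes an event to JSON and wraps it in an HTTP POST request. Everything specific here is an assumption: the use of HTTP, the `JobEvent` fields, the event type names, and the URL are hypothetical and are not SeaTunnel's actual event schema.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass


@dataclass
class JobEvent:
    job_id: str
    event_type: str        # hypothetical values, e.g. "JOB_FINISHED", "JOB_FAILED"
    detail: str = ""


def build_event_request(event, url):
    """Build (but do not send) an HTTP POST request carrying the event as JSON."""
    body = json.dumps(asdict(event)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(event, url, timeout=5):
    """Send the event to the receiving system and return the HTTP status code."""
    request = build_event_request(event, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


event = JobEvent(job_id="42", event_type="JOB_FAILED", detail="sink unreachable")
request = build_event_request(event, "http://localhost:8080/events")
```

Calling `send_event(event, url)` would actually deliver the request; it is left uncalled here because it needs a listening endpoint.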

5. Added Support for Vector Database Milvus

Vector databases can accelerate AI application development and simplify the workloads of AI-driven applications, which makes them a valuable tool in the era of large models. To better support AI development, Apache SeaTunnel 2.3.6 adds support for the vector database Milvus. This is the first vector database supported by Apache SeaTunnel, with plans to extend support to other vector databases in the future.

6. Support for Dynamically Compiled Transforms

Apache SeaTunnel 2.3.6 provides a programmable way to process rows. Users can implement custom business logic that takes existing row fields as parameters, for example issuing RPC requests based on those fields or adding new fields by looking up related data in other data sources. To keep different pieces of business logic separate, users can also define multiple transforms and combine them, which makes business scenarios more efficient and flexible to implement.

For details, see Dynamic Compile Transform.
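
The general technique of compiling user-supplied code at runtime and applying it to each row can be illustrated in Python. This is an analogy rather than the SeaTunnel API: the function names, the convention that the source must define a `transform(row)` function, and the sample rows are all assumptions. The SeaTunnel feature has its own configuration format, described in the linked documentation.

```python
def compile_transform(source, function_name="transform"):
    """Compile source code at runtime and return the row function it defines.

    Warning: exec() runs arbitrary code, so only use it with trusted source.
    """
    namespace = {}
    exec(compile(source, "<user-transform>", "exec"), namespace)
    func = namespace.get(function_name)
    if not callable(func):
        raise ValueError(f"source must define a function named {function_name!r}")
    return func


def apply_transforms(rows, transforms):
    """Pass every row through each transform in order."""
    for row in rows:
        for transform in transforms:
            row = transform(row)
        yield row


# Two user-defined snippets, kept separate and then combined.
ADD_FULL_NAME = '''
def transform(row):
    out = dict(row)
    out["full_name"] = row["first"] + " " + row["last"]
    return out
'''

UPPERCASE_FULL_NAME = '''
def transform(row):
    out = dict(row)
    out["full_name"] = out["full_name"].upper()
    return out
'''

pipeline = [compile_transform(ADD_FULL_NAME), compile_transform(UPPERCASE_FULL_NAME)]
result = list(apply_transforms([{"first": "Ada", "last": "Lovelace"}], pipeline))
```

Each snippet is compiled into its own namespace, so both can use the same function name without colliding.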

7. Resource Isolation

By adding tags to cluster nodes, cluster resources can be divided into groups, and jobs can be directed to specific groups of nodes. This helps users plan cluster task scheduling more reasonably.

For details on resource isolation and implementation methods, see Resource Isolation.
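
The matching logic behind tag-based isolation can be sketched as a simple filter. The worker names, tag keys, and the `matching_workers` helper below are hypothetical; the actual configuration keys are described in the Resource Isolation documentation.

```python
def matching_workers(workers, tag_filter):
    """Return the workers whose tags contain every key/value pair in tag_filter."""
    return [
        name
        for name, tags in workers.items()
        if all(tags.get(key) == value for key, value in tag_filter.items())
    ]


# Hypothetical cluster: each worker carries a few tags.
workers = {
    "worker-1": {"group": "platform", "team": "team1"},
    "worker-2": {"group": "platform", "team": "team2"},
    "worker-3": {"group": "realtime"},
}

platform_nodes = matching_workers(workers, {"group": "platform"})
team2_nodes = matching_workers(workers, {"group": "platform", "team": "team2"})
```

A job that specifies a filter would then only be scheduled onto the matching workers, while an empty filter matches every worker.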

8. Unified Support for Table/Database Wildcards in Sinks

The new version also provides a placeholder (wildcard) feature for sink options, which lets a sink obtain the metadata of upstream tables dynamically. This matters most in multi-table writing, where one sink configuration has to serve many upstream tables: it gives users a more convenient, unified way to configure multi-table jobs and makes them easier to set up.

See the documentation for how to use this feature: Sink Options Placeholders.
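
The idea of resolving one sink configuration against each upstream table can be sketched with Python's `string.Template`. The placeholder names `${database_name}` and `${table_name}`, the option keys, and the sample tables are assumptions for illustration; check the Sink Options Placeholders documentation for the names SeaTunnel actually supports.

```python
from string import Template


def resolve_sink_options(options, table_meta):
    """Fill ${...} placeholders in string options from one upstream table's metadata."""
    return {
        key: Template(value).substitute(table_meta) if isinstance(value, str) else value
        for key, value in options.items()
    }


# One sink configuration shared by every upstream table.
sink_options = {
    "database": "${database_name}_backup",
    "table": "${table_name}",
    "batch_size": 1000,
}

upstream_tables = [
    {"database_name": "shop", "table_name": "orders"},
    {"database_name": "shop", "table_name": "users"},
]

resolved = [resolve_sink_options(sink_options, meta) for meta in upstream_tables]
```

Non-string options such as `batch_size` pass through unchanged, and a placeholder with no matching metadata raises a `KeyError` instead of being written out literally.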

Others

Additionally, Apache SeaTunnel 2.3.6 implements user-defined parameter functions under the Spark/Flink engines, adds support for multiple connectors like Hudi Sink, updates Transform and Zeta Engine, and fixes documentation issues.

For details, see the Release Notes.

Acknowledgments

Thanks to @Hisoka-X for leading this release, and thanks to the following contributors for their support in this release (in no particular order):

Assert, Asura7969, Carl-Zhou-CN, ChunFuWu, Coen, CosmosNi, Dongyeon Lee, Eric, Felix, Feng Ruohang, FuYouJ, Guangdong Liu, JackeyLee007, Jarvis, Jast, Jia Fan, Kim, Leon Yoah, Marvin, THZ, TaoZex, TeAmo, Thomas-HuWei, Tyrantlucifer, Wenjun Ruan, Wudadada, XiaoMaYi, Xiaojian Sun, Xuzz, YalikWang, ZhiLin Li, Zhihong Pan, ZhilinLi, bingquanzhao, corgy-w, dailai, fcb-xiaobo, gitfortian, hailin0, halo.kim, hawk9821, hilo, ic4y, latch890727, lightzhao, litiliu, lizhenglei, ponxu, rtyuy, seckiller, tcodehuber, useheart, xiaochen, zhangdonghao, zhiwei liu, zuo, 老王, 不忘初心, 狂野之驴

About Apache SeaTunnel

Apache SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can stably and efficiently synchronize hundreds of billions of records per day.

To become a speaker for Apache SeaTunnel, please fill out this form: https://forms.gle/vtpQS6ZuxqXMt6DT6 :)

Why do we need Apache SeaTunnel?

Apache SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.

Data loss and duplication
Task buildup and latency
Low throughput
Long application-to-production cycle time
Lack of application status monitoring

Apache SeaTunnel Usage Scenarios

Massive data synchronization
Massive data integration
ETL of large volumes of data
Massive data aggregation
Multi-source data processing

Features of Apache SeaTunnel

Rich components
High scalability
Easy to use
Mature and stable

How to get started with Apache SeaTunnel quickly?

Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite all partners who are interested in making local open-source global to join the Apache SeaTunnel contributors family and foster open-source together!

Submit an issue:

https://github.com/apache/seatunnel/issues

Contribute code to:

https://github.com/apache/seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ

Follow Twitter:

https://twitter.com/ASFSeaTunnel

Join us now!❤️❤️