Major Release! Apache SeaTunnel (Incubating) 2.3.0-beta introduces the self-developed Apache SeaTunnel Engine and more connectors!
Apache SeaTunnel (Incubating) 2.3.0-beta has been officially released. This version marks the debut of the long-awaited Apache SeaTunnel Engine, a data synchronization engine developed by the Apache SeaTunnel community. It also adds more connectors and fixes bugs in previously supported connectors.
This article walks through the details of the Apache SeaTunnel (Incubating) 2.3.0-beta update.
- Release Note:
https://github.com/apache/incubator-seatunnel/blob/2.3.0-beta/release-note.md
- Download:
https://seatunnel.apache.org/download/
- Quick Start Documentation:
https://seatunnel.apache.org/docs/category/start
Apache SeaTunnel Engine released!
In Apache SeaTunnel 2.3.0-beta, the community-developed engine designed specifically for data synchronization scenarios makes its debut. As the default engine of Apache SeaTunnel, it supports high-throughput, low-latency, strongly consistent synchronization jobs, and it aims to be faster, more stable, more resource-efficient, and easier to use.
The overall design of the Apache SeaTunnel Engine pursues the following goals:
- Faster: the execution plan optimizer of Apache SeaTunnel Engine aims to reduce the transfer of data over the network, which cuts the serialization and deserialization overhead that slows down overall synchronization and lets users complete synchronization jobs sooner. Rate limiting is also supported, so that data can be synchronized at a controlled speed.
- More stable: Apache SeaTunnel Engine uses the Pipeline as the minimum granularity for checkpointing and fault tolerance in data synchronization tasks. A task failure affects only its upstream and downstream tasks, so a single failed task does not cause the entire job to fail or roll back. Apache SeaTunnel Engine also supports data caching for scenarios where the source data has a limited retention period. When the cache is enabled, data read from the source is automatically cached, then read by the downstream task and written to the target. Even if the target fails and data cannot be written, reading from the source continues as usual, which prevents source data from expiring and being deleted before it has been read.
- Resource-saving: Apache SeaTunnel Engine uses Dynamic Thread Sharing internally. In real-time synchronization scenarios with a large number of tables but little data per table, Apache SeaTunnel Engine runs these synchronization tasks in shared threads to reduce unnecessary thread creation and save system resources. On the read and write sides, the design goal of Apache SeaTunnel Engine is to minimize the number of JDBC connections; in CDC scenarios, Apache SeaTunnel Engine reuses log reading and parsing resources.
- Simple and easy to use: Apache SeaTunnel Engine reduces the dependence on third-party services and can implement cluster management, snapshot storage, and cluster HA without relying on big data components such as ZooKeeper and HDFS. This is useful for users who do not have a big data platform, or who prefer not to depend on one for data synchronization.
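To make the thread-sharing idea concrete, here is a minimal Python sketch. It is not the engine's implementation: the function names, the pool size of 4, and the 100 toy tables are all assumptions chosen for illustration. It only shows the general technique of running many small per-table tasks on a bounded pool of shared threads instead of creating one thread per table.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def sync_table(table_name, rows):
    """Hypothetical per-table task: 'copies' rows and reports its worker thread."""
    copied = list(rows)  # stand-in for reading from a source and writing to a sink
    return table_name, len(copied), threading.current_thread().name


def run_shared(tables, max_threads=4):
    """Run one small task per table on a bounded pool of shared threads."""
    with ThreadPoolExecutor(max_workers=max_threads,
                            thread_name_prefix="shared") as pool:
        futures = [pool.submit(sync_table, name, rows)
                   for name, rows in tables.items()]
        return [f.result() for f in futures]


# 100 small tables, 10 rows each (toy data, assumed for illustration).
tables = {f"table_{i}": range(10) for i in range(100)}
results = run_shared(tables)
threads_used = {thread for _, _, thread in results}
print(f"{len(results)} tables synchronized on {len(threads_used)} shared threads")
```

However many tables are submitted, the number of worker threads never exceeds `max_threads`, which is the resource saving the bullet above describes.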
In the future, Apache SeaTunnel Engine will be further optimized to better support full and incremental synchronization across offline batch synchronization, real-time synchronization, and CDC.
New features
【Basic functions of Apache SeaTunnel Engine】
2.3.0-beta is the first release of Apache SeaTunnel Engine and implements its basic functions. For details, please refer to: https://github.com/apache/incubator-seatunnel/issues/2272
【Cluster Management】
- Support stand-alone operation;
- Support cluster operation;
- Support autonomous (decentralized) clusters: users do not need to specify a master node for the Apache SeaTunnel Engine cluster, because the cluster selects a master node by itself at runtime and automatically chooses a new one when the master node fails;
- Support automatic node discovery: nodes with the same cluster_name automatically form a cluster.
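The two behaviours above (grouping by cluster_name and automatic master selection with failover) can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the node IDs are made up, and the "lowest node ID wins" rule is a hypothetical stand-in, since the release notes do not describe the election rule the engine actually uses.

```python
from collections import defaultdict


def form_clusters(nodes):
    """Group (node_id, cluster_name) pairs so equal cluster_name values share a cluster."""
    clusters = defaultdict(list)
    for node_id, cluster_name in nodes:
        clusters[cluster_name].append(node_id)
    return {name: sorted(members) for name, members in clusters.items()}


def elect_master(members, failed=frozenset()):
    """Pick a master among live members.

    The rule used here (lowest node ID) is an assumption for illustration only.
    """
    alive = [m for m in members if m not in failed]
    return alive[0] if alive else None


# Hypothetical nodes: three share cluster_name "prod", one uses "test".
nodes = [("node-2", "prod"), ("node-1", "prod"), ("node-3", "prod"), ("node-9", "test")]
clusters = form_clusters(nodes)

master = elect_master(clusters["prod"])                         # initial master
new_master = elect_master(clusters["prod"], failed={"node-1"})  # after the master fails
print(clusters, master, new_master)
```

No node is configured as master up front; the choice is derived from the current membership, so removing the failed master from the live set is enough to produce a replacement.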
【Core functions】
- Support running jobs in local mode; the cluster is automatically destroyed once the job completes;
- Support running jobs in cluster mode (single machine or cluster): jobs are submitted to the Apache SeaTunnel Engine service through the Apache SeaTunnel Client, and the service keeps running after a job completes and waits for the next job submission;
- Support offline batch synchronization;
- Support real-time synchronization;
- Support unified batch and stream processing: all Apache SeaTunnel V2 connectors can run in Apache SeaTunnel Engine;
- Support a distributed snapshot algorithm and, together with Apache SeaTunnel V2 connectors, two-phase commit, ensuring that data is processed exactly once;
- Support job scheduling at the Pipeline level, so that a job can start even when resources are limited;
- Support fault tolerance at the Pipeline level: a task failure affects only the Pipeline it belongs to, and only the tasks in that Pipeline need to be rolled back;
- Support dynamic thread sharing to synchronize a large number of small data sets in real time.
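Pipeline-level fault tolerance, as listed above, can be sketched in a few lines of Python. The `Pipeline` and `Job` classes, their fields, and the task names are hypothetical and do not mirror the engine's internal classes; the sketch only shows that a failed task triggers a rollback of its own Pipeline while the other Pipelines of the job are left untouched.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    """Hypothetical pipeline: a group of tasks that checkpoint and recover together."""
    name: str
    tasks: list
    checkpoint: int = 0  # id of the last completed checkpoint
    restarts: int = 0    # how many times this pipeline was rolled back

    def restore(self):
        """Roll back to the last checkpoint (modelled here as a counter only)."""
        self.restarts += 1


@dataclass
class Job:
    pipelines: list

    def on_task_failure(self, task):
        """Roll back only the pipeline that contains the failed task."""
        for pipeline in self.pipelines:
            if task in pipeline.tasks:
                pipeline.restore()
                return pipeline.name
        raise KeyError(f"unknown task: {task}")


# Two independent pipelines inside one job (names are made up).
orders = Pipeline("orders", ["orders-source", "orders-sink"], checkpoint=7)
users = Pipeline("users", ["users-source", "users-sink"], checkpoint=5)
job = Job([orders, users])

rolled_back = job.on_task_failure("orders-sink")
print(rolled_back, orders.restarts, users.restarts)
```

Because recovery is scoped to one Pipeline, the `users` pipeline keeps running from its own state while `orders` restarts from its last checkpoint.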
Connector update
Newly added connectors
With the joint efforts of the community, the 2.3.0-beta version has introduced 10 more connectors, including:
Connector optimization
- [Source] [Fake]
  - [Improve] Support direct definition of data values (row) (2839)
  - [Improve] Improve the fake source connector (2944):
    - Support user-defined map size
    - Support user-defined array size
    - Support user-defined string length
    - Support user-defined bytes length
  - [Improve] Support multiple splits for the fake source connector (2974)
  - [Improve] Support setting the number of splits per parallelism and the reading interval between two splits (3098)
- [Source] [ClickHouse]
  - [Improve] ClickHouse Source uses a random host when multiple hosts are configured (3108)
- [Source] [FtpFile]
  - [Improve] Support extracting partitions from SeaTunnelRow fields (3085)
  - [Improve] Support parsing fields from the file path (2985)
- [Source] [HDFSFile]
  - [Improve] Support extracting partitions from SeaTunnelRow fields (3085)
  - [Improve] Support parsing fields from the file path (2985)
- [Source] [LocalFile]
  - [Improve] Support extracting partitions from SeaTunnelRow fields (3085)
  - [Improve] Support parsing fields from the file path (2985)
- [Source] [OSSFile]
  - [Improve] Support extracting partitions from SeaTunnelRow fields (3085)
  - [Improve] Support parsing fields from the file path (2985)
- [Source] [IoTDB]
  - [Improve] Improve IoTDB Source Connector (2917)
    - Support extracting timestamp, device, and measurement from SeaTunnelRow
    - Support TINYINT and SMALLINT
    - Support flushing the cache to the database before prepareCommit
- [Sink] [Assert]
  - [Improve] 1. Support checking the number of rows (2844) (3031):
    - check that rows are not empty
    - check the minimum number of rows
    - check the maximum number of rows
  - [Improve] 2. Support direct definition of data values (row) (2844) (3031)
  - [Improve] 3. Support setting parallelism to 1 (2844) (3031)
- [Sink] [ClickHouse]
  - [Improve] ClickHouse supports the Int128 and Int256 types (3067)
- [Sink] [Console]
  - [Improve] Console sink supports printing the subtask index (3000)
- [Sink] [IoTDB]
  - [Improve] Improve IoTDB Sink Connector (2917)
    - Support align by SQL syntax
    - Support SQL split ignore case
    - Support restoring the split offset for at-least-once
    - Support reading the timestamp from RowRecord
- [Sink] [Kudu]
  - [Improve] Kudu Sink Connector supports upserting rows (2881)
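As an illustration of the file-source item "parse fields from the file path" listed above, the following Python sketch derives fields from Hive-style `key=value` directory segments. The path layout, the function name, and the example path are assumptions made for this sketch; the release notes do not specify which path format the connectors expect.

```python
from pathlib import PurePosixPath


def partition_fields(path):
    """Collect key=value directory segments of a path into a dict.

    Assumes a Hive-style layout such as .../dt=2022-10-01/region=eu/file.txt;
    segments without '=' (and the file name itself) are ignored.
    """
    fields = {}
    for part in PurePosixPath(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            fields[key] = value
    return fields


# Hypothetical file path used only for this example.
example = partition_fields("/warehouse/orders/dt=2022-10-01/region=eu/part-0001.txt")
print(example)
```

Fields recovered this way can then be attached to every row read from the file, so partition values that exist only in the directory structure are not lost.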
Connector Bug fixes
- [Source] [FtpFile]
  - [BugFix] Fix the incorrect path in the Windows environment (2980)
- [Source] [HDFSFile]
  - [BugFix] Fix the incorrect path in the Windows environment (2980)
- [Source] [LocalFile]
  - [BugFix] Fix the incorrect path in the Windows environment (2980)
- [Source] [OSSFile]
  - [BugFix] Fix the incorrect path in the Windows environment (2980)
- [Sink] [Enterprise-WeChat]
  - [BugFix] Fix Enterprise-WeChat Sink data serialization (2856)
- [Sink] [FtpFile]
  - [BugFix] Fix the incorrect path in the Windows environment (2980)
  - [BugFix] Fix a filesystem error (3117)
  - [BugFix] Fix the failure to parse ‘\t’ as a delimiter from the config file (3083)
- [Sink] [HDFSFile]
  - [BugFix] Fix the incorrect path in the Windows environment (2980)
  - [BugFix] Fix a filesystem error (3117)
  - [BugFix] Fix the failure to parse ‘\t’ as a delimiter from the config file (3083)
- [Sink] [LocalFile]
  - [BugFix] Fix the incorrect path in the Windows environment (2980)
  - [BugFix] Fix a filesystem error (3117)
  - [BugFix] Fix the failure to parse ‘\t’ as a delimiter from the config file (3083)
- [Sink] [OSSFile]
  - [BugFix] Fix the incorrect path in the Windows environment (2980)
  - [BugFix] Fix a filesystem error (3117)
  - [BugFix] Fix the failure to parse ‘\t’ as a delimiter from the config file (3083)
- [Sink] [IoTDB]
  - [BugFix] Fix IoTDB connector sink NPE (3080)
- [Sink] [JDBC]
  - [BugFix] Fix JDBC split exception (2904)
Connector V1 update
- [Sink] [Spark-Hbase]
  - [BugFix] Handle null values (3099)
Other updates
Feature optimization and update
- [Improve] [Sink] Support define parallelism for sink connector (2941)
- [Improve] [all] Change Log to @Slf4j (3001)
- [Improve] [format] [text] Support reading and writing the SeaTunnelRow type (2969)
- [Improve] [api] [flink] Extract a unified method (2862)
- [Feature] [deploy] Add Helm charts (2903)
- [Feature] [SeaTunnel-text-format] (2884)
Bug fixes
- [BugFix] Fix the Assert connector name error in the config/plugin_config file (3127)
- [BugFix] [starter] Fix the connector-v2 Flink and Spark Dockerfile (3007)
- [BugFix] [core] Fix the Spark engine parallelism parameter not taking effect (2965)
- [BugFix] [build] Fix the checkstyle suppression file not taking effect on Windows 10 (2986)
- [BugFix] [format] [json] Fix the Jackson package conflict with Spark (2934)
- [BugFix] [seatunnel-translation-base] Fix Source restore state NPE (2878)
Documentation update
- Add the coding guide (2995)
Acknowledgment
Thanks to all the community members who took part in the 2.3.0-beta release. Your efforts make Apache SeaTunnel more and more powerful! Here is the list of contributors to the release (alphabetically by GitHub ID):
About Apache SeaTunnel
Apache SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of data records per day in a stable and efficient manner.
Why do we need Apache SeaTunnel?
Apache SeaTunnel does everything it can to solve the problems you may encounter when synchronizing massive amounts of data:
- Data loss and duplication
- Task buildup and latency
- Low throughput
- Long application-to-production cycle time
- Lack of application status monitoring
Apache SeaTunnel Usage Scenarios
- Massive data synchronization
- Massive data integration
- ETL of large volumes of data
- Massive data aggregation
- Multi-source data processing
Features of Apache SeaTunnel
- Rich components
- High scalability
- Easy to use
- Mature and stable
How to get started with Apache SeaTunnel quickly?
Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.
https://seatunnel.apache.org/docs/2.1.0/developement/setup
How can I contribute?
We invite everyone who is interested in taking local open source global to join the Apache SeaTunnel contributor family and grow open source together!
Submit an issue:
https://github.com/apache/incubator-seatunnel/issues
Contribute code to:
https://github.com/apache/incubator-seatunnel/pulls
Subscribe to the community development mailing list:
dev-subscribe@seatunnel.apache.org
Development mailing list:
dev@seatunnel.apache.org
Join Slack:
https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ
Follow Twitter:
https://twitter.com/ASFSeaTunnel
Come and join us!