Apache SeaTunnel(Incubating) 2.1.3 released! Introducing in Assert Sink connector and NullRate, Nulltf Transform

4 min readAug 12, 2022

More than a month after the release of Apache SeaTunnel(Incubating) 2.1.2, we have been collecting user and developer feedback to bring you version 2.1.3. The new version introduces the Assert Sink connector, which is an urgent need in the community, and two Transforms, NullRate and Nulltf. Some usability problems in the previous version have also been fixed, improving stability and efficiency.

This article will introduce the details of the update of Apache SeaTunnel(Incubating) version 2.1.3.

Release Note: https://github.com/apache/incubator-seatunnel/blob/2.1.3/release-note.md
Download address: https://seatunnel .apache.org/download

Major feature updates

Introduces Assert Sink connector

Assert Sink connector is introduced in Apache SeaTunnel version 2.1.3 to verify data correctness. Special thanks to Lhyundeadsoul for his contribution.

Add two Transforms

In addition, the 2.1.3 version also adds two Transforms, NullRate and Nulltf, which are used to detect data quality and convert null values in the data to generate default values. These two Transforms can effectively improve the availability of data and reduce the frequency of abnormal situations. Special thanks to wsyhj and Interest1-wyt for their contributions.

At present, Apache SeaTunnel has supported 9 types of Transforms including Common Options, Json, NullRate, Nulltf, Replace, Split, SQL, UDF, and UUID, and the community is welcome to contribute more transform types.

For details of Transform, please refer to the official documentation: https://seatunnel .apache.org/docs/2.1.3/category/transform

ClickhouseFile connector supports Rsync data transfer method now

At the same time, Apache SeaTunnel 2.1.3 version brings Rsync data transfer mode support to ClickhouseFile connector, users can now choose SCP and Rsync data transfer modes. Thanks to Emor-nj for contributing to this feature.

Specific feature updates:

Flink Fake data supports BigInteger type https://github.com/apache/incubator-seatunnel /pull/2118
Add Flink Assert Sink connector https://github.com/apache/incubator-seatunnel /pull/2022
Spark ClickhouseFile connector supports Rsync data file transfer method https://github.com/apache/incubator-seatunnel /pull/2074
Add Flink Assert Sink e2e module https://github.com/apache/incubator-seatunnel /pull/2036
Add NullRate Transform for detecting data quality https://github.com/apache/incubator-seatunnel /pull/1978
Added Nulltf Transform for setting defaults https://github.com/apache/incubator-Apache SeaTunnel /pull/1958

Optimization

Refactored Spark TiDB-related parameter information
Refactor the code to remove redundant code warning information
Optimize connector jar package loading logic
Add Plugin Discovery module
Add documentation for some modules
Upgrade common collection from version 4 to 4.4
Upgrade the common-codec version to 1.13

Bug Fix

In addition, in response to the feedback from users of version 2.1.2, we also fixed some usability issues, such as the inability to use the same components of Source and Sink, and further improved the stability.

Fixed the problem of Hudi Source loading twice
Fix the problem that the field TwoPhaseCommit is not recognized after Doris 0.15
Fixed abnormal data output when accessing Hive using Spark JDBC
Fix JDBC data loss when partition_column (partition mode) is set
Fix KafkaTableStream schema JSON parsing error
Fix Shell script getting APP_DIR path error
Updated Flink RunMode enumeration to get correct help messages for run modes
Fix the same source and sink registered connector cache error
Fix command line parameter -t( — check) conflict with Flink deployment target parameter
Fix Jackson type conversion error problem
Fix the problem of failure to run scripts in paths other than Apache SeaTunnel_Home

Acknowledgment

Thanks to all the contributors (GitHub ID, in no particular order,), it is your efforts that fuel the launch of this version, and we look forward to more contributions to the Apache SeaTunnel(Incubating) community!

leo65535, CalvinKirs, mans2singh, ashulin, wanghuan2054, lhyundeadsoul, tobezhou33, Hisoka-X, ic4y, wsyhj, Emor-nj, gleiyu, smallhibiscus, Bingz2, kezhenxu94, youyangkou, immustard, Interest1-wyt, superzhang0929, gaaraG, runwenjun

About Apache SeaTunnel

Apache SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of data per day in a stable and efficient manner.

Why do we need Apache SeaTunnel?

Apache SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.

Data loss and duplication
Task buildup and latency
Low throughput
Long application-to-production cycle time
Lack of application status monitoring

Apache SeaTunnel Usage Scenarios

Massive data synchronization
Massive data integration
ETL of large volumes of data
Massive data aggregation
Multi-source data processing

Features of Apache SeaTunnel

Rich components
High scalability
Easy to use
Mature and stable

How to get started with Apache SeaTunnel quickly?

Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite all partners who are interested in making local open-source global to join the Apache SeaTunnel contributors family and foster open-source together!

Submit an issue:

https://github.com/apache/incubator-seatunnel/issues

Contribute code to:

https://github.com/apache/incubator-seatunnel/pulls

Subscribe to the community development mailing list :

dev-subscribe@seatunnel.apache.org

Development Mailing List :

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ

Follow Twitter:

https://twitter.com/ASFSeaTunnel

Come and join us!