Apache SeaTunnel(Incubating) 2.1.3 released! Introducing the Assert Sink connector and the NullRate and Nulltf Transforms

Apache SeaTunnel
Aug 12, 2022

More than a month after the release of Apache SeaTunnel(Incubating) 2.1.2, and after collecting feedback from users and developers, we bring you version 2.1.3. The new version introduces the Assert Sink connector, which the community urgently needed, and two Transforms, NullRate and Nulltf. It also fixes some usability problems found in the previous version, improving stability and efficiency.

This article introduces the details of the updates in Apache SeaTunnel(Incubating) 2.1.3.

Major feature updates

Introducing the Assert Sink connector

The Assert Sink connector is introduced in Apache SeaTunnel 2.1.3 to verify data correctness. Special thanks to lhyundeadsoul for his contribution.
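To make the idea of verifying data correctness at the sink more concrete, here is a minimal conceptual sketch in Python. It is not the Assert Sink's actual configuration or implementation: the rule names (`not_null`, `min_length`, `in_range`) and the `assert_rows` helper are hypothetical and exist only for illustration. Please refer to the official documentation for the real connector options.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rule representation: a label plus a predicate over one field value.
Check = Tuple[str, Callable[[Any], bool]]


def not_null(value: Any) -> bool:
    return value is not None


def min_length(n: int) -> Callable[[Any], bool]:
    def check(value: Any) -> bool:
        return value is not None and len(value) >= n
    return check


def in_range(lo: float, hi: float) -> Callable[[Any], bool]:
    def check(value: Any) -> bool:
        return value is not None and lo <= value <= hi
    return check


def assert_rows(rows: List[Dict[str, Any]], rules: Dict[str, List[Check]]) -> List[str]:
    """Return one message per failed check; an empty list means every row passed."""
    violations = []
    for i, row in enumerate(rows):
        for field, checks in rules.items():
            for label, check in checks:
                if not check(row.get(field)):
                    violations.append(f"row {i}: field '{field}' failed {label}")
    return violations


rules = {
    "name": [("not_null", not_null), ("min_length(3)", min_length(3))],
    "age": [("not_null", not_null), ("in_range(0, 150)", in_range(0, 150))],
}
rows = [
    {"name": "alice", "age": 30},   # passes every rule
    {"name": "bo", "age": 200},     # name too short, age out of range
    {"name": None, "age": 41},      # name is null (fails both name rules)
]

violations = assert_rows(rows, rules)
for message in violations:
    print(message)
```

In this sketch the checks run on every row that reaches the end of the pipeline, so a job can fail fast or report problems as soon as the output violates an expectation.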

Add two Transforms

In addition, version 2.1.3 adds two Transforms, NullRate and Nulltf, which are used to check data quality and to replace null values in the data with default values, respectively. These two Transforms can effectively improve the usability of data and reduce the frequency of anomalies. Special thanks to wsyhj and Interest1-wyt for their contributions.
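The two ideas behind these Transforms, measuring how many values are null and substituting defaults for nulls, can be sketched in a few lines of Python. This is only an illustration of the concepts; the function names `null_rates` and `fill_nulls` are hypothetical and do not reflect the Transforms' actual options or code.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]


def null_rates(rows: List[Row], fields: List[str]) -> Dict[str, float]:
    """Fraction of rows in which each field is None or missing."""
    total = len(rows)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) is None) / total
        for field in fields
    }


def fill_nulls(rows: List[Row], defaults: Dict[str, Any]) -> List[Row]:
    """Return copies of the rows with None values replaced by per-field defaults."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for field, default in defaults.items():
            if new_row.get(field) is None:
                new_row[field] = default
        filled.append(new_row)
    return filled


rows = [
    {"id": 1, "city": "Paris"},
    {"id": 2, "city": None},
    {"id": 3, "city": None},
    {"id": 4, "city": "Lima"},
]

rates = null_rates(rows, ["id", "city"])          # quality check
cleaned = fill_nulls(rows, {"city": "unknown"})   # null replacement
print(rates)
print(cleaned[1])
```

A typical use would be to compare each rate against a threshold and raise an alert when a field has too many nulls, then apply the defaults before writing the data downstream.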

At present, Apache SeaTunnel supports 9 types of Transforms: Common Options, Json, NullRate, Nulltf, Replace, Split, SQL, UDF, and UUID. The community is welcome to contribute more Transform types.

For details on Transforms, please refer to the official documentation: https://seatunnel.apache.org/docs/2.1.3/category/transform

ClickhouseFile connector now supports the Rsync data transfer method

Apache SeaTunnel 2.1.3 also brings Rsync data transfer support to the ClickhouseFile connector, so users can now choose between the SCP and Rsync data transfer modes. Thanks to Emor-nj for contributing this feature.
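As a rough illustration of what choosing between the two transfer modes means, the sketch below builds the command line for each tool. It is not the connector's implementation: the `build_copy_command` helper, its parameters, and the example paths and host are all hypothetical, and the commands are only constructed, not executed.

```python
from typing import List


def build_copy_command(method: str, local_dir: str, user: str,
                       host: str, remote_dir: str) -> List[str]:
    """Build an argument list that copies local_dir to user@host:remote_dir."""
    target = f"{user}@{host}:{remote_dir}"
    if method == "scp":
        # -r copies the directory recursively.
        return ["scp", "-r", local_dir, target]
    if method == "rsync":
        # -a preserves the tree and file attributes, -z compresses in transit.
        return ["rsync", "-a", "-z", local_dir, target]
    raise ValueError(f"unsupported transfer method: {method}")


# Hypothetical paths and host, for illustration only.
scp_cmd = build_copy_command("scp", "/tmp/ck_parts", "clickhouse",
                             "node1", "/data/detached")
rsync_cmd = build_copy_command("rsync", "/tmp/ck_parts", "clickhouse",
                               "node1", "/data/detached")
print(" ".join(scp_cmd))
print(" ".join(rsync_cmd))
```

One practical difference between the tools is that rsync skips files that are already up to date on the target, which can make retries after a partial transfer cheaper than copying everything again with scp.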

Specific feature updates:

Optimization

  • Refactor Spark TiDB-related parameter information
  • Refactor code to remove redundant-code warnings
  • Optimize connector jar package loading logic
  • Add Plugin Discovery module
  • Add documentation for some modules
  • Upgrade commons-collections from version 4 to 4.4
  • Upgrade commons-codec to version 1.13

Bug Fix

In response to feedback from users of version 2.1.2, we also fixed some usability issues, such as the inability to use the same component as both Source and Sink, and further improved stability.

  • Fix Hudi Source being loaded twice
  • Fix the TwoPhaseCommit field not being recognized after Doris 0.15
  • Fix abnormal data output when accessing Hive using Spark JDBC
  • Fix JDBC data loss when partition_column (partition mode) is set
  • Fix KafkaTableStream schema JSON parsing error
  • Fix Shell script getting the wrong APP_DIR path
  • Update Flink RunMode enumeration to get correct help messages for run modes
  • Fix connector cache error when the same source and sink are registered
  • Fix conflict between the command line parameter -t (--check) and the Flink deployment target parameter
  • Fix Jackson type conversion error
  • Fix scripts failing to run from paths other than SeaTunnel_Home

Acknowledgment

Thanks to all the contributors (GitHub IDs, in no particular order). It is your efforts that made this release possible, and we look forward to more contributions to the Apache SeaTunnel(Incubating) community!

leo65535, CalvinKirs, mans2singh, ashulin, wanghuan2054, lhyundeadsoul, tobezhou33, Hisoka-X, ic4y, wsyhj, Emor-nj, gleiyu, smallhibiscus, Bingz2, kezhenxu94, youyangkou, immustard, Interest1-wyt, superzhang0929, gaaraG, runwenjun

About Apache SeaTunnel

Apache SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can stably and efficiently synchronize hundreds of billions of records per day.

Why do we need Apache SeaTunnel?

Apache SeaTunnel does everything it can to solve the problems you may encounter when synchronizing massive amounts of data:

  • Data loss and duplication
  • Task buildup and latency
  • Low throughput
  • Long application-to-production cycle time
  • Lack of application status monitoring

Apache SeaTunnel Usage Scenarios

  • Massive data synchronization
  • Massive data integration
  • ETL of large volumes of data
  • Massive data aggregation
  • Multi-source data processing

Features of Apache SeaTunnel

  • Rich components
  • High scalability
  • Easy to use
  • Mature and stable

How to get started with Apache SeaTunnel quickly?

Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite everyone who is interested in making local open source global to join the Apache SeaTunnel contributor family and foster open source together!

Submit an issue:

https://github.com/apache/incubator-seatunnel/issues

Contribute code to:

https://github.com/apache/incubator-seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ

Follow Twitter:

https://twitter.com/ASFSeaTunnel

Come and join us!
