Apache SeaTunnel (Incubating) 2.1.3 released! Introducing the Assert Sink connector and the NullRate and Nulltf Transforms
More than a month after the release of Apache SeaTunnel (Incubating) 2.1.2, we have collected feedback from users and developers to bring you version 2.1.3. The new version introduces the Assert Sink connector, which the community urgently needed, and two Transforms, NullRate and Nulltf. It also fixes several usability problems from the previous version, improving stability and efficiency.
This article details the updates in Apache SeaTunnel (Incubating) 2.1.3.
- Release Note: https://github.com/apache/incubator-seatunnel/blob/2.1.3/release-note.md
- Download address: https://seatunnel.apache.org/download
Major feature updates
Introducing the Assert Sink connector
Apache SeaTunnel 2.1.3 introduces the Assert Sink connector, which verifies the correctness of the data that reaches it. Special thanks to lhyundeadsoul for this contribution.
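The idea behind an assert-style sink is to check every incoming row against declared rules and fail when one is violated. The Python sketch below illustrates that idea only: the `Rule` class, the rule names `NOT_NULL`, `MIN_LENGTH` and `MAX_LENGTH`, and the `assert_sink` function are hypothetical and are not SeaTunnel's API or configuration syntax, which are described in the connector documentation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


# Hypothetical rule model for illustration; not SeaTunnel's configuration format.
@dataclass
class Rule:
    field: str
    rule_type: str  # "NOT_NULL", "MIN_LENGTH" or "MAX_LENGTH" (assumed names)
    value: Optional[int] = None


def violates(rule: Rule, row: Dict[str, Any]) -> bool:
    """Return True if `row` breaks `rule`; a missing field counts as null."""
    v = row.get(rule.field)
    if rule.rule_type == "NOT_NULL":
        return v is None
    if rule.rule_type == "MIN_LENGTH":
        return v is not None and len(str(v)) < rule.value
    if rule.rule_type == "MAX_LENGTH":
        return v is not None and len(str(v)) > rule.value
    raise ValueError(f"unknown rule type: {rule.rule_type}")


def assert_sink(rows: Iterable[Dict[str, Any]], rules: List[Rule]) -> int:
    """Consume rows like a sink; raise AssertionError on the first violation.

    Returns the number of rows that passed every rule.
    """
    passed = 0
    for i, row in enumerate(rows):
        for rule in rules:
            if violates(rule, row):
                raise AssertionError(
                    f"row {i} violates {rule.rule_type} on field '{rule.field}': {row}"
                )
        passed += 1
    return passed
```

Failing fast is the design choice in this sketch: when a pipeline is used as a test, for example in an end-to-end module, a bad row stops the run with a descriptive error instead of passing silently.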
Two new Transforms: NullRate and Nulltf
Version 2.1.3 also adds two Transforms: NullRate, which checks data quality by measuring the rate of null values in fields, and Nulltf, which converts null values in the data to default values. Together they improve data usability and reduce how often anomalies occur downstream. Special thanks to wsyhj and Interest1-wyt for their contributions.
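Conceptually, a null-rate check computes the fraction of records in which a field is null and compares it to a limit, while a null-to-default transform substitutes a fixed value for each null. The Python sketch below shows both ideas on rows modeled as dicts. The function names, the threshold mapping and the defaults mapping are hypothetical and do not reflect the actual NullRate or Nulltf options.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def null_rates(rows: Sequence[Row], fields: Sequence[str]) -> Dict[str, float]:
    """Fraction of rows in which each field is missing or None."""
    total = len(rows)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in rows if r.get(f) is None) / total for f in fields}


def fields_over_threshold(rates: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the fields whose null rate exceeds its (hypothetical) threshold."""
    return [f for f, limit in thresholds.items() if rates.get(f, 0.0) > limit]


def fill_nulls(rows: Sequence[Row], defaults: Dict[str, Any]) -> List[Row]:
    """Return new rows with missing/None values replaced by per-field defaults."""
    out = []
    for r in rows:
        filled = dict(r)  # copy so the input rows are left unchanged
        for f, default in defaults.items():
            if filled.get(f) is None:
                filled[f] = default
        out.append(filled)
    return out
```

In this sketch the rate check only reports offending fields; a real pipeline might instead log them, fail the job, or write the statistics somewhere for monitoring.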
Apache SeaTunnel now supports 9 types of Transforms: Common Options, Json, NullRate, Nulltf, Replace, Split, SQL, UDF, and UUID. Contributions of more Transform types from the community are welcome.
For details on Transforms, see the official documentation: https://seatunnel.apache.org/docs/2.1.3/category/transform
ClickhouseFile connector now supports Rsync data transfer
Apache SeaTunnel 2.1.3 also adds Rsync data transfer support to the ClickhouseFile connector, so users can now choose between the SCP and Rsync transfer modes. Thanks to Emor-nj for contributing this feature.
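To make the difference between the two modes concrete, the Python sketch below builds the command line a tool might run to copy a generated file to a remote ClickHouse node with either `scp` or `rsync`. The function `build_transfer_command`, its parameters and the chosen flags are hypothetical; this is not how the ClickhouseFile connector actually invokes either tool.

```python
from typing import List


def build_transfer_command(
    method: str, local_path: str, user: str, host: str, remote_dir: str
) -> List[str]:
    """Build an argv list that copies `local_path` to `user@host:remote_dir`.

    The result could be passed to subprocess.run(...); it is not executed here.
    """
    target = f"{user}@{host}:{remote_dir}"
    if method == "scp":
        # scp always copies the whole file.
        return ["scp", local_path, target]
    if method == "rsync":
        # -a preserves attributes such as modification times;
        # -z compresses data in transit.
        return ["rsync", "-az", local_path, target]
    raise ValueError(f"unsupported transfer method: {method}")
```

Because rsync by default skips files whose size and modification time already match at the destination, re-running an interrupted transfer can avoid re-sending data that arrived intact, whereas scp copies everything again.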
Specific feature updates:
- Flink Fake data supports BigInteger type https://github.com/apache/incubator-seatunnel/pull/2118
- Add Flink Assert Sink connector https://github.com/apache/incubator-seatunnel/pull/2022
- Spark ClickhouseFile connector supports Rsync data file transfer method https://github.com/apache/incubator-seatunnel/pull/2074
- Add Flink Assert Sink e2e module https://github.com/apache/incubator-seatunnel/pull/2036
- Add NullRate Transform for detecting data quality https://github.com/apache/incubator-seatunnel/pull/1978
- Add Nulltf Transform for setting default values https://github.com/apache/incubator-seatunnel/pull/1958
Optimization
- Refactor Spark TiDB-related parameter information
- Refactor code to eliminate redundant-code warnings
- Optimize the connector JAR loading logic
- Add a Plugin Discovery module
- Add documentation for some modules
- Upgrade Apache Commons Collections from version 4 to 4.4
- Upgrade commons-codec to version 1.13
Bug Fix
In response to feedback from users of version 2.1.2, we also fixed several usability issues, such as the inability to use the same connector as both Source and Sink, further improving stability.
- Fix Hudi Source being loaded twice
- Fix the TwoPhaseCommit field not being recognized in Doris 0.15 and later
- Fix abnormal data output when accessing Hive via Spark JDBC
- Fix JDBC data loss when partition_column (partition mode) is set
- Fix KafkaTableStream schema JSON parsing error
- Fix an error in how the shell script gets the APP_DIR path
- Update the Flink RunMode enumeration to show correct help messages for run modes
- Fix a connector cache registration error when the same connector is used as both source and sink
- Fix the command-line parameter -t (--check) conflicting with Flink's deployment target parameter
- Fix a Jackson type conversion error
- Fix scripts failing to run from paths outside the SeaTunnel home directory
Acknowledgment
Thanks to all the contributors listed below (GitHub IDs, in no particular order). Your efforts made this release possible, and we look forward to more contributions to the Apache SeaTunnel (Incubating) community!
leo65535, CalvinKirs, mans2singh, ashulin, wanghuan2054, lhyundeadsoul, tobezhou33, Hisoka-X, ic4y, wsyhj, Emor-nj, gleiyu, smallhibiscus, Bingz2, kezhenxu94, youyangkou, immustard, Interest1-wyt, superzhang0929, gaaraG, runwenjun
About Apache SeaTunnel
Apache SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can stably and efficiently synchronize hundreds of billions of records per day.
Why do we need Apache SeaTunnel?
Apache SeaTunnel aims to solve the problems you may encounter when synchronizing massive amounts of data:
- Data loss and duplication
- Task buildup and latency
- Low throughput
- Long application-to-production cycle time
- Lack of application status monitoring
Apache SeaTunnel Usage Scenarios
- Massive data synchronization
- Massive data integration
- ETL of large volumes of data
- Massive data aggregation
- Multi-source data processing
Features of Apache SeaTunnel
- Rich components
- High scalability
- Easy to use
- Mature and stable
How to get started with Apache SeaTunnel quickly?
Want to try Apache SeaTunnel quickly? With SeaTunnel 2.1.0, you can be up and running in 10 seconds.
https://seatunnel.apache.org/docs/2.1.0/developement/setup
How can I contribute?
We invite everyone interested in taking local open source global to join the Apache SeaTunnel contributor family and build open source together!
Submit an issue:
https://github.com/apache/incubator-seatunnel/issues
Contribute code to:
https://github.com/apache/incubator-seatunnel/pulls
Subscribe to the community development mailing list:
dev-subscribe@seatunnel.apache.org
Development mailing list:
dev@seatunnel.apache.org
Join Slack:
https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ
Follow Twitter:
https://twitter.com/ASFSeaTunnel
Come and join us!