Exciting Updates Coming in Apache SeaTunnel 2.3.8

Apache SeaTunnel
Sep 27, 2024

Apache SeaTunnel 2.3.8 will be released soon. At a recent community meeting, Apache SeaTunnel PMC Member Fan Jia previewed the new features and updates. Here’s a detailed overview of what to expect:

Introduction to SeaTunnel

SeaTunnel is a high-performance open-source distributed data integration system that supports real-time streaming and offline batch processing of various data sources, making it suitable for massive data integration. Key features include:

  • Extensive Connectors: Supports over 100 data sources and storage systems.
  • Multi-Engine Support: Compatible with various data processing engines, including SeaTunnel Zeta Engine, Spark, and Flink.
  • HTTP Support: Enables data integration via HTTP interfaces.
  • Stream and Batch Integration: Supports both stream processing and batch processing.
  • Stream Rate Control: Capable of controlling the rate of data flow.
  • Automatic Table Creation: Automatically creates tables based on data structure.

New Features and Updates in Version 2.3.8

In the upcoming 2.3.8 release, the community will introduce several new features and updates:

Docker Images

The new version will provide official Docker images that include nearly all connectors. Users can run SeaTunnel more quickly and simplify deployment without downloading installation packages.

  • Build Images via Command: Users with custom needs can build images locally using command-line instructions.
  • Start Services via Command: Supports starting services for distributed deployment, submitting tasks, and querying task statuses via the command line. Users can also submit tasks through REST APIs.

Spark Multi-Table Support

Currently, SeaTunnel only supports multi-table tasks with the Zeta Engine. The new version will introduce Spark engine support for multi-table tasks, allowing for automatic recognition and execution of multi-table jobs. Additionally, Flink’s multi-table support is in progress, and interested contributors are welcome to join on GitHub.

Config Parameter Default Values

The current version allows variable configuration in the config parameters, but each variable needs to be set manually. The new version will permit the use of default values for configuration parameters, enhancing flexibility.
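
To illustrate the idea, here is a minimal Python sketch of placeholder substitution with fallback values. The `${name:default}` placeholder syntax, the `resolve` helper, and the sample config keys are assumptions made for illustration; this is not SeaTunnel's actual parser, and the real syntax should be checked against the 2.3.8 documentation.

```python
import re

# Hypothetical placeholder syntax: ${name} or ${name:default}.
PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def resolve(text, variables):
    """Replace placeholders with supplied values, falling back to defaults."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in variables:
            return variables[name]
        if default is not None:
            return default
        # No value was passed in and the placeholder declares no default.
        raise KeyError(f"no value or default for variable '{name}'")
    return PLACEHOLDER.sub(substitute, text)


# Illustrative config fragment; the keys are made up for this example.
config = 'table = "${table_name:orders}"\nparallelism = ${parallelism:2}'

# Only "parallelism" is supplied, so "table_name" falls back to its default.
resolved = resolve(config, {"parallelism": "4"})
print(resolved)
```

With defaults in place, a job file can be run unchanged in a test environment and overridden only where production differs.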

Prometheus Integration for Cluster Monitoring

Previously, SeaTunnel provided interfaces for retrieving task run metrics. The new version will support integration with Prometheus for cluster monitoring. Prometheus will regularly pull the status of SeaTunnel cluster tasks and present this in a visual interface, making it easier to monitor cluster status and quickly identify issues.
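
As a rough picture of what a pull-based scrape returns, the Python sketch below parses a small payload in the Prometheus text exposition format. The metric names, label names, and the `parse_metrics` helper are hypothetical and chosen for illustration; they are not the metrics SeaTunnel actually exposes.

```python
import re

# Sample scrape payload in the Prometheus text exposition format.
# The metric and label names are hypothetical, not SeaTunnel's real metrics.
SAMPLE = """\
# HELP job_count Number of jobs by status
# TYPE job_count gauge
job_count{status="running"} 3
job_count{status="failed"} 1
node_up 1
"""

LINE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_metrics(SAMPLE)
# Pick out the failed-job gauge, the kind of value an alert rule might watch.
failed = [v for n, l, v in samples if n == "job_count" and l.get("status") == "failed"]
print(failed)
```

In practice Prometheus performs the scraping and parsing itself; the sketch only shows the shape of the data it would collect on each pull.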

Embedding Transform

The addition of the Embedding transform will enable the integration of machine learning models into the data transformation process, converting raw fields into vector values for storage in appropriate machine learning databases. Current machine learning model providers supported by SeaTunnel include Doubao, Qianfan, and OpenAI.
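
Conceptually, the transform maps a text field of each row to a vector field. The Python sketch below shows that shape with a stand-in embedding function based on a hash, since calling a real model provider needs credentials. The function names, field names, and vector size are all assumptions for illustration and do not reflect SeaTunnel's configuration or API.

```python
import hashlib


def toy_embedding(text, dimensions=4):
    """Stand-in for a model call: derives a deterministic vector from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dimensions)]


def embed_rows(rows, source_field, target_field, embed=toy_embedding):
    """Return copies of the rows with a vector field computed from a text field."""
    return [{**row, target_field: embed(row[source_field])} for row in rows]


# Hypothetical input rows; "title" is the raw field to vectorize.
rows = [
    {"id": 1, "title": "stream processing"},
    {"id": 2, "title": "batch processing"},
]
vectorized = embed_rows(rows, "title", "title_vector")
for row in vectorized:
    print(row["id"], row["title_vector"])
```

A real pipeline would swap `toy_embedding` for a call to one of the supported model providers and write the resulting vectors to the sink.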

Job-Level Log Filtering

The new version will enhance log filtering and viewing capabilities at the job level, enabling users to filter logs through two methods:

  1. Job ID in Logs: Users can search for logs associated with a specific Job ID, making it easier to troubleshoot when multiple tasks are running concurrently.
  2. Splitting Logs by Job ID: By modifying the log configuration file, users can ensure that logs for the same Job ID are categorized into the same file, simplifying log management.

Example modification to the log4j2.properties configuration file:

...
rootLogger.appenderRef.file.ref = routingAppender
...
appender.file.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%-30.30c{1.}] [%t] - %m%n
...
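
The effect of routing logs by Job ID can be pictured with a short Python sketch that groups log lines per job. The `[job-<id>]` tag format, the sample log lines, and the `split_by_job` helper are assumptions for illustration; the actual log layout depends on the configured pattern.

```python
import re
from collections import defaultdict

# Assumed log format: each job-related line carries a "[job-<id>]" tag.
JOB_TAG = re.compile(r"\[job-(\d+)\]")


def split_by_job(lines):
    """Group log lines by job ID; untagged lines go under the key None."""
    grouped = defaultdict(list)
    for line in lines:
        match = JOB_TAG.search(line)
        grouped[match.group(1) if match else None].append(line)
    return dict(grouped)


# Made-up log lines from two concurrent jobs plus one cluster-level message.
logs = [
    "2024-09-27 10:00:00,001 INFO  [job-101] source opened",
    "2024-09-27 10:00:00,005 INFO  [job-202] source opened",
    "2024-09-27 10:00:01,120 ERROR [job-101] sink write failed",
    "2024-09-27 10:00:02,000 INFO  cluster heartbeat",
]
grouped = split_by_job(logs)
for line in grouped["101"]:
    print(line)
```

Each group corresponds to what would end up in one per-job log file, so a failure in one job can be read without the interleaved output of the others.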

Kafka Support for Protobuf Data

The Kafka connector has been enhanced to support Protobuf data format, allowing for the definition of Protobuf data types for reading and writing.

File Support for Reading Compressed Files

The new version will introduce support for reading compressed file formats, eliminating the need for decompression steps.

Other Features

Additionally, the new version will remove filters on system tables, allowing users to read system tables, and enhance support for Paimon’s stream reading and dynamic bucket writing.

How to Get the Latest Version and Contribute

Download

The SeaTunnel 2.3.8 version is expected to be released in early October. Stay tuned to the SeaTunnel official download page for the latest version.

Contributing

  • Mailing List: Subscribe to the SeaTunnel development mailing list by emailing dev-subscribe@seatunnel.apache.org to participate in community discussions and release votes.
  • GitHub: Visit the Apache SeaTunnel GitHub repository to keep up with community updates and submit bug reports and feature requests.

Conclusion

The release of SeaTunnel 2.3.8 will introduce a series of new features and improvements, making data integration more efficient and flexible. Thanks to all contributors for their efforts in making SeaTunnel a more powerful data integration tool.

For more information, please visit the SeaTunnel official website.

About Apache SeaTunnel

Apache SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of records per day stably and efficiently.

Interested in becoming an Apache SeaTunnel speaker? Fill out this form: https://forms.gle/vtpQS6ZuxqXMt6DT6 :)

Why do we need Apache SeaTunnel?

Apache SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.

  • Data loss and duplication
  • Task buildup and latency
  • Low throughput
  • Long application-to-production cycle time
  • Lack of application status monitoring

Apache SeaTunnel Usage Scenarios

  • Massive data synchronization
  • Massive data integration
  • ETL of large volumes of data
  • Massive data aggregation
  • Multi-source data processing

Features of Apache SeaTunnel

  • Rich components
  • High scalability
  • Easy to use
  • Mature and stable

How to get started with Apache SeaTunnel quickly?

Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite all partners who are interested in making local open-source global to join the Apache SeaTunnel contributors family and foster open-source together!

Submit an issue:

https://github.com/apache/seatunnel/issues

Contribute code to:

https://github.com/apache/seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ

Follow Twitter:

https://twitter.com/ASFSeaTunnel

Join us now!❤️❤️
