Apache SeaTunnel officially releases version 2.3.5, with enhanced functions and multiple bug fixes

Apache SeaTunnel
May 13, 2024

After two months of work, Apache SeaTunnel 2.3.5 is out, building on version 2.3.4 with a new round of updates. The release fixes several critical issues and introduces a number of significant enhancements and performance optimizations.

We would like to extend our gratitude to the community members for their contributions and support. If you are looking to upgrade to the latest version, let’s take a look at the highlights of this update!

Release Note: https://github.com/apache/seatunnel/releases/tag/2.3.5

Download link: https://seatunnel.apache.org/download/

Major New Features

Job event notification is now supported, including delay event notifications for real-time CDC data (https://github.com/apache/seatunnel/pull/6634). Users can customize the target that notification messages are sent to, so they are alerted whenever real-time synchronized data falls behind.

File-type connectors now support defining character encoding for reading and writing, which is useful when different character encodings are used at the source and destination.
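
As a rough illustration of the encoding support described above, the fragment below reads a GBK-encoded text file and writes it back out as UTF-8. It is a minimal sketch, not a complete job: the paths are hypothetical, the option name encoding is assumed to match the file connector documentation for 2.3.5, and other required options (such as the schema and delimiters) are omitted.

```hocon
source {
  LocalFile {
    # Hypothetical input path; the file is assumed to be GBK-encoded.
    path = "/data/in/orders.txt"
    file_format_type = "text"
    # Assumed option name: the character encoding used when reading.
    encoding = "gbk"
  }
}

sink {
  LocalFile {
    # Hypothetical output directory.
    path = "/data/out"
    file_format_type = "text"
    # Assumed option name: the character encoding used when writing.
    encoding = "utf-8"
  }
}
```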

Optimized the logic for creating Postgres CDC publications. Previously, the range of publications created when adding tables to Postgres CDC was ALL_TABLES, causing unnecessary PG WAL growth even when only a few tables were synchronized. The community has optimized the logic to create publications only for the necessary tables, significantly reducing WAL growth and improving stability.

The Zeta engine now supports setting the number of retry attempts for tasks. Previously, Zeta engine tasks automatically retried three times upon failure. In some scenarios, it is preferable for a task to terminate immediately upon failure, with retries managed by an external scheduling system. From version 2.3.5 onwards, users can disable automatic retries by setting job.retry.times = 0 in the env block of the job configuration.
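
A minimal sketch of an env block with automatic retries disabled might look like this. Only job.retry.times comes from the release notes; the job.mode line is included as an assumption to make the fragment look like a typical job file.

```hocon
env {
  # Assumed for illustration: a batch job.
  job.mode = "BATCH"
  # Disable automatic retries so that a failure ends the job immediately
  # and an external scheduler can decide whether to resubmit it.
  job.retry.times = 0
}
```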

Key Bug Fixes

Zeta engine now supports the classloader cache feature. If a connector has already been loaded into the JVM, Zeta engine will cache that classloader. Subsequent tasks submitted for that connector will use the cached classloader instead of creating a new one. This approach resolves the issue of Zeta JVM metaspace memory growth when submitting a large number of tasks, thereby fixing the metaspace memory overflow bug.

Fixed the issue of precision loss in SQL Transform. For example, fields of type timestamp can now retain their previous precision information.

Bug Fixes

Core

  • [fix] Fixed null pointer exception issue in SeaTunnel (#6681)
  • [Hotfix] Resolved default table issue (#6352)
  • [Chore] Corrected file spelling errors (#6606)
  • [BugFix][Spark-translation] Fixed type conversion error in mapping (#6552)
  • [Hotfix] Fixed Spark example issue (#6486)
  • [Hotfix] Resolved compilation errors (#6463)

Transformer

  • [Fix][SQLTransform] Fixed precision loss issue in SQL transform (#6553)
  • [Bug] Fixed negative constant error in SQLTransform (#6533)

Connectors

  • [Fix][Kafka-Sink] Fixed Kafka Sink option rule (#6657)
  • [Hotfix] Fixed HTTP source reading ‘yyyy-MM-dd HH:mm:ss’ format and optimized date-time tool (#6601)
  • [Bug] Fixed issues with OrcWriteStrategy/ParquetWriteStrategy when logging in with Kerberos (#6472)
  • [Fix][Doc] Corrected ‘username’ key to ‘user’ in FTP Sink configuration (#6627)
  • [E2E] Fixed instability issues in Amazon DynamoDB integration tests (#6640)
  • [Fix][Connector-V2] Fixed issue with adding Hive partition when partition already exists (#6577)
  • [Fix][Connector-V2] Corrected SQL parsing error in Doris/StarRocks table creation (#6580)
  • [Fix][Connector-V2] Fixed issue where Doris Sink could not close when no data was read during stream loading (#6570)
  • [Fix][Connector-V2] Fixed issue where connectors supported SPI but lacked no-arg constructors (#6551)
  • [Fix][Connector-V2] Fixed issue where Doris source lost primary key information when selecting fields (#6339)
  • [Fix][FakeSource] Fixed issue where template random generation did not include latest values (#6438)
  • [Fix][Connector-V2] Fixed issue with MongoDB CDC start mode option values (#6338)
  • [BugFix][Connector-file-sftp] Fixed issue where SFTPInputStream.close did not correctly trigger file stream closure (#6323) (#6329)
  • [Fix] Fixed issue where Doris stream loading failed without reporting errors (#6315)
  • [fix][connector-rocketmq] Fixed null pointer exception caused by setting checkpoint.interval too low (#6624)
  • [Bugfix][TDengine] Fixed issue where driver was lost due to multiple calls to submit job REST API (#6581) (#6596)
  • [Fix][StarRocks] Fixed null pointer exception when catalogtable table path only had table name part upstream (#6540)

Formats

  • [Bug] [formats] Fixed issue where rows could not be parsed when content included file separators (#6589)

Zeta(ST-Engine)

  • [Hotfix] Fixed issue with HTTP source reading ‘yyyy-MM-dd HH:mm:ss’ format and optimized date-time tool (#6601)
  • [Fix][Zeta] Fixed thread hang issue caused by savepoint check mechanism (#6568)
  • [Fix][Zeta] Improved Hazelcast connection in local mode (#6521)
  • [Fix][Zeta] Fixed issue where thread classloader was set to null when using cache mode (#6509)
  • [Bug] [zeta] Fixed null pointer exception when submitting job (#6492)
  • [bugfix] [Zeta] Fixed issue where classloader was not released when submitting job via REST API
  • [BUG][Zeta] Fixed incorrect job name display (#6470)
  • [Hotfix][Zeta] Fixed job deadlock issue when changing mode (#6389)

E2E

  • [E2E] Enabled StarRocksCDCSinkIT (#6626)

Improvements

  • [Doc][Improve] Added Chinese support for seatunnel-engine (#6656)
  • [Doc][Improve] Added Chinese support for start-v2/locally/quick-start-flink.md and start-v2/locally/quick-start-spark.md (#6412)
  • [Improve] Added icons for IDEA (#6394)
  • [Improve] Added deprecated comments for ReadonlyConfig::toConfig (#6353)
  • [Improve][RestAPI] Always return jobId when calling getJobInfoById API (#6422)
  • [Improve][RestAPI] Return completed job information when job is completed (#6576)
  • [Improve] Improved MultiTableSinkWriter prepare commit performance (#6495)
  • [Improve] Added detailed logs for save mode handling (#6375)
  • [Improve][API] Unified data and type system APIs (#5872)
  • [Improve] Optimized type conversion errors when reading with Parquet (#6683)
  • [Improve][Connector-V2] Added support for multi-table export in Redis (#6314)
  • [Improve][Connector-V2] Optimized Oracle CDC end-to-end testing (#6232)
  • [Improve][Connector-V2] Added multi-table support for HTTP Sink (#6316)
  • [Improve][Connector-V2] Added support for INFINI Easysearch (#5933)
  • [Improve][Connector-V2] Added support for Paimon Sink with Hadoop HA and Kerberos (#6585)
  • [Improve][CDC-Connector] Fixed CDC option rule (#6454)
  • [Improve][CDC] Optimized memory allocation when reading snapshot splits (#6281)
  • [Improve][Connector-V2] Added support for TableSourceFactory on StarRocks (#6498)
  • [Improve][Jdbc] Stored strings in Oracle using varchar2 data type (#6392)
  • [Improve] StarRocksSourceReader uses existing client (#6480)
  • [Improve][JDBC] Optimized code style for retrieving JDBC field types (#6583)
  • [Improve][Connector-V2] Added ElasticSearch type converter (#6546)
  • [Improve][Connector-V2] Supported reading ORC with schema configuration and type conversion (#6531)
  • [Improve][Jdbc] Added custom case-sensitive configuration for large databases (#6510)
  • [Improve][Jdbc] Added type converter when creating tables automatically (#6617)
  • [Improve][CDC] Optimized split state memory allocation in incremental stage (#6554)
  • [Improve][CDC] Improved performance when schema fields are not included in records (#6571)
  • [Improve][Jdbc] Added reference identifier for SQL (#6669)
  • [Improve] Disabled 2PC on SelectDB cloud Sink (#6266)
  • [Doc][Improve] Added Kerberos authentication support for Kafka connectors (#6653)

CI

  • [CI] Fixed repository name error in CI configuration files (#4795)

Zeta(ST-Engine)

  • [Improve][Zeta] Added classloader cache mode to fix metaspace leakage (#6355)
  • [Improve][Test] Fixed instability issues in ResourceManager and EventReport module tests (#6620)
  • [Improve][Test] Run all tests when merging code to the development branch (#6609)
  • [Improve][Test] Made classloader cache tests more stable (#6597)
  • [Improve][Zeta][storage] Updated HDFS configuration to support more parameters (#6547)
  • [Improve][Zeta] Optimized logic for RestHttpGetCommandProcessor#getSeaTunnelServer() (#6666)

Transformer

  • [Improve][Transform] SQL transform supports internal structure query (#6484)
  • [Improve][Transform] Removed fallback operations during transform parsing (#6644)
  • [Improve][Transform] Removed exception for missing fields (#6691)

Features

  • [Feature][Tool] Added connector check script to resolve issue #6199 (#6635)
  • [Feature][Core] Added support for listening to message delay events in CDC source (#6634)
  • [Feature][Core] Added support for job event listening (#6419)
  • [Feature][connector-v2] Added XuguDB connector (#6561)
  • [Feature][Connector-V2] Added multi-table export functionality for Paimon (#6449)
  • [Feature][Connectors-V2][File] Added support for specifying encoding for file sources/sinks (#6489)
  • [Feature][Connector] Updated PgSQL-CDC publication to add tables (#6309)
  • [Feature][Paimon] Supported specifying Paimon table write attributes, partition keys, and primary keys (#6535)
  • [Feature][Feature] Supported Doris DateTimeV2 type (#6358)
  • [Feature][Feature] Supported SelectDB DateTimeV2 type (#6332)
  • [Feature][Feature] Added support for Iceberg Sink connector (#6265)

Zeta(ST-Engine)

  • [Zeta] Added support for setting job retry times in job configuration (#6690)

Docs

  • [Docs] Fixed spelling error in Kafka format (#6633)
  • [Fix][Doc] Corrected some document links (#6673)
  • [Fix][Doc] Corrected some typos (#6628)
  • [Fix][Doc] Corrected formatting errors in StarRocks Sink documentation (#6579)
  • [Hotfix][Doc][Chinese] Fixed invalid links related to log configuration parameters (#6442)
  • [Fix][Doc] Corrected errors in Seatunnel Engine/checkpoint-storage.md documentation (#6369)

List of Contributors

We would like to thank all community members who contributed to version 2.3.5, including code contributors, document writers, and testers. The success of Apache SeaTunnel would not be possible without everyone’s efforts!

Jetiaime, LeonYoah, TyrantLucifer, ponxu, EricJoy2048, sunxiaojian, xiaochen-zhou, CosmosNi, lightzhao, baicie, Hisoka-X, gitfortian, hailin0, ruanwenjun, shangeyao, corgy-w, liunaijie, dailai, taohaozhi1129, LeonYoah, nianhua99, xxzuo, YalikWang
