[Issue 6] Apache SeaTunnel Weekly FAQ

Apache SeaTunnel
May 25, 2023


May 3rd-May 12th

Q: When using SeaTunnel with MySQL CDC to synchronize data to MySQL, if the SeaTunnel task is stopped while delete operations occur on the source table, how can the missed deletes be synchronized when the task is restarted?

A: Stop the job with a savepoint and then restore it from the saved state; the restored job starts with the command parameters specified in bin/seatunnel.sh.
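
A minimal sketch of that workflow with the Zeta engine is shown below. The -s (savepoint) flag, job ID, and config path are placeholders and may differ in your version:

    # Stop the running job and trigger a savepoint (Zeta engine)
    sh bin/seatunnel.sh -s <jobId>

    # Restore the job from the saved state, using the same job config
    sh bin/seatunnel.sh -c <path_to_jobconfig_file> -r <jobId>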

Q: Can SeaTunnel’s Transformer perform stream splitting and merging?

A: SeaTunnel does not need a Transform for stream splitting. You can split a stream by configuring two Sinks whose source_table_name points to the result_table_name of a single Source, as in the sketch below.
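
A minimal sketch of such a job config follows. The FakeSource and Console connectors, the schema, and the table name source_stream are illustrative placeholders; in practice the two sinks would be your real split targets:

    env {
      job.mode = "BATCH"
    }

    source {
      FakeSource {
        result_table_name = "source_stream"
        schema {
          fields {
            id = "int"
            name = "string"
          }
        }
      }
    }

    sink {
      # Both sinks consume the same upstream table, effectively splitting the stream
      Console {
        source_table_name = "source_stream"
      }
      Console {
        source_table_name = "source_stream"
      }
    }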

Q: When is the planned release for SeaTunnel Web?

A: You can expect to see it in the dev version by the end of May.

Q: In the scenario where a MySQL server crashes during data import, can SeaTunnel support resuming the data transfer after the server restart?

A: This feature is called checkpoint resumption in SeaTunnel. It does not require any special configuration. If a job supports checkpoint resumption, you can use the command sh seatunnel.sh -c ${path_to_jobconfig_file} -r ${jobId} to resume the job after a failure. To determine if a job supports checkpoint resumption, you can refer to the official documentation and check if the connectors support the exactly-once feature. If they do, checkpoint resumption is possible (Note: Both the Source and Sink connectors need to support exactly-once features).

Q: Where can I find examples of checkpoint resumption?

A: Checkpoint resumption does not require a specific configuration. As long as the source supports exactly-once and the sink supports exactly-once or primary-key-based deduplication, the job can be resumed. Simply run sh seatunnel.sh -c ${path_to_jobconfig_file} -r ${jobId} to restore the failed job, as shown below.
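
For example (a sketch assuming the Zeta engine; the -l flag for listing jobs and the placeholders are assumptions to verify against your version):

    # List submitted jobs to find the ID of the failed run
    sh bin/seatunnel.sh -l

    # Resume the failed job from its last checkpoint
    sh bin/seatunnel.sh -c <path_to_jobconfig_file> -r <jobId>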

Q: Is Hadoop a mandatory dependency for SeaTunnel?

A: If you are using the SeaTunnel Zeta engine, it does not depend on Hadoop.

Q: How does the recovery and continuation of jobs work with the built-in Zeta engine in SeaTunnel?

A: This feature is only supported in SeaTunnel standalone mode, not in local mode.

Q: Is it possible to support Chinese ShenTong databases in SeaTunnel?

A: It is not currently supported, but you can develop a corresponding connector plugin for it. Contributions are welcome.

Q: How can the SeaTunnel web interface be integrated with the DolphinScheduler web interface?

A: The integration between SeaTunnel web and DolphinScheduler web is currently under design as SeaTunnel web has not been released yet.

Q: Does the SeaTunnel web interface support alert notifications through enterprise messaging platforms like WeChat Work and Feishu? Can it display corresponding logs, read/write speed, progress, etc.?

A: The SeaTunnel web interface is primarily focused on task definition and simple task execution. For production-level scheduling, execution, and task alerts, it needs to be integrated with the corresponding scheduling system.

Q: Can multiple local-mode SeaTunnel instances run simultaneously? In my testing, after running MySQL CDC synchronization to ClickHouse, I tried writing data from a fake source to MySQL and hit the error “Hazelcast cannot start”.

A: In version 2.3.1, you can resolve this issue by modifying the hazelcast.yaml file and setting the port auto-increment parameter to true (see the sketch below). The upcoming dev branch and version 2.3.2 will also address this issue.
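
A minimal sketch of the relevant hazelcast.yaml section is shown below; the base port 5801 and the port-count value are illustrative and should match your own cluster configuration:

    hazelcast:
      network:
        port:
          port: 5801              # base port of the embedded Hazelcast instance
          auto-increment: true    # let each additional local instance bind the next free port
          port-count: 100         # number of consecutive ports that may be tried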

Q: Can SeaTunnel v1 use Spark 3? We have some requirements for HBase integration. If we use SeaTunnel v2 directly, many advanced Spark APIs cannot be used due to the SeaTunnel Row framework.

A: The SeaTunnel community is currently focusing on the V2 version of connectors and no new features are being added to the V1 version.

About Apache SeaTunnel

Apache SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of records per day stably and efficiently.

Why do we need Apache SeaTunnel?

Apache SeaTunnel aims to solve the common problems encountered when synchronizing massive amounts of data:

  • Data loss and duplication
  • Task buildup and latency
  • Low throughput
  • Long application-to-production cycle time
  • Lack of application status monitoring

Apache SeaTunnel Usage Scenarios

  • Massive data synchronization
  • Massive data integration
  • ETL of large volumes of data
  • Massive data aggregation
  • Multi-source data processing

Features of Apache SeaTunnel

  • Rich components
  • High scalability
  • Easy to use
  • Mature and stable

How to get started with Apache SeaTunnel quickly?

Want to try Apache SeaTunnel quickly? SeaTunnel 2.1.0 can get you up and running in as little as 10 seconds.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite everyone interested in taking local open source global to join the Apache SeaTunnel contributor family and grow open source together!

Submit an issue:

https://github.com/apache/seatunnel/issues

Contribute code to:

https://github.com/apache/seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ

Follow Twitter:

https://twitter.com/ASFSeaTunnel

Join us now!❤️❤️
