[Weekly FAQ] Issue 1 | Answer your questions about Apache SeaTunnel (Incubating)

Apr 10, 2023

To help users and enthusiasts of Apache SeaTunnel get timely answers to their questions about the project, the community has launched this [Weekly FAQ] column, hoping to solve the problems you run into in practice.

Q: Does Apache SeaTunnel require the Spark version to be at least 2.4?
A: No. Apache SeaTunnel 2.1.x supports Spark 2.3.

Q: Does Apache SeaTunnel need to add any additional dependent jar packages when operating Hive data?
A: The documentation at https://seatunnel.apache.org/docs/2.3.1/connector-v2/source/Hive#description states that if the Spark or Flink engine is used, Spark or Flink must be integrated with Hive.

If you use the Apache SeaTunnel Zeta engine, you need to put seatunnel-hadoop3-3.1.4-uber.jar and hive-exec-2.3.9.jar in the $SEATUNNEL_HOME/lib/ directory.

Q: Does Apache SeaTunnel support the collection of pictures, audio, or video?
A: Not supported yet.

Q: Under DolphinScheduler in CDC mode, data of fewer than 20,000 entries cannot be written. Is this true?
A: Upgrading DolphinScheduler solves the problem. It occurs because source_table_name is not set in the sink.

Q: Can DolphinScheduler only integrate with Apache SeaTunnel 2.1.x? Is there any official documentation or tutorial?
A: The community has already submitted a PR to support the latest Apache SeaTunnel version. It is waiting to be merged.

Q: How does Apache SeaTunnel set the memory size occupied?
A: The $SEATUNNEL_HOME/config directory contains two files, jvm_options and jvm_client_options. The jvm_options file controls the memory of the process in Zeta cluster mode. The jvm_client_options file controls the memory of the client that submits the job, and of the job itself in local mode.
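To see which heap limits are currently in effect, a small script can read one of these files. This is only a minimal sketch: it assumes the file holds one JVM flag per line with `#` comments, and that memory is set through the standard JVM flags `-Xms` (initial heap) and `-Xmx` (maximum heap). The function name `read_heap_settings` is made up for illustration.

```python
import re
from typing import Dict

# Standard JVM heap flags: -Xms<size> (initial) and -Xmx<size> (maximum).
HEAP_FLAG = re.compile(r"^-(Xms|Xmx)(\d+[kKmMgG]?)$")


def read_heap_settings(text: str) -> Dict[str, str]:
    """Collect -Xms/-Xmx values from the text of a JVM options file.

    Assumption: one flag per line, with '#' starting a comment line.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = HEAP_FLAG.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings
```

For example, passing the contents of `$SEATUNNEL_HOME/config/jvm_options` shows the heap limits of the Zeta cluster process. Changing those two values in the file and restarting is what adjusts the memory.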

Q: How to write the code to use Apache SeaTunnel to complete data synchronization from HDFS to MySQL?
A: Refer to the official documentation for the HDFS source and JDBC sink connectors: https://seatunnel.apache.org/docs/2.3.1/about.

Q: After executing sh install-plugin.sh, I ran the official demo, but it still reports an error that the plugin cannot be found. What is wrong?
A: Check whether the corresponding plug-in jar package exists in the $SEATUNNEL_HOME/connectors/seatunnel directory.
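This check can be scripted. The sketch below lists the jar files in the connector directory whose names contain a keyword. It assumes the connector jars sit directly in a `seatunnel` subdirectory of `connectors` (adjust `subdir` if your layout differs). The function name and the jar names in the usage note are hypothetical.

```python
from pathlib import Path
from typing import List


def find_connector_jars(seatunnel_home: str, keyword: str,
                        subdir: str = "seatunnel") -> List[str]:
    """Return the names of connector jars whose file name contains keyword.

    Assumption: jars are stored in <seatunnel_home>/connectors/<subdir>.
    An empty result means the plug-in jar is missing from that directory.
    """
    connector_dir = Path(seatunnel_home) / "connectors" / subdir
    if not connector_dir.is_dir():
        return []
    return sorted(
        jar.name
        for jar in connector_dir.glob("*.jar")
        if keyword.lower() in jar.name.lower()
    )
```

Calling `find_connector_jars(os.environ["SEATUNNEL_HOME"], "jdbc")` (after `import os`) and getting an empty list would suggest that the JDBC connector jar was never installed, or was installed somewhere else.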

Q: How do I configure the sink to clean up existing data before writing, similar to truncating the table?
A: This is not supported yet. We need to wait for the community's SaveMode feature to be completed.

Q: When using -i to pass parameters with the Apache SeaTunnel Zeta engine, what should be written in the configuration file?
A: Zeta does not support -i for the time being.

Q: Does Apache SeaTunnel support Doris to MySQL?
A: Yes.

Q: There are many ways to synchronize PostgreSQL to the Hudi data lake through CDC. Does Apache SeaTunnel support it?
A: There is a PR for PostgreSQL CDC that is pending merge: https://github.com/apache/incubator-seatunnel/pull/3867/files.

Q: I use the HTTP source to request API data, but I first need to request a token and then pass that token to the next HTTP source as a parameter. How can I achieve this?
A:
1. Write a shell script that requests the token.
2. Run it as a pre-dependency of the Apache SeaTunnel job.
3. Pass the token to the Apache SeaTunnel job.
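
The three steps above can be sketched as follows, using Python in place of a shell script. Everything specific here is an assumption for illustration: the token endpoint is hypothetical and is assumed to return JSON such as `{"token": "..."}`, the `__API_TOKEN__` marker is a made-up placeholder that you would put in your own job template, and the job is started with `bin/seatunnel.sh --config`. The token is written into the configuration file itself rather than passed on the command line.

```python
import json
import subprocess
import urllib.request
from pathlib import Path
from typing import List

# Hypothetical marker that the job template uses where the token belongs.
TOKEN_PLACEHOLDER = "__API_TOKEN__"


def fetch_token(auth_url: str) -> str:
    """Step 1: request the token.

    Assumes a hypothetical endpoint that answers with JSON such as
    {"token": "..."}; adapt the parsing to the real API.
    """
    with urllib.request.urlopen(auth_url, timeout=10) as resp:
        return json.load(resp)["token"]


def render_job_config(template_text: str, token: str) -> str:
    """Step 3: write the token into the job configuration text."""
    if TOKEN_PLACEHOLDER not in template_text:
        raise ValueError("template has no token placeholder")
    return template_text.replace(TOKEN_PLACEHOLDER, token)


def build_command(seatunnel_home: str, config_path: str) -> List[str]:
    """Command line that starts the job with the rendered config."""
    launcher = Path(seatunnel_home) / "bin" / "seatunnel.sh"
    return [str(launcher), "--config", config_path]


def run_with_token(auth_url: str, template_path: str,
                   output_path: str, seatunnel_home: str) -> None:
    """Step 2: the token request runs first, then the job is started."""
    token = fetch_token(auth_url)
    rendered = render_job_config(Path(template_path).read_text(), token)
    Path(output_path).write_text(rendered)
    subprocess.run(build_command(seatunnel_home, output_path), check=True)
```

A scheduler such as DolphinScheduler could run the token request as an upstream task instead. The sketch keeps both steps in one script only to show the ordering.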

📌📌 You are welcome to fill out this survey to share feedback on your user experience or your ideas about Apache SeaTunnel :)

About Apache SeaTunnel

Apache SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of data per day in a stable and efficient manner.

Why do we need Apache SeaTunnel?

Apache SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.

Data loss and duplication
Task buildup and latency
Low throughput
Long application-to-production cycle time
Lack of application status monitoring

Apache SeaTunnel Usage Scenarios

Massive data synchronization
Massive data integration
ETL of large volumes of data
Massive data aggregation
Multi-source data processing

Features of Apache SeaTunnel

Rich components
High scalability
Easy to use
Mature and stable

How to get started with Apache SeaTunnel quickly?

Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite all partners who are interested in making local open-source global to join the Apache SeaTunnel contributors family and foster open-source together!

Submit an issue:

https://github.com/apache/incubator-seatunnel/issues

Contribute code to:

https://github.com/apache/incubator-seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ

Follow Twitter:

https://twitter.com/ASFSeaTunnel

Come and join us!

Written by Apache SeaTunnel
