[Weekly FAQ] Issue 1 | Answer your questions about Apache SeaTunnel (Incubating)

Apache SeaTunnel
Apr 10, 2023

To help Apache SeaTunnel users and enthusiasts get timely answers to their questions about the project, the community has launched this [Weekly FAQ] column, which aims to solve the problems you run into in practice.

Q: Does Apache SeaTunnel require the Spark version to be at least 2.4?
A: Apache SeaTunnel 2.1.x also supports Spark 2.3.

Q: Does Apache SeaTunnel need any additional dependency jars when working with Hive data?
A: The Hive source documentation (https://seatunnel.apache.org/docs/2.3.1/connector-v2/source/Hive#description) states that if you use the Spark or Flink engine, Spark/Flink must be integrated with Hive.

If you use the Apache SeaTunnel Zeta engine, you need to put seatunnel-hadoop3-3.1.4-uber.jar and hive-exec-2.3.9.jar in the $SEATUNNEL_HOME/lib/ directory.

Q: Does Apache SeaTunnel support the collection of pictures, audio, or video?
A: Not supported yet.

Q: In DolphinScheduler CDC mode, data sets of fewer than 20,000 entries cannot be written. Is this true?
A: Upgrading DolphinScheduler solves the problem. It occurs because source_table_name is not set in the sink.

Q: Can DolphinScheduler only integrate with Apache SeaTunnel 2.1.x? Is there any official documentation or tutorial?
A: The community has already submitted a PR to support the latest Apache SeaTunnel version; it is waiting to be merged.

Q: How do I set the amount of memory that Apache SeaTunnel uses?
A: The $SEATUNNEL_HOME/config directory contains two files, jvm_options and jvm_client_options. The jvm_options file controls the memory of the server process in Zeta cluster mode. The jvm_client_options file controls the memory of the client that submits the job, and of the job itself in local mode.
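To see which heap size one of these files currently sets, a small script can read it. The sketch below assumes the file holds one standard JVM flag per line (such as -Xms and -Xmx) with # starting a comment; the sample content is illustrative and not copied from a SeaTunnel release.

```python
import re

def heap_settings(options_text):
    """Return the -Xms/-Xmx values found in the text of a JVM options file."""
    settings = {}
    for raw in options_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out flags.
        if not line or line.startswith("#"):
            continue
        match = re.fullmatch(r"-(Xms|Xmx)(\d+[kKmMgG]?)", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings

# Illustrative file content (assumed layout, hypothetical values).
sample = """\
# JVM heap size
-Xms2g
-Xmx2g
-XX:+HeapDumpOnOutOfMemoryError
"""
print(heap_settings(sample))
```

To inspect a real installation, read the file's text first, for example with pathlib.Path(...).read_text(), and pass it to heap_settings.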

Q: How do I write a job that uses Apache SeaTunnel to synchronize data from HDFS to MySQL?
A: Refer to the HDFS Source and JDBC Sink connector pages in the official documentation: https://seatunnel.apache.org/docs/2.3.1/about.

Q: After running sh install-plugin.sh, I ran the official demo, but it still reports that the plugin cannot be found. What is wrong?
A: Check whether the corresponding plugin jar exists in the $SEATUNNEL_HOME/connectors/seatunnel directory.
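This check can be scripted. The sketch below searches the connectors directory under the installation root for jar files whose name contains a keyword. The function name and the keyword matching are illustrative choices, not part of SeaTunnel.

```python
import os
from pathlib import Path

def find_connector_jars(seatunnel_home, keyword):
    """List jar file names under <seatunnel_home>/connectors that contain keyword."""
    connectors_dir = Path(seatunnel_home) / "connectors"
    if not connectors_dir.is_dir():
        return []
    # rglob also covers subdirectories of connectors/.
    return sorted(
        jar.name
        for jar in connectors_dir.rglob("*.jar")
        if keyword.lower() in jar.name.lower()
    )

if __name__ == "__main__":
    # Falls back to the current directory when SEATUNNEL_HOME is not set.
    home = os.environ.get("SEATUNNEL_HOME", ".")
    print(find_connector_jars(home, "jdbc"))
```

An empty list means no matching jar was found, which would explain a "plugin not found" error for that connector.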

Q: How do I configure the sink to clean up existing data before writing, similar to truncating the table?
A: This is not supported yet. It depends on the community's SaveMode feature, which is still being completed.

Q: When using the Zeta engine with -i to pass parameters, what should be written in the configuration file?
A: Zeta does not support -i for the time being.

Q: Does Apache SeaTunnel support Doris to MySQL?
A: Yes.

Q: There are many ways to synchronize PostgreSQL to the Hudi data lake through CDC. Does Apache SeaTunnel support it?
A: There is a PR for PostgreSQL CDC that is pending merge: https://github.com/apache/incubator-seatunnel/pull/3867/files.

Q: I use the HTTP source to request API data, but I first need to request a token and then pass it to the next HTTP source as a parameter. How can I achieve this?
A:
1. Write a shell script that requests the token.
2. Run it as a pre-dependency of the Apache SeaTunnel job.
3. Pass the token to the Apache SeaTunnel job.
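
The steps above can also be sketched in Python instead of shell. This is a minimal illustration under stated assumptions: the token endpoint and its {"token": "..."} JSON response are hypothetical, the ${token} placeholder is a convention of this sketch rather than a SeaTunnel feature, and the option names in the sample template are illustrative and should be checked against the Http connector documentation.

```python
import json
import urllib.request
from string import Template

def fetch_token(token_url):
    """Step 1: request a token. Assumes a JSON response like {"token": "..."}."""
    with urllib.request.urlopen(token_url, timeout=10) as resp:
        return json.load(resp)["token"]

def render_job_config(template_text, token):
    """Step 3: fill the ${token} placeholder in a job config template.

    safe_substitute leaves any other ${...} expressions in the file untouched.
    """
    return Template(template_text).safe_substitute(token=token)

# Illustrative template; the option names are assumptions, not verified settings.
template = """\
source {
  Http {
    url = "https://api.example.com/data"
    headers { Authorization = "Bearer ${token}" }
  }
}
"""

# In a real run the token would come from fetch_token(<your token URL>).
print(render_job_config(template, "abc123"))
```

The rendered text can then be written to a file and submitted as the job configuration, with the scheduler running the token step before the job (step 2).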

About Apache SeaTunnel

Apache SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of records per day in a stable and efficient manner.

Why do we need Apache SeaTunnel?

Apache SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.

  • Data loss and duplication
  • Task buildup and latency
  • Low throughput
  • Long application-to-production cycle time
  • Lack of application status monitoring

Apache SeaTunnel Usage Scenarios

  • Massive data synchronization
  • Massive data integration
  • ETL of large volumes of data
  • Massive data aggregation
  • Multi-source data processing

Features of Apache SeaTunnel

  • Rich components
  • High scalability
  • Easy to use
  • Mature and stable

How to get started with Apache SeaTunnel quickly?

Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite all partners who are interested in making local open-source global to join the Apache SeaTunnel contributors family and foster open-source together!

Submit an issue:

https://github.com/apache/incubator-seatunnel/issues

Contribute code to:

https://github.com/apache/incubator-seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ

Follow Twitter:

https://twitter.com/ASFSeaTunnel

Come and join us!
