Talk to the Apache SeaTunnel (Incubating) SQL Transform function contributor

Apr 10, 2023

The “Dialogue With the Apache SeaTunnel Community” column series launches today. It will regularly feature contributors who have made outstanding contributions to the community and share their stories and experiences with the Apache SeaTunnel project. Let’s learn from each other and share our experiences in the open-source world.
In this issue, we talk to a developer who took part in the latest Apache SeaTunnel release and contributed one of the project's important features, SQL Transform. Let’s explore his story with the community through a short conversation.

Personal Portrait

Name: Ma Chengyuan
Company: Hang Seng Electronics
GitHub ID: rewerma
Areas of expertise: Java middleware, microservices, big data, etc.

Q: How long have you been involved in open source? Why does open source appeal to you?

A: I have been involved in open source for about 7 years, and I feel a real sense of accomplishment when I see my PRs recognized and used by many developers.

Q: What contributions have you made to the community? Can you describe the specific plan?

A: I submitted the PR for the SQL Transform plugin for Apache SeaTunnel. It generates a physical execution plan through a SQL parser and executes the data conversion logic with a self-built function library. SQL Transform is an API that does not depend on any specific execution engine, so it runs on all three engines: Flink, Spark, and Zeta.
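To make the idea in this answer concrete, here is a minimal Python sketch of the same pattern: parse a query into a plan, then run that plan against each row using a small function library. This is not SeaTunnel's actual implementation. The names FUNCTIONS, build_plan, and execute, the tiny SELECT grammar, and the sample table name are all hypothetical and exist only for illustration.

```python
import re
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical function library: SQL function name -> Python callable.
FUNCTIONS: Dict[str, Callable[[Any], Any]] = {
    "UPPER": lambda s: s.upper(),
    "LENGTH": lambda s: len(s),
}

# One plan step: (output field, function name or None, input field).
PlanStep = Tuple[str, Optional[str], str]

# Toy grammar: SELECT item[, item...] FROM table
# where item is `col`, `FUNC(col)`, each optionally followed by `AS alias`.
_QUERY = re.compile(r"^\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*$", re.IGNORECASE)
_ITEM = re.compile(r"^(?:(\w+)\((\w+)\)|(\w+))(?:\s+AS\s+(\w+))?$", re.IGNORECASE)


def build_plan(query: str) -> List[PlanStep]:
    """Parse the query once and turn it into a list of plan steps."""
    match = _QUERY.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    plan: List[PlanStep] = []
    for item in match.group(1).split(","):
        m = _ITEM.match(item.strip())
        if m is None:
            raise ValueError(f"unsupported select item: {item!r}")
        func, func_arg, column, alias = m.groups()
        source = func_arg if func else column
        func_name = func.upper() if func else None
        if func_name is not None and func_name not in FUNCTIONS:
            raise ValueError(f"unknown function: {func}")
        plan.append((alias or source, func_name, source))
    return plan


def execute(plan: List[PlanStep], row: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the plan to one input row; inputs are always read from `row`."""
    out: Dict[str, Any] = {}
    for name, func_name, source in plan:
        value = row[source]
        out[name] = FUNCTIONS[func_name](value) if func_name else value
    return out


plan = build_plan("SELECT id, UPPER(name) AS name, LENGTH(name) AS name_len FROM fake")
result = execute(plan, {"id": 1, "name": "ada"})
```

Because execute works on plain dictionaries and knows nothing about the runtime that calls it, the same plan could in principle be invoked from any engine's per-row hook. That is the sense in which this sketch is engine independent.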

Q: Have you done data integration system research before? Have you done a comparative analysis between Apache SeaTunnel and other competitor products?

A: I have a deep understanding of Canal, DataX, and other components, and I am also a contributor to Canal.

Q: Has Apache SeaTunnel been used in your company? What is the usage scenario?

A: The company is currently preparing to introduce Apache SeaTunnel to replace DataX, mainly for data collection and conversion scenarios. SeaTunnel addresses three DataX limitations in particular: it runs as a single process, its Transform cannot be flexibly extended, and it cannot connect directly to real-time synchronization. The company has needs in both collection and transformation scenarios.

Q: Have you done any custom development based on Apache SeaTunnel? Can you introduce the related development plan?

A: I currently plan to do custom development on top of Apache SeaTunnel, including removing some connector plugins, adapting connectors to internal data sources, and expanding the capabilities of Transform.

Q: How do you feel about your first contribution to the SeaTunnel community? What do you hope to gain here?

A: The community is relatively active, and you can often see many good ideas and PRs.

Q: What do you consider to be the most critical requirements for a data integration system? Can Apache SeaTunnel meet these key needs? And what new optimizations and improvements do you expect Apache SeaTunnel to make in the future?

A: I hope Apache SeaTunnel will significantly improve its data collection performance, and I also expect Transform's computing capabilities to be expanded.

Q: What kind of support do you hope to get when you participate in the Apache SeaTunnel community for your personal growth?

A: I hope to learn more about new technologies here.

About Apache SeaTunnel

Apache SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of records per day in a stable and efficient manner.

Why do we need Apache SeaTunnel?

Apache SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.

Data loss and duplication
Task buildup and latency
Low throughput
Long application-to-production cycle time
Lack of application status monitoring

Apache SeaTunnel Usage Scenarios

Massive data synchronization
Massive data integration
ETL of large volumes of data
Massive data aggregation
Multi-source data processing

Features of Apache SeaTunnel

Rich components
High scalability
Easy to use
Mature and stable

How to get started with Apache SeaTunnel quickly?

Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite everyone who is interested in taking local open source global to join the Apache SeaTunnel contributor family and foster open source together!

Submit an issue:

https://github.com/apache/incubator-seatunnel/issues

Contribute code to:

https://github.com/apache/incubator-seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ

Follow Twitter:

https://twitter.com/ASFSeaTunnel