Welcome, New Committer!

Aug 31, 2022

Open source never stops, and small contributions converge into a huge driving force for the development of the Apache SeaTunnel community. In today's “Conversation With Contributors” column, we talk with one of our new committers, Tian Chao.

Personal profile

Tian Chao, Senior Big Data Development Engineer & Apache SeaTunnel Committer

GitHub ID: tyrantlucifer

Research field: Data integration and data development, with a deep understanding of the core principles of common data integration frameworks and distributed data engines, such as DataX, Apache SeaTunnel, Spark, and Flink.

Community contribution

01 Code contribution

Translated spark-base-api from Scala to Java in Version 2.1.0
Added Parquet and ORC file support for the new version of the file connector in Version 2.1.3
Added Parquet and ORC file support for the new version of the Hive connector in Version 2.1.3
Developed a new version of the HTTP sink connector in Version 2.1.3
Developed a new version of the Feishu sink connector in Version 2.1.3
Fixed bugs in e2e testing, such as the Flink docker image having no Hadoop dependency, some package dependency conflicts, etc.
Resolved a jar package dependency conflict bug in Apache SeaTunnel-spark-connector-v2-examples
Added full e2e testing for the new version of the file connector
Added full e2e testing for the new version HTTP connector
Added a shared capability for parsing user-defined schemas in the new version of the source connectors
Developed custom classloader for Apache SeaTunnel-engine
Maintained the official website documentation and added descriptions for all plugins

02 Non-code contributions

Participated in online meetings organized by the Apache SeaTunnel community to promote community exchange and spread the word about the Apache SeaTunnel project.

Getting to know Apache SeaTunnel

At the time, my company had developed its own low-code data development engine, but the results were not very satisfactory. It was a real-time stream processing engine built on Akka Streams, while most of the company's business tasks are offline, so the engine was inefficient for our data processing.

To solve this problem, I researched the data integration frameworks on the market that are relatively easy to use. That is how I found Apache SeaTunnel, which was called Waterdrop at the time.

Research and comparison

In my research, I found some advantages of Apache SeaTunnel in comparison with other competing products:

Easy to use, flexible to configure, no development required;
Modular, plugin-based design;
Supports data processing and aggregation using SQL;
Due to its highly encapsulated computing engine architecture, it can be well integrated with the middle platform and provide distributed computing capabilities to the outside world.

Of course, Apache SeaTunnel also has some shortcomings, such as the current version's compatibility with newer Flink versions. And although Spark jobs can be configured quickly, operators still need some knowledge of parameter tuning to make their jobs run efficiently.

Competing products comparison (information may be outdated)

FlinkX, now renamed ChunJun
StreamX
DataX

At that time, our company developed an internal data synchronization system based on the 1.5.x versions of Apache SeaTunnel and DataX, with data synchronization as the main scenario. On top of the computing engine layer we built our own task parsing layer, which generates configuration files that DataX or Apache SeaTunnel can recognize from our internal task format, and then schedules these DataX or Apache SeaTunnel tasks.
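A task parsing layer of this kind can be pictured roughly as follows. This is only a minimal sketch and not the system described above: the internal task format, its field names, and the helper functions are all hypothetical, and the plugin options are placeholders that are passed through without validation. The output follows the general shape of a DataX job file (a job with a speed setting and a list of reader/writer pairs); a second renderer for Apache SeaTunnel configuration could be added alongside it in the same way.

```python
import json

# Hypothetical internal task format; every field name here is illustrative.
internal_task = {
    "name": "orders_sync",
    "engine": "datax",
    "parallelism": 2,
    "source": {
        "plugin": "mysqlreader",
        "options": {"username": "demo", "column": ["id", "amount"]},
    },
    "sink": {
        "plugin": "hdfswriter",
        "options": {"path": "/warehouse/orders"},
    },
}


def to_datax_job(task):
    """Map the internal task onto a DataX-style job dictionary."""
    return {
        "job": {
            "setting": {"speed": {"channel": task.get("parallelism", 1)}},
            "content": [
                {
                    "reader": {
                        "name": task["source"]["plugin"],
                        # Options are copied as-is; no validation is done here.
                        "parameter": dict(task["source"]["options"]),
                    },
                    "writer": {
                        "name": task["sink"]["plugin"],
                        "parameter": dict(task["sink"]["options"]),
                    },
                }
            ],
        }
    }


def render(task):
    """Return the configuration file content for the task's target engine."""
    if task["engine"] == "datax":
        return json.dumps(to_datax_job(task), indent=2)
    # Other engines would get their own renderer in a fuller version.
    raise ValueError("unsupported engine: " + str(task["engine"]))
```

Calling render(internal_task) returns the JSON text that a scheduler could write to a file and hand to the engine. Keeping the internal format separate from the engine format means the target engine can be swapped without changing how tasks are defined.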

Open source is great to me

It has been five years since I first got involved in open source, back in college. I started by browsing popular open source communities without contributing or submitting any code. Once I became more proficient with Git, I started putting some of my own code on GitHub, though it seldom attracted attention.

After I started working, I began to think from a product perspective and could better understand what kind of features people need. I open-sourced a tool that gained 600+ stars on GitHub, which was my first achievement in the open source world.

Later, I came into contact with DataX and Apache SeaTunnel, read their code, and began to follow the community. That is how I started contributing.

First impression of the community

The community has many talented and very active people. They review my code enthusiastically and give me valuable suggestions.

Participating in the Apache SeaTunnel community has let me experience the Apache Way. I have learned a lot from the design ideas of many excellent projects, as well as from the way Apache projects are managed.

Committer Testimonials

I hope to learn more from the community, such as architecture design, project process, etc., to improve my skills. It’s been a great honor for my code to be recognized by the community, and that’s enough.
Finally, Apache SeaTunnel is currently developing its new connectors and a new computing engine. I think they will help Apache SeaTunnel become a top data integration project.

About Apache SeaTunnel

Apache SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of records per day in a stable and efficient manner.

Why do we need Apache SeaTunnel?

Apache SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.

Data loss and duplication
Task buildup and latency
Low throughput
Long application-to-production cycle time
Lack of application status monitoring

Apache SeaTunnel Usage Scenarios

Massive data synchronization
Massive data integration
ETL of large volumes of data
Massive data aggregation
Multi-source data processing

Features of Apache SeaTunnel

Rich components
High scalability
Easy to use
Mature and stable

How to get started with Apache SeaTunnel quickly?

Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite everyone who is interested in making local open source global to join the Apache SeaTunnel contributor family and foster open source together!

Submit an issue:

https://github.com/apache/incubator-seatunnel/issues

Contribute code to:

https://github.com/apache/incubator-seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ

Follow Twitter:

https://twitter.com/ASFSeaTunnel

Come and join us!