Welcome, New Committer!
Open source never stops, and small contributions converge into a huge driving force for the development of the Apache SeaTunnel community. In today's "Conversation With Contributors" column, we talk with one of our new committers, Tian Chao.
Personal profile
GitHub ID: tyrantlucifer
Research field: Data integration and data development research, with a deep understanding of the core principles of common data integration frameworks and distributed data engines on the market, such as DataX, Apache SeaTunnel, Spark, Flink, etc.
Community contribution
01 Code contribution
- Translated spark-base-api from the Scala version to the Java version in Version 2.1.0
- Added Parquet and ORC file support for the new version of the file connector in Version 2.1.3
- Added Parquet and ORC file support for the new version of the Hive connector in Version 2.1.3
- Developed a new version of the HTTP sink connector in Version 2.1.3
- Developed a new version of the Feishu sink connector in Version 2.1.3
- Fixed bugs in e2e testing, such as the Flink docker image having no Hadoop dependency, some package dependency conflicts, etc.
- Resolved a jar package dependency conflict bug in Apache SeaTunnel-spark-connector-v2-examples
- Added full e2e testing for the new version of the file connector
- Added full e2e testing for the new version of the HTTP connector
- Added a common capability for parsing user-defined schemas in the new version of the source connectors
- Developed a custom classloader for Apache SeaTunnel-engine
- Maintained the official website documentation and added descriptions for all plugins
02 Non-code contributions
Participated in online meetings organized by the Apache SeaTunnel community to promote community exchange and spread the word about the Apache SeaTunnel project.
Getting to know Apache SeaTunnel
At the time, my company had developed its own low-code data development engine, but the results were not very satisfactory. It was a real-time stream processing engine built on Akka Streams, but most of the company's business tasks are offline, so the engine processed data inefficiently.
To solve this problem, I researched the relatively easy-to-use data integration frameworks on the market. That is how I found Apache SeaTunnel, which was called Waterdrop at that time.
Research and comparison
In my research, I found some advantages of Apache SeaTunnel in comparison with other competing products:
- Easy to use, flexible to configure, no development required;
- Modular and plugin-based;
- Supports data processing and aggregation using SQL;
- Thanks to its highly encapsulated computing engine architecture, it integrates well with a middle platform and can expose distributed computing capabilities to external systems.
Of course, Apache SeaTunnel also has some shortcomings, such as the current version's limited compatibility with newer Flink versions. And although Spark jobs can be configured quickly, operators still need some parameter-tuning knowledge to make jobs run efficiently.
Competing products comparison (information may be outdated)
- FlinkX, now renamed ChunJun
- StreamX
- DataX
At that time, our company built an internal data synchronization system based on the 1.5.x versions of Apache SeaTunnel and on DataX. The main scenario was data synchronization. We developed our own task parsing layer on top of the computing engine layer. It generates configuration files that DataX or Apache SeaTunnel can recognize from our internal task format, and then schedules the resulting DataX or Apache SeaTunnel tasks.
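To make the idea of a task parsing layer more concrete, here is a minimal sketch in Python. It is not the company's actual implementation: the internal task format, its field names, and the plugin mapping tables are all hypothetical. The DataX output follows the general `job` / `content` / `reader` / `writer` layout of a DataX job file, and the second renderer is only loosely modeled on the `input` / `output` blocks of the 1.x configuration style, with engine settings omitted.

```python
import json

# Hypothetical mapping from internal connector types to engine plugin names.
DATAX_PLUGINS = {"source": {"mysql": "mysqlreader"}, "target": {"hdfs": "hdfswriter"}}
SEATUNNEL_PLUGINS = {"source": {"mysql": "jdbc"}, "target": {"hdfs": "hdfs"}}

# Hypothetical internal task format; every field name is an assumption.
task = {
    "name": "orders_sync",
    "parallelism": 2,
    "source": {"type": "mysql", "options": {"table": "orders"}},
    "target": {"type": "hdfs", "options": {"path": "/warehouse/orders"}},
}


def to_datax(task):
    """Render the internal task as a DataX-style JSON job description."""
    def plugin(role):
        side = task[role]
        return {"name": DATAX_PLUGINS[role][side["type"]], "parameter": side["options"]}

    job = {
        "job": {
            "setting": {"speed": {"channel": task["parallelism"]}},
            "content": [{"reader": plugin("source"), "writer": plugin("target")}],
        }
    }
    return json.dumps(job, indent=2)


def to_seatunnel(task):
    """Render the internal task as a block-structured text config (simplified)."""
    def block(section, role):
        side = task[role]
        name = SEATUNNEL_PLUGINS[role][side["type"]]
        lines = [f"{section} {{", f"  {name} {{"]
        # json.dumps quotes strings and leaves numbers bare.
        lines += [f"    {key} = {json.dumps(value)}" for key, value in side["options"].items()]
        lines += ["  }", "}"]
        return "\n".join(lines)

    return "\n".join([block("input", "source"), block("output", "target")])


# The parsing layer picks a renderer per target engine.
RENDERERS = {"datax": to_datax, "seatunnel": to_seatunnel}


def render(task, engine):
    return RENDERERS[engine](task)


if __name__ == "__main__":
    print(render(task, "datax"))
    print(render(task, "seatunnel"))
```

Keeping one renderer per engine means the scheduler only ever deals with the internal task format, and supporting another engine would mostly mean adding another renderer and mapping table.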
Open source has been great to me
It has been five years since I first got involved in open source in college. I started by browsing popular open source communities without contributing or submitting any code. Once I became more proficient with Git, I started putting some of my code on GitHub, though it seldom attracted attention.
After I started working, I began to think from a product perspective and could better understand what features people need. I open-sourced a tool that gained 600+ stars on GitHub, my first achievement in the open source world.
Later, I came into contact with DataX and Apache SeaTunnel, read their code, and began to follow the community. That is how I started contributing.
First impression of the community
There are many talented and very active people in the community. They review my code enthusiastically and give me valuable suggestions.
Participating in the Apache SeaTunnel community has let me experience the Apache Way and learn a lot from the design ideas of many excellent projects, as well as from the way Apache projects are managed.
Committer Testimonials
I hope to learn more from the community, such as architecture design, project process, etc., to improve my skills. It’s been a great honor for my code to be recognized by the community, and that’s enough.
Finally, Apache SeaTunnel is currently developing new connectors and a new computing engine. I think these will help Apache SeaTunnel become a top data integration project.
About Apache SeaTunnel
Apache SeaTunnel (formerly Waterdrop) is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of records per day in a stable and efficient manner.
Why do we need Apache SeaTunnel?
Apache SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.
- Data loss and duplication
- Task buildup and latency
- Low throughput
- Long application-to-production cycle time
- Lack of application status monitoring
Apache SeaTunnel Usage Scenarios
- Massive data synchronization
- Massive data integration
- ETL of large volumes of data
- Massive data aggregation
- Multi-source data processing
Features of Apache SeaTunnel
- Rich components
- High scalability
- Easy to use
- Mature and stable
How to get started with Apache SeaTunnel quickly?
Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.
https://seatunnel.apache.org/docs/2.1.0/developement/setup
How can I contribute?
We invite everyone interested in taking local open source global to join the Apache SeaTunnel contributor family and foster open source together!
Submit an issue:
https://github.com/apache/incubator-seatunnel/issues
Contribute code to:
https://github.com/apache/incubator-seatunnel/pulls
Subscribe to the community development mailing list:
dev-subscribe@seatunnel.apache.org
Development mailing list:
dev@seatunnel.apache.org
Join Slack:
https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ
Follow Twitter:
https://twitter.com/ASFSeaTunnel
Come and join us!