How many ways are there to contribute to the Apache SeaTunnel project?

Apache SeaTunnel
Jul 31, 2024

There are several common ways to contribute to open-source projects:

  1. Participate in Discussions
    Help users in the community who encounter difficulties by explaining how to use the framework. This is also a form of contribution.
  2. Documentation Contributions
    Help improve the framework’s documentation, such as translating English documents into Chinese or correcting errors in the documentation. This is often the first step for many people to contribute to open source.
  3. Code Contributions
    After reviewing the source code, if you find a bug, you can fix it and submit the code to the community. Alternatively, if there is a new feature that needs development, providing a solution for this new feature is also considered code contribution and is a significant way to participate in open source.
  4. Participate in Meetups
    Engage in community-organized online or offline meetups as a speaker or volunteer. Sharing Apache SeaTunnel practices, market insights, and operational activities are all ways to contribute to the community.
  5. Submit Articles
    Write about topics related to Apache SeaTunnel, whether it’s technical deployment documentation, practical experiences, or summaries of open-source contributions. Submitting these articles to the community (contact assistant via WeChat at 17743592110) is considered a non-code contribution.

Common Roles in Open Source Communities

  1. Contributor
    Anyone who has contributed at least once is considered a contributor.
  2. Committer
    After becoming a contributor, if you keep contributing consistently and have solid technical skills, you might be voted in by the PMC (Project Management Committee) to become a committer. The difference between a committer and a contributor is that a committer has write access to the project’s repository and can review and merge code from contributors. Committers also receive an @apache.org email address.
  3. PMC Member
    Outstanding committers can become PMC Members. PMC Members are responsible for the overall direction of the project and making important decisions, requiring a forward-looking technical vision.

How to Fix Bugs

Fixing bugs is an important way to contribute to a project. Here’s an example of how to fix a bug in the Apache SeaTunnel project:

Reproducing the Problem

  1. Scenario
    When inserting data into Doris, a ClassCastException occurs, indicating a type cast error. The error message states that java.util.ArrayList cannot be cast to java.lang.CharSequence. After confirming that the configuration file is correct, we carefully reviewed the stack trace printed in the console.

2. Locate the Problem
Based on the stack trace, we identified that the error is in the DorisOutPutFormat.java file, line 210. We need to open the source code in an IDE to examine the code at that location.

3. Analyze the Problem
At line 210, the code attempts to cast the batch (an ArrayList) to CharSequence, which is incorrect.

To fix this, we need to understand the intent of this code. SeaTunnel’s DorisSink relies on Doris’s stream load method, which imports data through HTTP requests. The batch is used to accumulate data, which must be serialized for HTTP transmission. Thus, the code’s purpose is to convert the batch’s data into a string format according to certain rules.
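The failure described above can be reproduced in isolation. The following is a minimal sketch, not the actual SeaTunnel source: the class name BatchCastRepro, the method joinByCasting, and the sample rows are hypothetical, and the batch is assumed to hold String rows. It only illustrates why casting the whole list to CharSequence compiles but fails at runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reproduction; names are illustrative and not taken from SeaTunnel.
final class BatchCastRepro {

    // Mimics the faulty pattern: the list itself is cast to CharSequence
    // instead of its elements being handed to String.join.
    static String joinByCasting(List<String> batch, String lineDelimiter) {
        Object raw = batch;
        // Compiles, but an ArrayList is not a CharSequence, so the cast throws.
        return String.join(lineDelimiter, (CharSequence) raw);
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        batch.add("1,alice");
        batch.add("2,bob");
        try {
            joinByCasting(batch, "\n");
        } catch (ClassCastException e) {
            // The exact message wording depends on the JDK version.
            System.out.println("Reproduced: " + e.getMessage());
        }
    }
}
```

The compiler accepts the cast because it cannot rule out that some object is both a list and a CharSequence; the mismatch only surfaces at runtime.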

4. Determine the Solution
We need to check what the String class's static join method expects. The join method's second parameter is a varargs parameter of type CharSequence, meaning we can pass an array of CharSequence. The code should therefore be modified to pass such an array instead of casting the list.
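One possible shape of such a fix is sketched below. It is an illustration under assumptions rather than the patch that was actually submitted: the class BatchJoinSketch and the method joinBatch are hypothetical names, and the batch is assumed to contain String rows. The list is copied into a CharSequence array, which matches the varargs parameter of String.join.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fix; names are illustrative and not taken from SeaTunnel.
final class BatchJoinSketch {

    // Copies the batch into a CharSequence[] so it matches the varargs
    // parameter of String.join(CharSequence, CharSequence...).
    static String joinBatch(List<String> batch, String lineDelimiter) {
        CharSequence[] rows = batch.toArray(new CharSequence[0]);
        return String.join(lineDelimiter, rows);
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        batch.add("1,alice");
        batch.add("2,bob");
        // Rows are concatenated with the delimiter between them.
        System.out.println(joinBatch(batch, "\n"));
    }
}
```

When the batch is already typed as a list of strings, String.join also has an overload that accepts an Iterable of CharSequence, so the list could be passed directly without the array copy.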

5. Verify the Solution

  • Rebuild: Recompile the package and deploy it to our cluster. Run the task again to see if it works. Due to cross-platform issues (e.g., path differences between Windows and Linux), some unit tests may fail. We can bypass unit tests and code style checks using the following command:
    mvn clean package -Dmaven.test.skip=true -Dcheckstyle.skip=true
  • Use the New Package: Execute the data import command with the newly compiled SeaTunnel package.
  • Check Doris: Verify if the data was successfully imported into Doris without type conversion errors.

6. Summary
After these steps, we confirmed that the issue was with the source code. We will now report the bug to the community and provide our solution.

How to Create an Issue

What is an Issue

Each GitHub repository has its issue tracker where you can report problems or discuss new features. It’s also the place to report bugs.

Open-source communities usually require you to create an issue before submitting a code merge request. This practice ensures that changes are documented and traceable.

Creating an Issue

  1. Click the “New Issue” button to go to the next page.

2. Choose the type of issue you want to create (e.g., “bug report”).

3. Fill out the form according to the prompts. Note that the form advises checking for existing issues before creating a new one.

4. After completing the form, click “Submit new issue” to create the issue.

5. Review the issue you created.

How to Create a Pull Request

A pull request (PR) is a request to merge your code changes into the main project. Therefore, you should have your code ready before creating a pull request.

Creating a Pull Request

  1. Fork the Project
    For first-time contributors, fork the official repository by clicking the “Fork” button. An identical copy of the repository will then appear under your GitHub account.

2. Clone Your Forked Repository
Use the URL of your forked repository to clone it to your local machine using:

git clone <repository-url>

How to Modify the Code

  1. Right-click in the root directory of the project and open the cloned project with your IDE (e.g., IntelliJ IDEA).
  2. Modify the code at the location we previously identified.

3. Commit Changes
Ideally, create a branch from dev and commit on that new branch; the commit in this walkthrough is a counterexample.

4. Push to Your Forked Repository
Push to your forked repository, specifying a new branch name as the remote target branch.

How to Create a Pull Request

  1. Create Pull Request on GitHub
    Go to your GitHub repository, where you will see a prompt suggesting that you create a pull request (PR). Click this button to go to the next page.
  2. Fill Out PR Template
    On the new page, follow the template in the dialog box to explain the purpose of your PR. Don’t forget to link it to the issue you created earlier by pasting the issue link.
  3. Submit PR
    Once everything is filled out, click the “Create Pull Request” button to create the PR.
  4. Review Changes
    GitHub will show the differences introduced by your changes. Red indicates deleted code, and green indicates added code. Even if you change a single letter, it will show as one line removed and one line added.
  5. CI/CD Checks
    After you submit the PR, GitHub will start an automatic check. This process is called CI/CD (Continuous Integration/Continuous Deployment). Essentially, your code is automatically pulled, compiled, and run through unit tests, format checks, and other inspections. Only if all checks pass will your code be considered for merging.
  6. Wait for Review
    After submission, you can take a break. The automated tests take a while, and you need to wait for community members to notice your pull request.

Becoming a Source Code Contributor

After some time, you can check back to see whether your PR has been reviewed and merged by a committer. Once it is merged, your contribution will be part of the SeaTunnel project, and you will be recognized as a contributor.

Finding An Opportunity to Contribute

ASF open-source projects often maintain a to-do list of tasks that are suitable for newcomers. Look for these lists to find opportunities to start contributing.

About Apache SeaTunnel

Apache SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of data per day stably and efficiently.

You are welcome to fill out this form to become an Apache SeaTunnel speaker: https://forms.gle/vtpQS6ZuxqXMt6DT6 :)

Why do we need Apache SeaTunnel?

Apache SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.

  • Data loss and duplication
  • Task buildup and latency
  • Low throughput
  • Long application-to-production cycle time
  • Lack of application status monitoring

Apache SeaTunnel Usage Scenarios

  • Massive data synchronization
  • Massive data integration
  • ETL of large volumes of data
  • Massive data aggregation
  • Multi-source data processing

Features of Apache SeaTunnel

  • Rich components
  • High scalability
  • Easy to use
  • Mature and stable

How to get started with Apache SeaTunnel quickly?

Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.

https://seatunnel.apache.org/docs/2.1.0/developement/setup

How can I contribute?

We invite all partners who are interested in making local open-source global to join the Apache SeaTunnel contributors family and foster open-source together!

Submit an issue:

https://github.com/apache/seatunnel/issues

Contribute code to:

https://github.com/apache/seatunnel/pulls

Subscribe to the community development mailing list:

dev-subscribe@seatunnel.apache.org

Development mailing list:

dev@seatunnel.apache.org

Join Slack:

https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kcxzyrxz-lKcF3BAyzHEmpcc4OSaCjQ

Follow Twitter:

https://twitter.com/ASFSeaTunnel
