Comprehensive Guide to Installing and Configuring SeaTunnel and SeaTunnel-Web on CentOS 7.x
I. Environment Setup
For this setup, I used a single virtual machine running CentOS 7.x with Java 15 and MySQL 8.0.28 installed. Installing these prerequisites is straightforward and has been covered in previous articles, so those steps are skipped here. Ports 8081, 3306, and 5801 must be opened in the firewall so that the services are reachable over the network.
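As a minimal sketch of opening those ports, the script below prints the firewalld commands for review rather than running them. It assumes firewalld is the active firewall (the CentOS 7 default); the function name print_firewall_commands is only illustrative.

```shell
#!/bin/sh
# Ports named in the text: 8081 (SeaTunnel-Web), 3306 (MySQL), 5801 (SeaTunnel engine).
PORTS="8081 3306 5801"

# Print the firewalld commands instead of executing them, so they can be
# reviewed first. Assumption: firewalld is the firewall in use.
print_firewall_commands() {
  for port in $PORTS; do
    echo "firewall-cmd --permanent --add-port=${port}/tcp"
  done
  # Reload so the permanent rules become active.
  echo "firewall-cmd --reload"
}

print_firewall_commands
```

To apply the rules, the printed commands can be piped to a root shell, for example print_firewall_commands | sh.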
II. Installing and Deploying SeaTunnel
Downloading the Installation Package
Set the version, download the SeaTunnel package with wget, and extract it with tar.
export version="2.3.3"
wget "https://archive.apache.org/dist/seatunnel/${version}/apache-seatunnel-${version}-bin.tar.gz"
tar -xzvf "apache-seatunnel-${version}-bin.tar.gz"
Setting Environment Variables
Add SeaTunnel’s bin directory to your PATH for easy access.
vi /etc/profile.d/seatunnel.sh
# Add the following variables
export SEATUNNEL_HOME=/root/apache-seatunnel-2.3.3  # Directory where SeaTunnel was extracted
export PATH=$PATH:$SEATUNNEL_HOME/bin
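After saving the file, the variables only take effect in a new login shell or after sourcing the file. The following is a small sketch for checking the result; the function name check_seatunnel_env is hypothetical and not part of SeaTunnel.

```shell
#!/bin/sh
# Report whether SEATUNNEL_HOME is set, contains a bin/ directory,
# and whether that bin/ directory is on PATH.
check_seatunnel_env() {
  if [ -z "${SEATUNNEL_HOME:-}" ]; then
    echo "SEATUNNEL_HOME is not set"
    return 1
  fi
  if [ ! -d "$SEATUNNEL_HOME/bin" ]; then
    echo "no bin directory under $SEATUNNEL_HOME"
    return 1
  fi
  case ":$PATH:" in
    *":$SEATUNNEL_HOME/bin:"*) echo "ok: $SEATUNNEL_HOME" ;;
    *) echo "$SEATUNNEL_HOME/bin is not on PATH"; return 1 ;;
  esac
}
```

A typical use would be to run . /etc/profile.d/seatunnel.sh and then check_seatunnel_env in the same shell.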
Installing Connector Plugins
Navigate to the /root/apache-seatunnel-2.3.3 directory and execute the plugin installation script.
sh bin/install-plugin.sh 2.3.3
You can choose which connectors to install by editing config/plugin_config before running the installation command; the available connector names are listed in connectors/plugin-mapping.properties. By default, all connectors are installed, which may take some time depending on your internet speed.
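Once the installation script finishes, it can be worth confirming that the connectors you need were actually downloaded. The sketch below assumes the jars land in connectors/seatunnel under the SeaTunnel directory and are named connector-<name>-<version>.jar, which matches the paths reported in the demo job's log; check_connectors is an illustrative helper, not a SeaTunnel command.

```shell
#!/bin/sh
# Report which of the named connectors have a jar under
# <seatunnel-home>/connectors/seatunnel.
# Usage: check_connectors <seatunnel-home> <name>...
check_connectors() {
  home="$1"
  shift
  missing=0
  for name in "$@"; do
    # Assumed jar naming: connector-<name>-<version>.jar
    if ls "$home"/connectors/seatunnel/connector-"$name"-*.jar >/dev/null 2>&1; then
      echo "found: $name"
    else
      echo "missing: $name"
      missing=1
    fi
  done
  # Non-zero exit status if any connector was missing.
  return "$missing"
}
```

For example, check_connectors /root/apache-seatunnel-2.3.3 fake console jdbc would report on the three connectors used in this guide.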
Copying the JAR File to the lib Directory
Starting SeaTunnel
Use either of the following commands to start SeaTunnel from the /root/apache-seatunnel-2.3.3 directory:
sh bin/seatunnel-cluster.sh -d -DJvmOption="-Xms1G -Xmx1G"
or
nohup sh bin/seatunnel-cluster.sh 2>&1 &
Check the process using jps, and ensure there are no errors in the logs under the logs directory.
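A quick way to scan the logs is to count the lines logged at ERROR level. This is a rough sketch: it assumes the log files end in .log and that the level appears as the word ERROR surrounded by spaces, as in the client output shown later in this guide. The function name count_log_errors is hypothetical.

```shell
#!/bin/sh
# Count lines containing " ERROR " across all *.log files in a directory.
# Usage: count_log_errors <log-dir>
count_log_errors() {
  dir="$1"
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the
  # function's exit status clean while still printing the count.
  cat "$dir"/*.log 2>/dev/null | grep -c ' ERROR ' || true
}
```

For example, count_log_errors /root/apache-seatunnel-2.3.3/logs should print 0 after a clean start.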
Executing the Official Client Submission Demo
Run the official demo command as provided on the website. Output that finishes without errors indicates that SeaTunnel has started correctly.
III. Executing the Official Client Submission Demo
Change to the /root/apache-seatunnel-2.3.3 directory and run the job submission command:
$SEATUNNEL_HOME/bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template
This command comes from the official website, and the execution results are as follows:
[root@es1 apache-seatunnel-2.3.3]# $SEATUNNEL_HOME/bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
December 19, 2023 12:11:20 上午 com.hazelcast.internal.config.AbstractConfigLocator
message: Loading configuration '/root/apache-seatunnel-2.3.3/config/seatunnel.yaml' from System property 'seatunnel.config'
December 19, 2023 12:11:20 上午 com.hazelcast.internal.config.AbstractConfigLocator
message: Using configuration file at /root/apache-seatunnel-2.3.3/config/seatunnel.yaml
December 19, 2023 12:11:20 上午 org.apache.seatunnel.engine.common.config.SeaTunnelConfig
message: seatunnel.home is /root/apache-seatunnel-2.3.3
December 19, 2023 12:11:20 上午 com.hazelcast.internal.config.AbstractConfigLocator
message: Loading configuration '/root/apache-seatunnel-2.3.3/config/hazelcast.yaml' from System property 'hazelcast.config'
December 19, 2023 12:11:20 上午 com.hazelcast.internal.config.AbstractConfigLocator
message: Using configuration file at /root/apache-seatunnel-2.3.3/config/hazelcast.yaml
December 19, 2023 12:11:20 上午 com.hazelcast.internal.config.AbstractConfigLocator
message: Loading configuration '/root/apache-seatunnel-2.3.3/config/hazelcast-client.yaml' from System property 'hazelcast.client.config'
December 19, 2023 12:11:20 上午 com.hazelcast.internal.config.AbstractConfigLocator
message: Using configuration file at /root/apache-seatunnel-2.3.3/config/hazelcast-client.yaml
2023-12-19 00:11:21,149 INFO com.hazelcast.client.impl.spi.ClientInvocationService - hz.client_1 [seatunnel] [5.1] Running with 2 response threads, dynamic=true
2023-12-19 00:11:21,233 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is STARTING
2023-12-19 00:11:21,234 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is STARTED
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.hazelcast.internal.networking.nio.SelectorOptimizer (file:/root/apache-seatunnel-2.3.3/starter/seatunnel-starter.jar) to field sun.nio.ch.SelectorImpl.selectedKeys
WARNING: Please consider reporting this to the maintainers of com.hazelcast.internal.networking.nio.SelectorOptimizer
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2023-12-19 00:11:21,294 INFO com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [seatunnel] [5.1] Trying to connect to cluster: seatunnel
2023-12-19 00:11:21,298 INFO com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [seatunnel] [5.1] Trying to connect to [localhost]:5801
2023-12-19 00:11:21,352 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_CONNECTED
2023-12-19 00:11:21,352 INFO com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [seatunnel] [5.1] Authenticated with server [localhost]:5801:772efc0a-4c18-4a4b-baa7-b82b9ae4a395, server version: 5.1, local address: /127.0.0.1:36095
2023-12-19 00:11:21,356 INFO com.hazelcast.internal.diagnostics.Diagnostics - hz.client_1 [seatunnel] [5.1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2023-12-19 00:11:21,384 INFO com.hazelcast.client.impl.spi.ClientClusterService - hz.client_1 [seatunnel] [5.1]
Members [1] {
Member [localhost]:5801 - 772efc0a-4c18-4a4b-baa7-b82b9ae4a395
}
2023-12-19 00:11:21,421 INFO com.hazelcast.client.impl.statistics.ClientStatisticsService - Client statistics is enabled with period 5 seconds.
2023-12-19 00:11:21,706 INFO org.apache.seatunnel.engine.client.job.JobExecutionEnvironment - add common jar in plugins :[]
2023-12-19 00:11:21,733 INFO org.apache.seatunnel.core.starter.utils.ConfigBuilder - Loading config file from path: /root/apache-seatunnel-2.3.3/config/v2.batch.config.template
2023-12-19 00:11:21,799 INFO org.apache.seatunnel.core.starter.utils.ConfigShadeUtils - Load config shade spi: [base64]
2023-12-19 00:11:21,848 INFO org.apache.seatunnel.core.starter.utils.ConfigBuilder - Parsed config file: {
"env" : {
"execution.parallelism" : 2,
"job.mode" : "BATCH",
"checkpoint.interval" : 10000
},
"source" : [
{
"schema" : {
"fields" : {
"name" : "string",
"age" : "int"
}
},
"row.num" : 16,
"parallelism" : 2,
"result_table_name" : "fake",
"plugin_name" : "FakeSource"
}
],
"sink" : [
{
"plugin_name" : "Console"
}
]
}
2023-12-19 00:11:21,885 INFO org.apache.seatunnel.api.configuration.ReadonlyConfig - Config uses fallback configuration key 'plugin_name' instead of key 'factory'
2023-12-19 00:11:21,886 INFO org.apache.seatunnel.api.configuration.ReadonlyConfig - Config uses fallback configuration key 'plugin_name' instead of key 'factory'
2023-12-19 00:11:21,895 INFO org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery - Load SeaTunnelSink Plugin from /root/apache-seatunnel-2.3.3/connectors/seatunnel
2023-12-19 00:11:21,911 INFO org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery - Discovery plugin jar: FakeSource at: file:/root/apache-seatunnel-2.3.3/connectors/seatunnel/connector-fake-2.3.3.jar
2023-12-19 00:11:21,912 INFO org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery - Discovery plugin jar: Console at: file:/root/apache-seatunnel-2.3.3/connectors/seatunnel/connector-console-2.3.3.jar
2023-12-19 00:11:21,915 INFO org.apache.seatunnel.engine.core.parse.ConfigParserUtil - Currently, incorrect configuration of source_table_name and result_table_name options don't affect job running. In the future we will ban incorrect configurations.
2023-12-19 00:11:21,915 INFO org.apache.seatunnel.api.configuration.ReadonlyConfig - Config uses fallback configuration key 'plugin_name' instead of key 'factory'
2023-12-19 00:11:21,915 INFO org.apache.seatunnel.api.configuration.ReadonlyConfig - Config uses fallback configuration key 'plugin_name' instead of key 'factory'
2023-12-19 00:11:21,916 WARN org.apache.seatunnel.engine.core.parse.ConfigParserUtil - This configuration is not recommended. A source/transform(FakeSource) is configured with 'result_table_name' option value of 'fake', but subsequent transform/sink(Console) is not configured with 'source_table_name' option.
2023-12-19 00:11:21,919 INFO org.apache.seatunnel.engine.core.parse.MultipleTableJobConfigParser - start generating all sources.
2023-12-19 00:11:21,919 INFO org.apache.seatunnel.api.configuration.ReadonlyConfig - Config uses fallback configuration key 'plugin_name' instead of key 'factory'
2023-12-19 00:11:21,953 INFO org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery - Load SeaTunnelSource Plugin from /root/apache-seatunnel-2.3.3/connectors/seatunnel
2023-12-19 00:11:21,970 INFO org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery - Discovery plugin jar: FakeSource at: file:/root/apache-seatunnel-2.3.3/connectors/seatunnel/connector-fake-2.3.3.jar
2023-12-19 00:11:21,974 INFO org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery - Load plugin: PluginIdentifier{engineType='seatunnel', pluginType='source', pluginName='FakeSource'} from classpath
2023-12-19 00:11:22,003 INFO org.apache.seatunnel.engine.core.parse.MultipleTableJobConfigParser - start generating all transforms.
2023-12-19 00:11:22,003 INFO org.apache.seatunnel.engine.core.parse.MultipleTableJobConfigParser - start generating all sinks.
2023-12-19 00:11:22,004 INFO org.apache.seatunnel.api.configuration.ReadonlyConfig - Config uses fallback configuration key 'plugin_name' instead of key 'factory'
2023-12-19 00:11:22,011 INFO org.apache.seatunnel.api.configuration.ReadonlyConfig - Config uses fallback configuration key 'plugin_name' instead of key 'factory'
2023-12-19 00:11:22,090 INFO org.apache.seatunnel.engine.client.job.ClientJobProxy - Start submit job, job id: 789162834679300097, with plugin jar [file:/root/apache-seatunnel-2.3.3/connectors/seatunnel/connector-fake-2.3.3.jar, file:/root/apache-seatunnel-2.3.3/connectors/seatunnel/connector-console-2.3.3.jar]
2023-12-19 00:11:22,893 INFO org.apache.seatunnel.engine.client.job.ClientJobProxy - Submit job finished, job id: 789162834679300097, job name: SeaTunnel
2023-12-19 00:11:22,956 WARN org.apache.seatunnel.engine.client.job.JobMetricsRunner - Failed to get job metrics summary, it maybe first-run
2023-12-19 00:11:24,370 INFO org.apache.seatunnel.engine.client.job.ClientJobProxy - Job (789162834679300097) end with state FINISHED
2023-12-19 00:11:24,416 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand -
***********************************************
Job Statistic Information
***********************************************
Start Time : 2023-12-19 00:11:21
End Time : 2023-12-19 00:11:24
Total Time(s) : 2
Total Read Count : 32
Total Write Count : 32
Total Failed Count : 0
***********************************************
2023-12-19 00:11:24,416 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTTING_DOWN
2023-12-19 00:11:24,422 INFO com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [seatunnel] [5.1] Removed connection to endpoint: [localhost]:5801:772efc0a-4c18-4a4b-baa7-b82b9ae4a395, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/127.0.0.1:36095->localhost/127.0.0.1:5801}, remoteAddress=[localhost]:5801, lastReadTime=2023-12-19 00:11:24.411, lastWriteTime=2023-12-19 00:11:24.371, closedTime=2023-12-19 00:11:24.420, connected server version=5.1}
2023-12-19 00:11:24,422 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_DISCONNECTED
2023-12-19 00:11:24,431 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTDOWN
2023-12-19 00:11:24,433 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - Closed SeaTunnel client......
2023-12-19 00:11:24,433 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - Closed metrics executor service ......
2023-12-19 00:11:24,438 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - run shutdown hook because get close signal
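The key part of this output is the Job Statistic Information block: 32 rows read, 32 written, and 0 failed. As a sketch, the counts can be checked automatically by piping the client output through a small awk filter. The function name check_job_summary is hypothetical, and the parsing assumes the summary labels appear exactly as in the output above.

```shell
#!/bin/sh
# Read SeaTunnel client output on stdin and succeed only if the job summary
# shows equal read and write counts and zero failures.
check_job_summary() {
  awk -F':' '
    /Total Read Count/   { r = $2 + 0; seen++ }
    /Total Write Count/  { w = $2 + 0; seen++ }
    /Total Failed Count/ { f = $2 + 0; seen++ }
    END {
      # Exit status 2: the summary block was not found at all.
      if (seen < 3) { print "no job summary found"; exit 2 }
      printf "read=%d write=%d failed=%d\n", r, w, f
      exit (r == w && f == 0) ? 0 : 1
    }'
}
```

A possible use is $SEATUNNEL_HOME/bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template 2>&1 | check_job_summary, which would print a one-line summary and return a non-zero status if rows were lost or failed.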
Download the installation package
The installation package is at the following address:
https://seatunnel.apache.org/download
Unzip (replace ${project.version} with the version you downloaded):
tar -zxvf apache-seatunnel-web-bin-${project.version}.tar.gz
3.2 Initializing the Database with the Script or Configuring the Database Connection in application.yml
3.2.1 Manual Initialization
Manually execute the database initialization script, then update the database connection information in the application.yml file.
3.2.2 Using Script for Database Initialization
First, set the following environment variables:
export HOSTNAME="localhost"
export PORT="3306"
export USERNAME="root"
export PASSWORD="123456"
Then, execute:
sh apache-seatunnel-web-bin-2.3.3/script/init_sql.sh
If these variable names conflict with existing environment variables, consider renaming them in init_sql.sh by adding a prefix such as STWEB_. The initialization command can then run without interference.
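An alternative to renaming, sketched below, is to pass the variables only to the one command that needs them instead of exporting them into the login shell. This is not the approach described above but a different technique with the same goal; the helper name run_with_db_env is hypothetical, and the values are the ones used in this guide.

```shell
#!/bin/sh
# Run a command with the database settings in its environment only.
# Nothing is exported into the calling shell, so existing variables such as
# HOSTNAME or USERNAME keep their values there.
run_with_db_env() {
  env HOSTNAME="localhost" PORT="3306" USERNAME="root" PASSWORD="123456" "$@"
}
```

With this helper, the initialization would be started as run_with_db_env sh apache-seatunnel-web-bin-2.3.3/script/init_sql.sh.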
3.3 Modifying Port and Data Source
Edit the conf/application.yml file to update the port number and data source information.
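Because the port set here is the one used later in the homepage URL, it can help to read it back after editing. The sketch below assumes the file follows the usual Spring Boot layout, with a top-level server: key and an indented port: entry beneath it; the function name server_port is hypothetical.

```shell
#!/bin/sh
# Print the value of "port:" found under the top-level "server:" key.
# Assumption: Spring Boot style layout, indented with spaces.
# Usage: server_port <path-to-application.yml>
server_port() {
  awk '
    /^server:/ { in_server = 1; next }   # entering the server block
    /^[^ #]/   { in_server = 0 }         # any other top-level key ends it
    in_server && /^[ ]+port:/ {
      sub(/^[ ]+port:[ ]*/, "")
      print
      exit
    }
  ' "$1"
}
```

For example, server_port conf/application.yml should print 8081 for the setup described in this guide.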
3.4 Copying Configuration Files
You’ll need to copy the apache-seatunnel-2.3.3/config/hazelcast-client.yaml and apache-seatunnel-2.3.3/connectors/plugin-mapping.properties files to the apache-seatunnel-web-bin-2.3.3/conf directory.
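The copy step can be sketched as a small function that takes the two installation directories as arguments. The file name hazelcast-client.yaml is taken from the demo job's log earlier in this guide; the function name copy_engine_config is hypothetical.

```shell
#!/bin/sh
# Copy the engine's client config and plugin mapping into SeaTunnel-Web's conf/.
# Usage: copy_engine_config <seatunnel-home> <seatunnel-web-home>
copy_engine_config() {
  engine_home="$1"
  web_home="$2"
  # Stop at the first failed copy so a missing file is noticed.
  cp "$engine_home/config/hazelcast-client.yaml" "$web_home/conf/" &&
    cp "$engine_home/connectors/plugin-mapping.properties" "$web_home/conf/"
}
```

With the directories used in this guide, the call would be copy_engine_config /root/apache-seatunnel-2.3.3 /root/apache-seatunnel-web-bin-2.3.3, assuming both were extracted under /root.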
3.5 Copying JAR Files to lib Directory
3.6 Launching the Application
Run the following command to start the application:
sh bin/seatunnel-backend-daemon.sh start
Check the Java processes with jps.
A common pitfall is running the start script from inside the bin directory, as in the command below, which can lead to a 404 error when accessing the homepage. Run it from the installation root instead, as shown above.
sh seatunnel-backend-daemon.sh start
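One way to avoid the working-directory pitfall is to wrap the start command so that it always runs from the installation root, whatever the current directory is. This is only a sketch, and the function name start_seatunnel_web is hypothetical.

```shell
#!/bin/sh
# Start SeaTunnel-Web from its installation root.
# Usage: start_seatunnel_web <seatunnel-web-home>
start_seatunnel_web() {
  web_home="$1"
  # The subshell leaves the caller's working directory unchanged.
  ( cd "$web_home" && sh bin/seatunnel-backend-daemon.sh start )
}
```

For example, start_seatunnel_web /root/apache-seatunnel-web-bin-2.3.3 can be called from any directory, assuming that is where the package was extracted.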
3.7 Accessing the Homepage
Access the homepage at ip:8081/ui, where 8081 is the port configured in conf/application.yml.
http://192.168.1.4:8081/ui
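Before opening a browser, the page can be probed from the command line to tell a 404 apart from a working deployment. The sketch below assumes curl is installed; the function names ui_url and http_status are hypothetical.

```shell
#!/bin/sh
# Build the homepage URL from a host and port.
# Usage: ui_url <host> <port>
ui_url() {
  echo "http://$1:$2/ui"
}

# Print only the HTTP status code returned for a URL.
# Assumption: curl is available on the machine.
# Usage: http_status <url>
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

For example, http_status "$(ui_url 192.168.1.4 8081)" prints the status code; a 404 suggests the working-directory problem described in section 3.6.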
If you’re unable to log in, MySQL might not be running. Use the following commands to manage the MySQL service:
service mysqld start # Start the MySQL service
service mysqld status # Check the status of the MySQL service
service mysqld stop # Stop the MySQL service
service mysqld restart # Restart the MySQL service
systemctl enable mysqld.service # Set MySQL service to start on boot
systemctl is-enabled mysqld.service # Confirm MySQL service is set to start on boot
3.8 Executing MySQL-JDBC to MySQL-JDBC Single Table Data Synchronization
The job ran successfully even though my CentOS 7.x virtual machine did not have a Hadoop 3.1.3 environment installed. The logs showed no errors, which is consistent with the official documentation's statement that Hadoop is not required. However, those who compile and build locally without Hadoop may run into installation errors, so installing Hadoop is recommended in that case.
Conclusion
This guide aimed to simplify the installation and configuration of SeaTunnel and SeaTunnel-Web on a CentOS 7.x environment, addressing potential pitfalls along the way. I hope this article helps streamline your setup process and makes your data integration tasks run more smoothly. Happy data processing!