18 lines of code to access an Apache SeaTunnel (Incubating) connector, taking OpenMLDB as an example

Apache SeaTunnel
Feb 23, 2023

Let me take the development of the OpenMLDB Source as an example to explain how to add a connector to Apache SeaTunnel. So far I have only developed the Source, so today I will cover how to develop the OpenMLDB Source Connector.

The classes to be used

The Source implementation is simple: it involves seven classes in total, and we will walk through all of them.

OpenMLDB’s Java SDK is powerful and makes integration with Apache SeaTunnel quick. It can execute user SQL statements to fetch data, and it can obtain the schema by pre-executing the SQL statement, so the results can be converted quickly to SeaTunnelRow in Apache SeaTunnel. This is a nice feature.

With these capabilities, we can implement the Source quickly.

All of the connector's configuration parameters are grouped under Config, and their values can be encapsulated in a Parameters class so that they are easy to pass around.

The official OpenMLDB Java SDK recommends using a single global Executor, so the OpenMldbSqlExecutor class wraps all of the related resources and handles exceptions in one place.

For Source, I only implemented two classes, the OpenMldbSource base class, and OpenMldbSourceReader.

OpenMldbSourceFactory is the factory class that instantiates the Source plug-in. If you are interested, you can take a look at the complete code.

At present, OpenMLDB does not support shard reading. If you are interested, you are welcome to contribute this feature so that it integrates better with Apache SeaTunnel.

As of now, Apache SeaTunnel fully supports all OpenMLDB data types. You can download the code to see the detailed implementation.

OpenMLDB Source configuration

Now let’s look at the configuration of the source. We currently support two modes, because OpenMLDB comes in both a stand-alone version and a cluster version, and a cluster inevitably needs some middleware to manage cluster status. The first option is therefore cluster mode, which states whether the OpenMLDB instance you are connecting to runs as a cluster or not.

Second, we need to pass a SQL parameter, which defines which table the data is extracted from.

Third, database. The OpenMLDB SDK requires that the database name not be written in the SQL statement; only the table name is allowed there, so the database is passed as a separate parameter.

host and port are needed only for the stand-alone version. For the cluster version, only the ZooKeeper connection information needs to be provided.

In addition, because the underlying layer uses the HTTP protocol, there are HTTP request timeout and HTTP session timeout parameters that users can set. For example, if a table is very large and querying it puts a heavy computational load on OpenMLDB, you can raise the timeout so that the job waits longer.
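The options described in this section map naturally onto the Parameters class mentioned earlier. The following is a minimal sketch of that idea, not the connector's actual code: the class name OpenMldbParametersSketch, the option keys zk_host, zk_path, session_timeout and request_timeout, and the default timeout values are all assumptions made for illustration.

```java
import java.util.Map;

/**
 * Hypothetical parameters holder for an OpenMLDB source (illustration only).
 * Option keys and default timeouts are assumptions, not confirmed by the article.
 */
final class OpenMldbParametersSketch {
    final boolean clusterMode;
    final String sql;
    final String database;
    final String host;          // stand-alone mode only, otherwise null
    final Integer port;         // stand-alone mode only, otherwise null
    final String zkHost;        // cluster mode only, otherwise null
    final String zkPath;        // cluster mode only, otherwise null
    final int sessionTimeoutMs;
    final int requestTimeoutMs;

    private OpenMldbParametersSketch(Map<String, Object> config) {
        // These three options are needed in both modes.
        clusterMode = (Boolean) require(config, "cluster_mode");
        sql = (String) require(config, "sql");
        database = (String) require(config, "database");
        if (clusterMode) {
            // Cluster mode: only ZooKeeper connection information is needed.
            zkHost = (String) require(config, "zk_host");
            zkPath = (String) require(config, "zk_path");
            host = null;
            port = null;
        } else {
            // Stand-alone mode: connect directly via host and port.
            host = (String) require(config, "host");
            port = (Integer) require(config, "port");
            zkHost = null;
            zkPath = null;
        }
        // Timeouts are optional; the defaults here are assumed values.
        sessionTimeoutMs = (Integer) config.getOrDefault("session_timeout", 10_000);
        requestTimeoutMs = (Integer) config.getOrDefault("request_timeout", 60_000);
    }

    /** Builds the parameters, failing fast if a required option is missing. */
    static OpenMldbParametersSketch fromMap(Map<String, Object> config) {
        return new OpenMldbParametersSketch(config);
    }

    private static Object require(Map<String, Object> config, String key) {
        Object value = config.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing required option: " + key);
        }
        return value;
    }
}
```

Validating the mode-specific options in one place means a misconfigured job fails when it is built rather than when the first query is sent.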

To extract data from OpenMLDB with Apache SeaTunnel, you only need to configure a few parameters, and a complete job that writes the data to the target takes no more than 30 lines in total.

env {
  job.name = "openmldb_to_console"
  job.mode = "BATCH"
}

source {
  OpenMldb {
    host = "172.17.0.2"
    port = 6527
    sql = "select * from demo_table1"
    database = "demo_db"
    cluster_mode = false
  }
}

sink {
  Console {}
}
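
For a cluster deployment, the source block would swap host and port for the ZooKeeper connection information and could also set the timeouts. The fragment below is only a sketch: the option names zk_host, zk_path, session_timeout and request_timeout are assumed here and should be checked against the connector documentation, and the address, path and timeout values are placeholders.

```
source {
  OpenMldb {
    # Assumed option names; placeholder ZooKeeper address and path
    zk_host = "172.17.0.2:2181"
    zk_path = "/openmldb"
    sql = "select * from demo_table1"
    database = "demo_db"
    cluster_mode = true
    # Optional timeouts, assumed to be in milliseconds
    session_timeout = 10000
    request_timeout = 60000
  }
}
```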

For detailed usage of the plug-in, refer to the connector documentation: https://seatunnel.apache.org/docs/2.3.0/connector-v2/source/OpenMldb

The OpenMLDB Sink Connector will be launched soon, so stay tuned.

📌📌Welcome to fill out this survey to give your feedback on your user experience or just your ideas about Apache SeaTunnel:)
