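9 Configure DataX (full data load into the data lake)

Purpose: write the database changes made before Debezium (CDC) was installed into Hudi. DataX is currently installed on the machine at 10.20.3.75.

9.1 Install and configure Python

First install Python 2.x; most Linux distributions already ship with Python 2.7. Run python --version to check which Python version is present. If Python is not installed, install it with the following commands: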

cd /usr/local
yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make -y

wget https://www.python.org/ftp/python/2.7.9/Python-2.7.9.tgz

tar -zxvf Python-2.7.9.tgz
cd Python-2.7.9
./configure --prefix=/usr/local/python-2.7.9
make
make install

ln -s /usr/local/python-2.7.9/bin/python /usr/bin/python2.7

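To invoke this interpreter as python rather than python2.7, create the link under that name instead. Note that ln -s fails if /usr/bin/python already exists, and on systems where yum depends on the stock /usr/bin/python, replacing it can break yum: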

ln -s /usr/local/python-2.7.9/bin/python /usr/bin/python
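
The version check from the start of this section can also be scripted, which helps when several hosts need to be verified. The following is a minimal sketch, not part of DataX: the function names parse_python_version and interpreter_version are hypothetical. It merges stderr into stdout because Python 2 prints its version string to stderr.

```python
import re
import subprocess


def parse_python_version(text):
    """Extract (major, minor, micro) from text such as 'Python 2.7.9'."""
    match = re.search(r"Python (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def interpreter_version(command="python"):
    """Run '<command> --version' and return the parsed version, or None."""
    try:
        proc = subprocess.Popen(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # Python 2 prints the version to stderr
        )
    except OSError:
        return None  # the command was not found on PATH
    output, _ = proc.communicate()
    return parse_python_version(output.decode("utf-8", "replace"))
```

Calling interpreter_version("python2.7") after creating the symlink should return a tuple starting with 2 if the link points at the new build; None means the command could not be run or did not report a version.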

9.2 Install and configure DataX

Download and unpack DataX with the following commands:

wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

tar -zxvf datax.tar.gz

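9.3 Modify the JSON file

Go to the extracted directory and edit job.json under the DataX job directory /software/datax/job. Change the following items; a sample file follows the list:

  • 1. Database IP, port, username, password, and database name (the target database and table must already exist, otherwise the job fails with an error)
  • 2. The SQL statement to execute
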
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": [ "jdbc:mysql://192.168.101.177:5575/db_clientauthority_test?serverTimezone=GMT%2B8&useUnicode=true&characterEncoding=utf-8&autoReconnect=true" ],
                                "querySql": [
                                    "select * from test_cdc"
                                ]
                            }
                        ],
                        "password": "sql_mYpwd@123",
                        "username": "root"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                          {
                            "jdbcUrl": "jdbc:mysql://10.20.3.82:3306/cdc_zy?serverTimezone=GMT%2B8&useUnicode=true&characterEncoding=utf-8&autoReconnect=true",
                            "table":["test_cdc"]
                          }
                        ],
                        "password": "root",
                        "username": "root",
                        "preSql": [],
                        "session": [],
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
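
A hand-edited job file is easy to break, for example with a stray trailing comma or by giving jdbcUrl the wrong shape (mysqlreader takes a list of URLs, mysqlwriter a single string). The sketch below is one possible pre-flight check, not part of DataX: check_job is a hypothetical helper, and the set of keys it checks is an assumption based on the sample above.

```python
import json


def check_job(text):
    """Parse a DataX job definition and return a list of problems found."""
    # json.loads raises ValueError on syntax errors such as a trailing comma.
    job = json.loads(text)
    problems = []
    for index, item in enumerate(job["job"]["content"]):
        for role in ("reader", "writer"):
            plugin = item[role]
            params = plugin.get("parameter", {})
            where = "content[%d].%s" % (index, role)
            for key in ("username", "password", "connection"):
                if key not in params:
                    problems.append("%s: missing '%s'" % (where, key))
            for conn in params.get("connection", []):
                url = conn.get("jdbcUrl")
                is_list = isinstance(url, list)
                # mysqlreader expects a list of URLs, mysqlwriter a single string.
                if plugin.get("name") == "mysqlreader" and not is_list:
                    problems.append("%s: jdbcUrl should be a list" % where)
                if plugin.get("name") == "mysqlwriter" and (is_list or url is None):
                    problems.append("%s: jdbcUrl should be a string" % where)
    return problems
```

To use it, read /software/datax/job/job.json into a string and pass it to check_job. An empty list means none of these checks failed. It does not confirm that the databases are reachable or that the target table exists.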

9.4 Start DataX

To start DataX, change to the directory /software/datax/bin and run the following command:

python datax.py /software/datax/job/job.json

Once the script starts successfully, DataX carries out the data operations defined in the JSON file.
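
When the full load is triggered from another script, it helps to check the launcher's exit status rather than assume the job succeeded. The following is a minimal sketch using the paths from this section: build_command and run_job are hypothetical helpers, and it assumes that datax.py exits with a non-zero status when the job fails, which should be verified against the installed release.

```python
import os
import subprocess


def build_command(job_file, datax_home="/software/datax", python_bin="python"):
    """Build the argument list for 'python datax.py <job file>'."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    return [python_bin, launcher, job_file]


def run_job(job_file, datax_home="/software/datax", python_bin="python"):
    """Run one DataX job and return True when the launcher exits with status 0."""
    # Assumption: a failed job makes datax.py exit with a non-zero status.
    status = subprocess.call(build_command(job_file, datax_home, python_bin))
    return status == 0
```

For example, run_job("/software/datax/job/job.json") runs the same command as above and returns False if the launcher reports a failure, so the caller can stop before starting the CDC pipeline.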
