ETL Heterogeneous Data Sources: DataX Tool Deployment 02

╰+攻爆jí腚メ 2022-10-08 11:17

Continuing from the previous post: (internal company series) ETL Heterogeneous Data Sources: DataX Prerequisite Environment Setup 01
https://gblfy.blog.csdn.net/article/details/118081253

Table of contents

          • 1. Download DataX directly
          • 2. Download the DataX source code and build it yourself
            • 2.1. Download the DataX source code
            • 2.2. Package with Maven
          • 3. Configuration example
            • 3.1. Create the job configuration file (JSON format)
            • 3.2. Fill in the JSON from the template
            • 3.3. Run the test
1. Download DataX directly

DataX download address

After downloading, unpack it into a local directory, go into the bin directory, and you can run a sync job:

```bash
tar zxvf datax.tar.gz
cd {YOUR_DATAX_HOME}/bin
python datax.py {YOUR_JOB.json}
```

Self-check script:

```bash
python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
```
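If you script the installation, the self-check above can be wrapped in a few lines of Python. This is a sketch: `/opt/datax` is an assumed install path, and `build_selfcheck_cmd` / `run_selfcheck` are hypothetical helper names, not part of DataX.

```python
import subprocess
from pathlib import Path

# Hypothetical install location -- replace with your actual DataX home.
DATAX_HOME = Path("/opt/datax")

def build_selfcheck_cmd(datax_home: Path) -> list[str]:
    """Build the self-check command: run bin/datax.py on the bundled job/job.json."""
    return [
        "python",
        str(datax_home / "bin" / "datax.py"),
        str(datax_home / "job" / "job.json"),
    ]

def run_selfcheck(datax_home: Path) -> bool:
    """Return True if the bundled sample job exits with status 0."""
    return subprocess.run(build_selfcheck_cmd(datax_home)).returncode == 0
```

`run_selfcheck(DATAX_HOME)` then gives a simple pass/fail signal for provisioning scripts.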
2. Download the DataX source code and build it yourself

DataX source code

2.1. Download the DataX source code

```bash
git clone git@github.com:alibaba/DataX.git
```
2.2. Package with Maven

```bash
cd {DataX_source_code_home}
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
```

On a successful build, the log shows:

```bash
[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------------------------
[INFO] Total time: 08:12 min
[INFO] Finished at: 2021-12-13T16:26:48+08:00
[INFO] Final Memory: 133M/960M
[INFO] -----------------------------------------------------------------
```

After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax/datax/, with the following structure:

```bash
bin  conf  job  lib  log  log_perf  plugin  script  tmp
```
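A quick sanity check that the unpacked tree has the layout shown above can be scripted; a minimal sketch (`missing_entries` is a hypothetical helper, and the set of entries checked is taken from the listing above):

```python
from pathlib import Path

# Top-level entries expected in the packaged DataX directory (from the listing above).
EXPECTED = {"bin", "conf", "job", "lib", "plugin", "script"}

def missing_entries(datax_home: str) -> set[str]:
    """Return expected top-level entries absent from the unpacked DataX directory."""
    home = Path(datax_home)
    present = {p.name for p in home.iterdir()} if home.is_dir() else set()
    return EXPECTED - present
```

An empty return value means the unpack step produced the expected structure.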
3. Configuration example

Read data from a stream and print it to the console.

3.1. Create the job configuration file (JSON format)

You can print a configuration template with:

```bash
python datax.py -r {YOUR_READER} -w {YOUR_WRITER}
```

For example:

```bash
cd {YOUR_DATAX_HOME}/bin
python datax.py -r streamreader -w streamwriter

# Print configuration file templates for common jobs
python datax.py -r streamreader -w streamwriter
python datax.py -r oraclereader -w mysqlwriter
python datax.py -r mysqlreader -w oraclewriter
```

Console output:

```bash
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the streamreader document:
    https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md

Please refer to the streamwriter document:
    https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md

Please save the following configuration as a json file and use
    python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column": [],
                        "sliceRecordCount": ""
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
```
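Before running a job built from this template, it can help to verify that the blank fields have actually been filled in. A minimal Python sketch, keyed to the template fields above (`check_stream_job` is a hypothetical helper, not part of DataX):

```python
import json

def check_stream_job(job_json: str) -> list[str]:
    """Return a list of template fields still left blank in a
    streamreader -> streamwriter job configuration."""
    cfg = json.loads(job_json)
    content = cfg["job"]["content"][0]
    problems = []
    reader_p = content["reader"]["parameter"]
    if not reader_p.get("column"):
        problems.append("reader.parameter.column is empty")
    if reader_p.get("sliceRecordCount") in ("", None):
        problems.append("reader.parameter.sliceRecordCount is empty")
    if cfg["job"]["setting"]["speed"].get("channel") in ("", None):
        problems.append("setting.speed.channel is empty")
    return problems
```

Running it against the raw template returns three problems; against a filled-in config it returns an empty list.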
3.2. Fill in the JSON from the template

```bash
vim stream2stream.json
```

Add the following content:

```json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,你好,世界-DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
```
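The same configuration can also be generated programmatically instead of typed into vim, which is handy when jobs are produced from templates in bulk. A sketch (the output filename matches the one used in this walkthrough):

```python
import json

# Build the same stream2stream job as above from a Python dict,
# then write it where datax.py can pick it up.
job = {
    "job": {
        "content": [{
            "reader": {
                "name": "streamreader",
                "parameter": {
                    "sliceRecordCount": 10,
                    "column": [
                        {"type": "long", "value": "10"},
                        {"type": "string", "value": "hello,你好,世界-DataX"},
                    ],
                },
            },
            "writer": {
                "name": "streamwriter",
                "parameter": {"encoding": "UTF-8", "print": True},
            },
        }],
        "setting": {"speed": {"channel": 5}},
    }
}

# ensure_ascii=False keeps non-ASCII column values readable in the file.
with open("stream2stream.json", "w", encoding="utf-8") as f:
    json.dump(job, f, ensure_ascii=False, indent=2)
```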
3.3. Run the test

Start DataX:

```bash
cd {YOUR_DATAX_DIR_BIN}
python datax.py ./stream2stream.json
```

When the sync finishes, the log looks like this:

```bash
...
2021-06-23 09:43:14.869 [job-0] INFO  StandAloneJobContainerCommunicator - Total 50 records, 950 bytes | Speed 95B/s, 5 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-06-23 09:43:14.872 [job-0] INFO  JobContainer -
Job start time          : 2021-06-23 09:43:04
Job end time            : 2021-06-23 09:43:14
Total elapsed time      : 10s
Average throughput      : 95B/s
Record write speed      : 5rec/s
Total records read      : 50
Total read/write errors : 0
```
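The totals in this log follow directly from the job configuration: each of the 5 channels runs the reader once and emits its own `sliceRecordCount` records.

```python
# Each channel emits sliceRecordCount records, so the totals are predictable:
# 5 channels x 10 records = 50, matching the "Total 50 records" line.
channels = 5
slice_record_count = 10
total_records = channels * slice_record_count

elapsed_seconds = 10  # job ran from 09:43:04 to 09:43:14
records_per_second = total_records // elapsed_seconds  # 5 rec/s, as in the log
```

Increasing either `channel` or `sliceRecordCount` scales the record count accordingly.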
