Offline Data Warehouse (13) --------- Building the DWD Layer


Table of Contents

  • Preface
  • I. DWD Layer (User Behavior Logs)
      1. Log parsing approach
      2. Using the get_json_object function
      3. Launch log table
      4. Page log table
      5. Action log table
      6. Display log table
      7. Error log table
      8. DWD-layer user behavior data loading script
  • II. DWD Layer (Business Data)
      1. Comment fact table (transactional fact table)
      2. Order detail fact table (transactional fact table)
      3. Order refund fact table (transactional fact table)
      4. Cart fact table (periodic snapshot fact table, daily snapshot)
      5. Favorites fact table (periodic snapshot fact table, daily snapshot)
      6. Coupon usage fact table (accumulating snapshot fact table)
      7. Payment fact table (accumulating snapshot fact table)
      8. Refund payment fact table (accumulating snapshot fact table)
      9. Order fact table (accumulating snapshot fact table)
      10. DWD-layer business data first-day load script
      11. DWD-layer business data daily load script

Preface

The DWD layer has two main responsibilities:

  • Parse the user behavior log data.
  • Remodel the business data using dimensional modeling.

I. DWD Layer (User Behavior Logs)

1. Log Parsing Approach

A. Log structure review

Page tracking log

Launch log
B. Log parsing approach


2. Using the get_json_object Function

A. Sample data

    [{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]

B. Extract the first JSON object

    hive (gmall)>
    select get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]','$[0]');

Result: {"name":"大郎","sex":"男","age":"25"}

C. Extract the value of the age field of the first JSON object

    hive (gmall)>
    SELECT get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]',"$[0].age");

Result: 25
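The load statements in the rest of this section also rely on nested-path access with dot notation. As one more minimal sketch (the JSON literal here is made up purely for illustration, but the '$.common.ar'-style path is exactly what the queries below use):

    hive (gmall)>
    select get_json_object('{"common":{"ar":"110000","ba":"Xiaomi"},"ts":1592107200000}','$.common.ar');

Result: 110000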

3. Launch Log Table

Launch log parsing approach: each row of the launch log table corresponds to one launch record, which should contain the log's common fields plus the launch fields. First filter out all log lines that contain a start field, then extract each field with get_json_object.


A. Table creation statement

    DROP TABLE IF EXISTS dwd_start_log;
    CREATE EXTERNAL TABLE dwd_start_log(
        `area_code` STRING COMMENT 'area code',
        `brand` STRING COMMENT 'phone brand',
        `channel` STRING COMMENT 'channel',
        `is_new` STRING COMMENT 'whether this is the first launch',
        `model` STRING COMMENT 'phone model',
        `mid_id` STRING COMMENT 'device id',
        `os` STRING COMMENT 'operating system',
        `user_id` STRING COMMENT 'member id',
        `version_code` STRING COMMENT 'app version',
        `entry` STRING COMMENT 'entry point: icon (app icon), notice (notification), install (launch after install)',
        `loading_time` BIGINT COMMENT 'launch loading time',
        `open_ad_id` STRING COMMENT 'splash ad id',
        `open_ad_ms` BIGINT COMMENT 'total ad play time',
        `open_ad_skip_ms` BIGINT COMMENT 'point at which the user skipped the ad',
        `ts` BIGINT COMMENT 'timestamp'
    ) COMMENT 'launch log table'
    PARTITIONED BY (`dt` STRING) -- partition by date
    STORED AS PARQUET            -- Parquet columnar storage
    LOCATION '/warehouse/gmall/dwd/dwd_start_log' -- storage location on HDFS
    TBLPROPERTIES('parquet.compression'='lzo')    -- LZO compression
    ;

B. Data loading


    hive (gmall)>
    insert overwrite table dwd_start_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.start.entry'),
        get_json_object(line,'$.start.loading_time'),
        get_json_object(line,'$.start.open_ad_id'),
        get_json_object(line,'$.start.open_ad_ms'),
        get_json_object(line,'$.start.open_ad_skip_ms'),
        get_json_object(line,'$.ts')
    from ods_log
    where dt='2020-06-14'
    and get_json_object(line,'$.start') is not null;

C. Inspect the data

    hive (gmall)>
    select * from dwd_start_log where dt='2020-06-14' limit 2;

4. Page Log Table

Page log parsing approach: each row of the page log table corresponds to one page-view record, which should contain the log's common fields plus the page fields. First filter out all log lines that contain a page field, then extract each field with get_json_object.

Page log table parsing:


A. Table creation statement

    DROP TABLE IF EXISTS dwd_page_log;
    CREATE EXTERNAL TABLE dwd_page_log(
        `area_code` STRING COMMENT 'area code',
        `brand` STRING COMMENT 'phone brand',
        `channel` STRING COMMENT 'channel',
        `is_new` STRING COMMENT 'whether this is the first launch',
        `model` STRING COMMENT 'phone model',
        `mid_id` STRING COMMENT 'device id',
        `os` STRING COMMENT 'operating system',
        `user_id` STRING COMMENT 'member id',
        `version_code` STRING COMMENT 'app version',
        `during_time` BIGINT COMMENT 'duration in ms',
        `page_item` STRING COMMENT 'target id',
        `page_item_type` STRING COMMENT 'target type',
        `last_page_id` STRING COMMENT 'previous page id',
        `page_id` STRING COMMENT 'page id',
        `source_type` STRING COMMENT 'source type',
        `ts` BIGINT
    ) COMMENT 'page log table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_page_log'
    TBLPROPERTIES('parquet.compression'='lzo');

B. Data loading

    hive (gmall)>
    insert overwrite table dwd_page_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.during_time'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(line,'$.ts')
    from ods_log
    where dt='2020-06-14'
    and get_json_object(line,'$.page') is not null;

C. Inspect the data

    hive (gmall)>
    select * from dwd_page_log where dt='2020-06-14' limit 2;

5. Action Log Table

Action log parsing approach: each row of the action log table corresponds to one user action, and an action record should contain the common fields, the page fields, and the action fields. First filter out the log lines that contain an action field, then use a UDTF to "explode" the action array (similar in effect to the explode function), and finally extract each field with get_json_object; see the sketch below.
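For intuition, once the explode_json_array UDTF built later in this section is registered, it turns a JSON-array string into one row per array element; the array literal below is invented purely for illustration:

    hive (gmall)>
    select explode_json_array('[{"action_id":"cart_add"},{"action_id":"favor_add"}]');

Result (one JSON string per row):
{"action_id":"cart_add"}
{"action_id":"favor_add"}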

Action log table parsing:


A. Table creation statement

    DROP TABLE IF EXISTS dwd_action_log;
    CREATE EXTERNAL TABLE dwd_action_log(
        `area_code` STRING COMMENT 'area code',
        `brand` STRING COMMENT 'phone brand',
        `channel` STRING COMMENT 'channel',
        `is_new` STRING COMMENT 'whether this is the first launch',
        `model` STRING COMMENT 'phone model',
        `mid_id` STRING COMMENT 'device id',
        `os` STRING COMMENT 'operating system',
        `user_id` STRING COMMENT 'member id',
        `version_code` STRING COMMENT 'app version',
        `during_time` BIGINT COMMENT 'duration in ms',
        `page_item` STRING COMMENT 'page target id',
        `page_item_type` STRING COMMENT 'page target type',
        `last_page_id` STRING COMMENT 'previous page id',
        `page_id` STRING COMMENT 'page id',
        `source_type` STRING COMMENT 'source type',
        `action_id` STRING COMMENT 'action id',
        `item` STRING COMMENT 'action target id',
        `item_type` STRING COMMENT 'action target type',
        `ts` BIGINT COMMENT 'timestamp'
    ) COMMENT 'action log table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_action_log'
    TBLPROPERTIES('parquet.compression'='lzo');

B. Creating the UDTF: design approach

The idea: the UDTF receives a JSON-array string as its single argument and emits one output row per element of the array, each row holding that element's JSON string.

C. Creating the UDTF: writing the code

(1) Create a Maven project: hivefunction
(2) Create the package: com.fancy.hive.udtf
(3) Add the following dependency

    <dependencies>
        <!-- Hive dependency -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>3.1.2</version>
        </dependency>
    </dependencies>

(4) Write the code

    package com.fancy.hive.udtf;

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.json.JSONArray;

    import java.util.ArrayList;
    import java.util.List;

    public class ExplodeJSONArray extends GenericUDTF {

        @Override
        public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
            // 1. Argument count check
            if (argOIs.length != 1) {
                throw new UDFArgumentException("explode_json_array takes exactly one argument");
            }
            // 2. The argument must be a string:
            // check that it is a primitive type
            if (argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
                throw new UDFArgumentException("explode_json_array only accepts primitive-type arguments");
            }
            // cast the argument's object inspector to a primitive object inspector
            PrimitiveObjectInspector argumentOI = (PrimitiveObjectInspector) argOIs[0];
            // check that the primitive type is string
            if (argumentOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
                throw new UDFArgumentException("explode_json_array only accepts a string argument");
            }
            // 3. Define the output column name and type
            List<String> fieldNames = new ArrayList<String>();
            List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
            fieldNames.add("items");
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
        }

        @Override
        public void process(Object[] objects) throws HiveException {
            // 1. Get the input value
            String jsonArray = objects[0].toString();
            // 2. Parse the string into a JSON array
            JSONArray actions = new JSONArray(jsonArray);
            // 3. Emit one row per element of the array
            for (int i = 0; i < actions.length(); i++) {
                String[] result = new String[1];
                result[0] = actions.getString(i);
                forward(result);
            }
        }

        @Override
        public void close() throws HiveException {
        }
    }

D. Registering the function

(1) Build the jar

(2) Upload hivefunction-1.0-SNAPSHOT.jar to /opt/module on node101, then put the jar onto HDFS under /user/hive/jars

    [fancy@node101 module]$ hadoop fs -mkdir -p /user/hive/jars
    [fancy@node101 module]$ hadoop fs -put hivefunction-1.0-SNAPSHOT.jar /user/hive/jars

(3) Create a permanent function associated with the Java class

    create function explode_json_array as 'com.fancy.hive.udtf.ExplodeJSONArray' using jar 'hdfs://node101:8020/user/hive/jars/hivefunction-1.0-SNAPSHOT.jar';

Note:
What if you modify the custom function and rebuild the jar? Simply replace the old jar at the HDFS path and restart the Hive client.
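A minimal sketch of that replacement, assuming the same paths as above (-f overwrites the existing file):

    [fancy@node101 module]$ hadoop fs -put -f hivefunction-1.0-SNAPSHOT.jar /user/hive/jars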

E. Data loading

    insert overwrite table dwd_action_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.during_time'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(action,'$.action_id'),
        get_json_object(action,'$.item'),
        get_json_object(action,'$.item_type'),
        get_json_object(action,'$.ts')
    from ods_log lateral view explode_json_array(get_json_object(line,'$.actions')) tmp as action
    where dt='2020-06-14'
    and get_json_object(line,'$.actions') is not null;

F. Inspect the data

    select * from dwd_action_log where dt='2020-06-14' limit 2;

6. Display Log Table

Display log parsing approach: each row of the display log table corresponds to one display (impression) record, which should contain the common fields, the page fields, and the display fields. First filter out the log lines that contain a display field, then "explode" the display array with the UDTF (similar in effect to the explode function), and finally extract each field with get_json_object.

Display log table parsing:


A. Table creation statement

    DROP TABLE IF EXISTS dwd_display_log;
    CREATE EXTERNAL TABLE dwd_display_log(
        `area_code` STRING COMMENT 'area code',
        `brand` STRING COMMENT 'phone brand',
        `channel` STRING COMMENT 'channel',
        `is_new` STRING COMMENT 'whether this is the first launch',
        `model` STRING COMMENT 'phone model',
        `mid_id` STRING COMMENT 'device id',
        `os` STRING COMMENT 'operating system',
        `user_id` STRING COMMENT 'member id',
        `version_code` STRING COMMENT 'app version',
        `during_time` BIGINT COMMENT 'duration in ms',
        `page_item` STRING COMMENT 'page target id',
        `page_item_type` STRING COMMENT 'page target type',
        `last_page_id` STRING COMMENT 'previous page id',
        `page_id` STRING COMMENT 'page id',
        `source_type` STRING COMMENT 'source type',
        `ts` BIGINT COMMENT 'timestamp',
        `display_type` STRING COMMENT 'display type',
        `item` STRING COMMENT 'displayed object id',
        `item_type` STRING COMMENT 'displayed object type',
        `order` BIGINT COMMENT 'display order',
        `pos_id` BIGINT COMMENT 'display position'
    ) COMMENT 'display log table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_display_log'
    TBLPROPERTIES('parquet.compression'='lzo');

B. Data loading

    insert overwrite table dwd_display_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.during_time'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(line,'$.ts'),
        get_json_object(display,'$.display_type'),
        get_json_object(display,'$.item'),
        get_json_object(display,'$.item_type'),
        get_json_object(display,'$.order'),
        get_json_object(display,'$.pos_id')
    from ods_log lateral view explode_json_array(get_json_object(line,'$.displays')) tmp as display
    where dt='2020-06-14'
    and get_json_object(line,'$.displays') is not null;

C. Inspect the data

    select * from dwd_display_log where dt='2020-06-14' limit 2;

7. Error Log Table

Error log parsing approach: each row of the error log table corresponds to one error record. To make errors easy to locate, an error record should contain the corresponding common fields, page fields, display fields, action fields, launch fields, and error fields. First filter out the log lines that contain an err field, then extract all fields with get_json_object.


A. Table creation statement

    DROP TABLE IF EXISTS dwd_error_log;
    CREATE EXTERNAL TABLE dwd_error_log(
        `area_code` STRING COMMENT 'area code',
        `brand` STRING COMMENT 'phone brand',
        `channel` STRING COMMENT 'channel',
        `is_new` STRING COMMENT 'whether this is the first launch',
        `model` STRING COMMENT 'phone model',
        `mid_id` STRING COMMENT 'device id',
        `os` STRING COMMENT 'operating system',
        `user_id` STRING COMMENT 'member id',
        `version_code` STRING COMMENT 'app version',
        `page_item` STRING COMMENT 'page target id',
        `page_item_type` STRING COMMENT 'page target type',
        `last_page_id` STRING COMMENT 'previous page id',
        `page_id` STRING COMMENT 'page id',
        `source_type` STRING COMMENT 'source type',
        `entry` STRING COMMENT 'entry point: icon (app icon), notice (notification), install (launch after install)',
        `loading_time` STRING COMMENT 'launch loading time',
        `open_ad_id` STRING COMMENT 'splash ad id',
        `open_ad_ms` STRING COMMENT 'total ad play time',
        `open_ad_skip_ms` STRING COMMENT 'point at which the user skipped the ad',
        `actions` STRING COMMENT 'actions',
        `displays` STRING COMMENT 'displays',
        `ts` STRING COMMENT 'timestamp',
        `error_code` STRING COMMENT 'error code',
        `msg` STRING COMMENT 'error message'
    ) COMMENT 'error log table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_error_log'
    TBLPROPERTIES('parquet.compression'='lzo');

Note: the action and display arrays are not processed (exploded) here. If you need to relate an error to an individual action or display, first "explode" the array with explode_json_array, then extract the specific fields with get_json_object, as sketched below.
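A hedged sketch of that idea, using the table defined above and the UDTF from the action-log section (the column choice is purely illustrative):

    hive (gmall)>
    select
        mid_id,
        error_code,
        get_json_object(action,'$.action_id')
    from dwd_error_log lateral view explode_json_array(actions) tmp as action
    where dt='2020-06-14'
    and actions is not null;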

B. Data loading

    insert overwrite table dwd_error_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(line,'$.start.entry'),
        get_json_object(line,'$.start.loading_time'),
        get_json_object(line,'$.start.open_ad_id'),
        get_json_object(line,'$.start.open_ad_ms'),
        get_json_object(line,'$.start.open_ad_skip_ms'),
        get_json_object(line,'$.actions'),
        get_json_object(line,'$.displays'),
        get_json_object(line,'$.ts'),
        get_json_object(line,'$.err.error_code'),
        get_json_object(line,'$.err.msg')
    from ods_log
    where dt='2020-06-14'
    and get_json_object(line,'$.err') is not null;

C. Inspect the data

    hive (gmall)>
    select * from dwd_error_log where dt='2020-06-14' limit 2;

8. DWD-Layer User Behavior Data Loading Script

A. Writing the script

Create the script in the /home/fancy/bin directory on node101

    [fancy@node101 bin]$ vim ods_to_dwd_log.sh

Add the following content to the script

    #!/bin/bash

    APP=gmall

    # If a date is passed in, use it; otherwise default to the day before today
    if [ -n "$2" ] ;then
        do_date=$2
    else
        do_date=`date -d "-1 day" +%F`
    fi

    dwd_start_log="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_start_log partition(dt='$do_date')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.start.entry'),
        get_json_object(line,'$.start.loading_time'),
        get_json_object(line,'$.start.open_ad_id'),
        get_json_object(line,'$.start.open_ad_ms'),
        get_json_object(line,'$.start.open_ad_skip_ms'),
        get_json_object(line,'$.ts')
    from ${APP}.ods_log
    where dt='$do_date'
    and get_json_object(line,'$.start') is not null;"

    dwd_page_log="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_page_log partition(dt='$do_date')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.during_time'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(line,'$.ts')
    from ${APP}.ods_log
    where dt='$do_date'
    and get_json_object(line,'$.page') is not null;"

    dwd_action_log="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_action_log partition(dt='$do_date')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.during_time'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(action,'$.action_id'),
        get_json_object(action,'$.item'),
        get_json_object(action,'$.item_type'),
        get_json_object(action,'$.ts')
    from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.actions')) tmp as action
    where dt='$do_date'
    and get_json_object(line,'$.actions') is not null;"

    dwd_display_log="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_display_log partition(dt='$do_date')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.during_time'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(line,'$.ts'),
        get_json_object(display,'$.display_type'),
        get_json_object(display,'$.item'),
        get_json_object(display,'$.item_type'),
        get_json_object(display,'$.order'),
        get_json_object(display,'$.pos_id')
    from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.displays')) tmp as display
    where dt='$do_date'
    and get_json_object(line,'$.displays') is not null;"

    dwd_error_log="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_error_log partition(dt='$do_date')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(line,'$.start.entry'),
        get_json_object(line,'$.start.loading_time'),
        get_json_object(line,'$.start.open_ad_id'),
        get_json_object(line,'$.start.open_ad_ms'),
        get_json_object(line,'$.start.open_ad_skip_ms'),
        get_json_object(line,'$.actions'),
        get_json_object(line,'$.displays'),
        get_json_object(line,'$.ts'),
        get_json_object(line,'$.err.error_code'),
        get_json_object(line,'$.err.msg')
    from ${APP}.ods_log
    where dt='$do_date'
    and get_json_object(line,'$.err') is not null;"

    case $1 in
        dwd_start_log )
            hive -e "$dwd_start_log"
        ;;
        dwd_page_log )
            hive -e "$dwd_page_log"
        ;;
        dwd_action_log )
            hive -e "$dwd_action_log"
        ;;
        dwd_display_log )
            hive -e "$dwd_display_log"
        ;;
        dwd_error_log )
            hive -e "$dwd_error_log"
        ;;
        all )
            hive -e "$dwd_start_log$dwd_page_log$dwd_action_log$dwd_display_log$dwd_error_log"
        ;;
    esac

Grant execute permission on the script

    [fancy@node101 bin]$ chmod 777 ods_to_dwd_log.sh

B. Using the script

Run the script

    [fancy@node101 module]$ ods_to_dwd_log.sh all 2020-06-14

Check the loaded results, for example:
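Any of the five tables can serve as a hedged spot check; this one simply counts the day's launch records:

    hive (gmall)>
    select count(*) from dwd_start_log where dt='2020-06-14';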

II. DWD Layer (Business Data)

For business data, the main point of attention when building the DWD layer is dimensional modeling.

1. Comment Fact Table (Transactional Fact Table)

A. Table creation statement

    DROP TABLE IF EXISTS dwd_comment_info;
    CREATE EXTERNAL TABLE dwd_comment_info(
        `id` STRING COMMENT 'id',
        `user_id` STRING COMMENT 'user id',
        `sku_id` STRING COMMENT 'product sku',
        `spu_id` STRING COMMENT 'product spu',
        `order_id` STRING COMMENT 'order id',
        `appraise` STRING COMMENT 'rating (positive, neutral, negative, default)',
        `create_time` STRING COMMENT 'comment time'
    ) COMMENT 'comment fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_comment_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");

B. Partition planning


C. Data loading

1. First-day load

    insert overwrite table dwd_comment_info partition (dt)
    select
        id,
        user_id,
        sku_id,
        spu_id,
        order_id,
        appraise,
        create_time,
        date_format(create_time,'yyyy-MM-dd')
    from ods_comment_info
    where dt='2020-06-14';
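This first-day load writes to partition(dt) dynamically, taking the partition value from the last selected column. In a plain Hive session that normally requires relaxing the dynamic-partition mode first, which is also what the load scripts later in this chapter do:

    hive (gmall)>
    set hive.exec.dynamic.partition.mode=nonstrict;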

2. Daily load

    insert overwrite table dwd_comment_info partition(dt='2020-06-15')
    select
        id,
        user_id,
        sku_id,
        spu_id,
        order_id,
        appraise,
        create_time
    from ods_comment_info where dt='2020-06-15';

2. Order Detail Fact Table (Transactional Fact Table)

A. Table creation statement

    DROP TABLE IF EXISTS dwd_order_detail;
    CREATE EXTERNAL TABLE dwd_order_detail (
        `id` STRING COMMENT 'order detail id',
        `order_id` STRING COMMENT 'order id',
        `user_id` STRING COMMENT 'user id',
        `sku_id` STRING COMMENT 'sku id',
        `province_id` STRING COMMENT 'province id',
        `activity_id` STRING COMMENT 'activity id',
        `activity_rule_id` STRING COMMENT 'activity rule id',
        `coupon_id` STRING COMMENT 'coupon id',
        `create_time` STRING COMMENT 'creation time',
        `source_type` STRING COMMENT 'source type',
        `source_id` STRING COMMENT 'source id',
        `sku_num` BIGINT COMMENT 'quantity',
        `original_amount` DECIMAL(16,2) COMMENT 'original amount',
        `split_activity_amount` DECIMAL(16,2) COMMENT 'allocated activity discount',
        `split_coupon_amount` DECIMAL(16,2) COMMENT 'allocated coupon discount',
        `split_final_amount` DECIMAL(16,2) COMMENT 'allocated final amount'
    ) COMMENT 'order detail fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_order_detail/'
    TBLPROPERTIES ("parquet.compression"="lzo");

B. Partition planning


C. Data loading


1. First-day load

    insert overwrite table dwd_order_detail partition(dt)
    select
        od.id,
        od.order_id,
        oi.user_id,
        od.sku_id,
        oi.province_id,
        oda.activity_id,
        oda.activity_rule_id,
        odc.coupon_id,
        od.create_time,
        od.source_type,
        od.source_id,
        od.sku_num,
        od.order_price*od.sku_num,
        od.split_activity_amount,
        od.split_coupon_amount,
        od.split_final_amount,
        date_format(create_time,'yyyy-MM-dd')
    from
    (
        select *
        from ods_order_detail
        where dt='2020-06-14'
    )od
    left join
    (
        select
            id,
            user_id,
            province_id
        from ods_order_info
        where dt='2020-06-14'
    )oi
    on od.order_id=oi.id
    left join
    (
        select
            order_detail_id,
            activity_id,
            activity_rule_id
        from ods_order_detail_activity
        where dt='2020-06-14'
    )oda
    on od.id=oda.order_detail_id
    left join
    (
        select
            order_detail_id,
            coupon_id
        from ods_order_detail_coupon
        where dt='2020-06-14'
    )odc
    on od.id=odc.order_detail_id;

2. Daily load

    insert overwrite table dwd_order_detail partition(dt='2020-06-15')
    select
        od.id,
        od.order_id,
        oi.user_id,
        od.sku_id,
        oi.province_id,
        oda.activity_id,
        oda.activity_rule_id,
        odc.coupon_id,
        od.create_time,
        od.source_type,
        od.source_id,
        od.sku_num,
        od.order_price*od.sku_num,
        od.split_activity_amount,
        od.split_coupon_amount,
        od.split_final_amount
    from
    (
        select *
        from ods_order_detail
        where dt='2020-06-15'
    )od
    left join
    (
        select
            id,
            user_id,
            province_id
        from ods_order_info
        where dt='2020-06-15'
    )oi
    on od.order_id=oi.id
    left join
    (
        select
            order_detail_id,
            activity_id,
            activity_rule_id
        from ods_order_detail_activity
        where dt='2020-06-15'
    )oda
    on od.id=oda.order_detail_id
    left join
    (
        select
            order_detail_id,
            coupon_id
        from ods_order_detail_coupon
        where dt='2020-06-15'
    )odc
    on od.id=odc.order_detail_id;

3. Order Refund Fact Table (Transactional Fact Table)

A. Table creation statement

    DROP TABLE IF EXISTS dwd_order_refund_info;
    CREATE EXTERNAL TABLE dwd_order_refund_info(
        `id` STRING COMMENT 'id',
        `user_id` STRING COMMENT 'user id',
        `order_id` STRING COMMENT 'order id',
        `sku_id` STRING COMMENT 'sku id',
        `province_id` STRING COMMENT 'province id',
        `refund_type` STRING COMMENT 'refund type',
        `refund_num` BIGINT COMMENT 'number of items refunded',
        `refund_amount` DECIMAL(16,2) COMMENT 'refund amount',
        `refund_reason_type` STRING COMMENT 'refund reason type',
        `create_time` STRING COMMENT 'refund request time'
    ) COMMENT 'order refund fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_order_refund_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");

B. Partition planning

C. Data loading


1. First-day load

    insert overwrite table dwd_order_refund_info partition(dt)
    select
        ri.id,
        ri.user_id,
        ri.order_id,
        ri.sku_id,
        oi.province_id,
        ri.refund_type,
        ri.refund_num,
        ri.refund_amount,
        ri.refund_reason_type,
        ri.create_time,
        date_format(ri.create_time,'yyyy-MM-dd')
    from
    (
        select * from ods_order_refund_info where dt='2020-06-14'
    )ri
    left join
    (
        select id,province_id from ods_order_info where dt='2020-06-14'
    )oi
    on ri.order_id=oi.id;

2. Daily load

    insert overwrite table dwd_order_refund_info partition(dt='2020-06-15')
    select
        ri.id,
        ri.user_id,
        ri.order_id,
        ri.sku_id,
        oi.province_id,
        ri.refund_type,
        ri.refund_num,
        ri.refund_amount,
        ri.refund_reason_type,
        ri.create_time
    from
    (
        select * from ods_order_refund_info where dt='2020-06-15'
    )ri
    left join
    (
        select id,province_id from ods_order_info where dt='2020-06-15'
    )oi
    on ri.order_id=oi.id;

3. Check the loaded results, for example:
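A hedged spot check on the daily partition, in the same style as the checks above:

    hive (gmall)>
    select * from dwd_order_refund_info where dt='2020-06-15' limit 2;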

4. Cart Fact Table (Periodic Snapshot Fact Table, Daily Snapshot)

A. Table creation statement

    DROP TABLE IF EXISTS dwd_cart_info;
    CREATE EXTERNAL TABLE dwd_cart_info(
        `id` STRING COMMENT 'id',
        `user_id` STRING COMMENT 'user id',
        `sku_id` STRING COMMENT 'sku id',
        `source_type` STRING COMMENT 'source type',
        `source_id` STRING COMMENT 'source id',
        `cart_price` DECIMAL(16,2) COMMENT 'price when added to cart',
        `is_ordered` STRING COMMENT 'whether already ordered',
        `create_time` STRING COMMENT 'creation time',
        `operate_time` STRING COMMENT 'modification time',
        `order_time` STRING COMMENT 'order time',
        `sku_num` BIGINT COMMENT 'quantity added to cart'
    ) COMMENT 'cart fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_cart_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");

B. Partition planning


C. Data loading


1. First-day load

    insert overwrite table dwd_cart_info partition(dt='2020-06-14')
    select
        id,
        user_id,
        sku_id,
        source_type,
        source_id,
        cart_price,
        is_ordered,
        create_time,
        operate_time,
        order_time,
        sku_num
    from ods_cart_info
    where dt='2020-06-14';

2. Daily load

    insert overwrite table dwd_cart_info partition(dt='2020-06-15')
    select
        id,
        user_id,
        sku_id,
        source_type,
        source_id,
        cart_price,
        is_ordered,
        create_time,
        operate_time,
        order_time,
        sku_num
    from ods_cart_info
    where dt='2020-06-15';

5. Favorites Fact Table (Periodic Snapshot Fact Table, Daily Snapshot)

A. Table creation statement

    DROP TABLE IF EXISTS dwd_favor_info;
    CREATE EXTERNAL TABLE dwd_favor_info(
        `id` STRING COMMENT 'id',
        `user_id` STRING COMMENT 'user id',
        `sku_id` STRING COMMENT 'sku id',
        `spu_id` STRING COMMENT 'spu id',
        `is_cancel` STRING COMMENT 'whether cancelled',
        `create_time` STRING COMMENT 'favorited time',
        `cancel_time` STRING COMMENT 'cancellation time'
    ) COMMENT 'favorites fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_favor_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");

B. Partition planning


C. Data loading


1. First-day load

    insert overwrite table dwd_favor_info partition(dt='2020-06-14')
    select
        id,
        user_id,
        sku_id,
        spu_id,
        is_cancel,
        create_time,
        cancel_time
    from ods_favor_info
    where dt='2020-06-14';

2. Daily load

    insert overwrite table dwd_favor_info partition(dt='2020-06-15')
    select
        id,
        user_id,
        sku_id,
        spu_id,
        is_cancel,
        create_time,
        cancel_time
    from ods_favor_info
    where dt='2020-06-15';

6. Coupon Usage Fact Table (Accumulating Snapshot Fact Table)

A. Table creation statement

    DROP TABLE IF EXISTS dwd_coupon_use;
    CREATE EXTERNAL TABLE dwd_coupon_use(
        `id` STRING COMMENT 'id',
        `coupon_id` STRING COMMENT 'coupon id',
        `user_id` STRING COMMENT 'user id',
        `order_id` STRING COMMENT 'order id',
        `coupon_status` STRING COMMENT 'coupon status',
        `get_time` STRING COMMENT 'time received',
        `using_time` STRING COMMENT 'time used (order placed)',
        `used_time` STRING COMMENT 'time used (paid)',
        `expire_time` STRING COMMENT 'expiration time'
    ) COMMENT 'coupon usage fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_coupon_use/'
    TBLPROPERTIES ("parquet.compression"="lzo");

B. Partition planning

C. Data loading


1. First-day load

    insert overwrite table dwd_coupon_use partition(dt)
    select
        id,
        coupon_id,
        user_id,
        order_id,
        coupon_status,
        get_time,
        using_time,
        used_time,
        expire_time,
        coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
    from ods_coupon_use
    where dt='2020-06-14';

2. Daily load

a. Load logic

The still-open records (dt='9999-99-99') are full-outer-joined with the day's new ODS records on id; each field is merged with nvl so new values win, and every merged row is rewritten to the partition of its completion date, or back to '9999-99-99' if it is still open.

b. Load statement

    insert overwrite table dwd_coupon_use partition(dt)
    select
        nvl(new.id,old.id),
        nvl(new.coupon_id,old.coupon_id),
        nvl(new.user_id,old.user_id),
        nvl(new.order_id,old.order_id),
        nvl(new.coupon_status,old.coupon_status),
        nvl(new.get_time,old.get_time),
        nvl(new.using_time,old.using_time),
        nvl(new.used_time,old.used_time),
        nvl(new.expire_time,old.expire_time),
        coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select
            id,
            coupon_id,
            user_id,
            order_id,
            coupon_status,
            get_time,
            using_time,
            used_time,
            expire_time
        from dwd_coupon_use
        where dt='9999-99-99'
    )old
    full outer join
    (
        select
            id,
            coupon_id,
            user_id,
            order_id,
            coupon_status,
            get_time,
            using_time,
            used_time,
            expire_time
        from ods_coupon_use
        where dt='2020-06-15'
    )new
    on old.id=new.id;
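The nvl(new.x, old.x) pattern above is what carries a record forward: if the day's data contains the record, its fields overwrite the stored ones; otherwise the old fields survive. A trivial hedged illustration of that precedence:

    hive (gmall)>
    select nvl('used','unused'), nvl(null,'unused');

Result: used    unused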

7. Payment Fact Table (Accumulating Snapshot Fact Table)

A. Table creation statement

    DROP TABLE IF EXISTS dwd_payment_info;
    CREATE EXTERNAL TABLE dwd_payment_info (
        `id` STRING COMMENT 'id',
        `order_id` STRING COMMENT 'order id',
        `user_id` STRING COMMENT 'user id',
        `province_id` STRING COMMENT 'province id',
        `trade_no` STRING COMMENT 'transaction number',
        `out_trade_no` STRING COMMENT 'external transaction number',
        `payment_type` STRING COMMENT 'payment type',
        `payment_amount` DECIMAL(16,2) COMMENT 'payment amount',
        `payment_status` STRING COMMENT 'payment status',
        `create_time` STRING COMMENT 'creation time',   -- time the third-party payment API was called
        `callback_time` STRING COMMENT 'completion time' -- payment completion time, i.e. the success-callback time
    ) COMMENT 'payment fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_payment_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");

B. Partition planning

C. Data loading


1. First-day load

    insert overwrite table dwd_payment_info partition(dt)
    select
        pi.id,
        pi.order_id,
        pi.user_id,
        oi.province_id,
        pi.trade_no,
        pi.out_trade_no,
        pi.payment_type,
        pi.payment_amount,
        pi.payment_status,
        pi.create_time,
        pi.callback_time,
        nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select * from ods_payment_info where dt='2020-06-14'
    )pi
    left join
    (
        select id,province_id from ods_order_info where dt='2020-06-14'
    )oi
    on pi.order_id=oi.id;

2. Daily load

    insert overwrite table dwd_payment_info partition(dt)
    select
        nvl(new.id,old.id),
        nvl(new.order_id,old.order_id),
        nvl(new.user_id,old.user_id),
        nvl(new.province_id,old.province_id),
        nvl(new.trade_no,old.trade_no),
        nvl(new.out_trade_no,old.out_trade_no),
        nvl(new.payment_type,old.payment_type),
        nvl(new.payment_amount,old.payment_amount),
        nvl(new.payment_status,old.payment_status),
        nvl(new.create_time,old.create_time),
        nvl(new.callback_time,old.callback_time),
        nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select
            id,
            order_id,
            user_id,
            province_id,
            trade_no,
            out_trade_no,
            payment_type,
            payment_amount,
            payment_status,
            create_time,
            callback_time
        from dwd_payment_info
        where dt = '9999-99-99'
    )old
    full outer join
    (
        select
            pi.id,
            pi.out_trade_no,
            pi.order_id,
            pi.user_id,
            oi.province_id,
            pi.payment_type,
            pi.trade_no,
            pi.payment_amount,
            pi.payment_status,
            pi.create_time,
            pi.callback_time
        from
        (
            select * from ods_payment_info where dt='2020-06-15'
        )pi
        left join
        (
            select id,province_id from ods_order_info where dt='2020-06-15'
        )oi
        on pi.order_id=oi.id
    )new
    on old.id=new.id;

8. Refund Payment Fact Table (Accumulating Snapshot Fact Table)

A. Table creation statement

    DROP TABLE IF EXISTS dwd_refund_payment;
    CREATE EXTERNAL TABLE dwd_refund_payment (
        `id` STRING COMMENT 'id',
        `user_id` STRING COMMENT 'user id',
        `order_id` STRING COMMENT 'order id',
        `sku_id` STRING COMMENT 'SKU id',
        `province_id` STRING COMMENT 'province id',
        `trade_no` STRING COMMENT 'transaction number',
        `out_trade_no` STRING COMMENT 'external transaction number',
        `payment_type` STRING COMMENT 'payment type',
        `refund_amount` DECIMAL(16,2) COMMENT 'refund amount',
        `refund_status` STRING COMMENT 'refund status',
        `create_time` STRING COMMENT 'creation time', -- time the third-party payment API was called
        `callback_time` STRING COMMENT 'callback time' -- payment API callback time, i.e. the refund-success time
    ) COMMENT 'refund payment fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_refund_payment/'
    TBLPROPERTIES ("parquet.compression"="lzo");

B. Partition planning

C. Data loading


1. First-day load

    insert overwrite table dwd_refund_payment partition(dt)
    select
        rp.id,
        user_id,
        order_id,
        sku_id,
        province_id,
        trade_no,
        out_trade_no,
        payment_type,
        refund_amount,
        refund_status,
        create_time,
        callback_time,
        nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select
            id,
            out_trade_no,
            order_id,
            sku_id,
            payment_type,
            trade_no,
            refund_amount,
            refund_status,
            create_time,
            callback_time
        from ods_refund_payment
        where dt='2020-06-14'
    )rp
    left join
    (
        select
            id,
            user_id,
            province_id
        from ods_order_info
        where dt='2020-06-14'
    )oi
    on rp.order_id=oi.id;

2. Daily load

    insert overwrite table dwd_refund_payment partition(dt)
    select
        nvl(new.id,old.id),
        nvl(new.user_id,old.user_id),
        nvl(new.order_id,old.order_id),
        nvl(new.sku_id,old.sku_id),
        nvl(new.province_id,old.province_id),
        nvl(new.trade_no,old.trade_no),
        nvl(new.out_trade_no,old.out_trade_no),
        nvl(new.payment_type,old.payment_type),
        nvl(new.refund_amount,old.refund_amount),
        nvl(new.refund_status,old.refund_status),
        nvl(new.create_time,old.create_time),
        nvl(new.callback_time,old.callback_time),
        nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select
            id,
            user_id,
            order_id,
            sku_id,
            province_id,
            trade_no,
            out_trade_no,
            payment_type,
            refund_amount,
            refund_status,
            create_time,
            callback_time
        from dwd_refund_payment
        where dt='9999-99-99'
    )old
    full outer join
    (
        select
            rp.id,
            user_id,
            order_id,
            sku_id,
            province_id,
            trade_no,
            out_trade_no,
            payment_type,
            refund_amount,
            refund_status,
            create_time,
            callback_time
        from
        (
            select
                id,
                out_trade_no,
                order_id,
                sku_id,
                payment_type,
                trade_no,
                refund_amount,
                refund_status,
                create_time,
                callback_time
            from ods_refund_payment
            where dt='2020-06-15'
        )rp
        left join
        (
            select
                id,
                user_id,
                province_id
            from ods_order_info
            where dt='2020-06-15'
        )oi
        on rp.order_id=oi.id
    )new
    on old.id=new.id;

D. Check the loaded results, for example:
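One hedged way to check is to confirm that completed refunds landed in dated partitions while open ones stayed in 9999-99-99:

    hive (gmall)>
    select dt,count(*) from dwd_refund_payment group by dt;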

9. Order Fact Table (Accumulating Snapshot Fact Table)

A. Table creation statement

    DROP TABLE IF EXISTS dwd_order_info;
    CREATE EXTERNAL TABLE dwd_order_info(
        `id` STRING COMMENT 'id',
        `order_status` STRING COMMENT 'order status',
        `user_id` STRING COMMENT 'user id',
        `province_id` STRING COMMENT 'province id',
        `payment_way` STRING COMMENT 'payment method',
        `delivery_address` STRING COMMENT 'delivery address',
        `out_trade_no` STRING COMMENT 'external transaction number',
        `tracking_no` STRING COMMENT 'tracking number',
        `create_time` STRING COMMENT 'creation time (unpaid status)',
        `payment_time` STRING COMMENT 'payment time (paid status)',
        `cancel_time` STRING COMMENT 'cancellation time (cancelled status)',
        `finish_time` STRING COMMENT 'completion time (completed status)',
        `refund_time` STRING COMMENT 'refund time (refunding status)',
        `refund_finish_time` STRING COMMENT 'refund completion time (refund-completed status)',
        `expire_time` STRING COMMENT 'expiration time',
        `feight_fee` DECIMAL(16,2) COMMENT 'shipping fee',
        `feight_fee_reduce` DECIMAL(16,2) COMMENT 'shipping fee reduction',
        `activity_reduce_amount` DECIMAL(16,2) COMMENT 'activity discount',
        `coupon_reduce_amount` DECIMAL(16,2) COMMENT 'coupon discount',
        `original_amount` DECIMAL(16,2) COMMENT 'original order amount',
        `final_amount` DECIMAL(16,2) COMMENT 'final order amount'
    ) COMMENT 'order fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_order_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");

B. Partition planning

C. Data loading


1. First-day load

    insert overwrite table dwd_order_info partition(dt)
    select
        oi.id,
        oi.order_status,
        oi.user_id,
        oi.province_id,
        oi.payment_way,
        oi.delivery_address,
        oi.out_trade_no,
        oi.tracking_no,
        oi.create_time,
        times.ts['1002'] payment_time,
        times.ts['1003'] cancel_time,
        times.ts['1004'] finish_time,
        times.ts['1005'] refund_time,
        times.ts['1006'] refund_finish_time,
        oi.expire_time,
        feight_fee,
        feight_fee_reduce,
        activity_reduce_amount,
        coupon_reduce_amount,
        original_amount,
        final_amount,
        case
            when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
            when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='2020-06-14' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
            when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
            when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
            else '9999-99-99'
        end
    from
    (
        select *
        from ods_order_info
        where dt='2020-06-14'
    )oi
    left join
    (
        select
            order_id,
            str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
        from ods_order_status_log
        where dt='2020-06-14'
        group by order_id
    )times
    on oi.id=times.order_id;
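The times subquery condenses an order's status log into a status-to-time map: concat builds '1002=...'-style pairs, collect_set gathers them, concat_ws joins them with commas, and str_to_map turns the result into a map keyed by status code. A hedged, self-contained illustration with made-up values:

    hive (gmall)>
    select str_to_map('1001=2020-06-14 10:00:00,1002=2020-06-14 10:05:00',',','=')['1002'];

Result: 2020-06-14 10:05:00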

2. Daily load

    insert overwrite table dwd_order_info partition(dt)
    select
        nvl(new.id,old.id),
        nvl(new.order_status,old.order_status),
        nvl(new.user_id,old.user_id),
        nvl(new.province_id,old.province_id),
        nvl(new.payment_way,old.payment_way),
        nvl(new.delivery_address,old.delivery_address),
        nvl(new.out_trade_no,old.out_trade_no),
        nvl(new.tracking_no,old.tracking_no),
        nvl(new.create_time,old.create_time),
        nvl(new.payment_time,old.payment_time),
        nvl(new.cancel_time,old.cancel_time),
        nvl(new.finish_time,old.finish_time),
        nvl(new.refund_time,old.refund_time),
        nvl(new.refund_finish_time,old.refund_finish_time),
        nvl(new.expire_time,old.expire_time),
        nvl(new.feight_fee,old.feight_fee),
        nvl(new.feight_fee_reduce,old.feight_fee_reduce),
        nvl(new.activity_reduce_amount,old.activity_reduce_amount),
        nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
        nvl(new.original_amount,old.original_amount),
        nvl(new.final_amount,old.final_amount),
        case
            when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
            when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='2020-06-15' and new.refund_time is null then '2020-06-15'
            when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
            when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
            else '9999-99-99'
        end
    from
    (
        select
            id,
            order_status,
            user_id,
            province_id,
            payment_way,
            delivery_address,
            out_trade_no,
            tracking_no,
            create_time,
            payment_time,
            cancel_time,
            finish_time,
            refund_time,
            refund_finish_time,
            expire_time,
            feight_fee,
            feight_fee_reduce,
            activity_reduce_amount,
            coupon_reduce_amount,
            original_amount,
            final_amount
        from dwd_order_info
        where dt='9999-99-99'
    )old
    full outer join
    (
        select
            oi.id,
            oi.order_status,
            oi.user_id,
            oi.province_id,
            oi.payment_way,
            oi.delivery_address,
            oi.out_trade_no,
            oi.tracking_no,
            oi.create_time,
            times.ts['1002'] payment_time,
            times.ts['1003'] cancel_time,
            times.ts['1004'] finish_time,
            times.ts['1005'] refund_time,
            times.ts['1006'] refund_finish_time,
            oi.expire_time,
            feight_fee,
            feight_fee_reduce,
            activity_reduce_amount,
            coupon_reduce_amount,
            original_amount,
            final_amount
        from
        (
            select *
            from ods_order_info
            where dt='2020-06-15'
        )oi
        left join
        (
            select
                order_id,
                str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
            from ods_order_status_log
            where dt='2020-06-15'
            group by order_id
        )times
        on oi.id=times.order_id
    )new
    on old.id=new.id;

10. DWD-Layer Business Data First-Day Load Script

A. Writing the script

Create the script ods_to_dwd_db_init.sh in the /home/fancy/bin directory

    [fancy@node101 bin]$ vim ods_to_dwd_db_init.sh

Add the following content to the script

    #!/bin/bash

    APP=gmall

    if [ -n "$2" ] ;then
        do_date=$2
    else
        echo "Please pass in a date argument"
        exit
    fi

    dwd_order_info="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_order_info partition(dt)
    select
        oi.id,
        oi.order_status,
        oi.user_id,
        oi.province_id,
        oi.payment_way,
        oi.delivery_address,
        oi.out_trade_no,
        oi.tracking_no,
        oi.create_time,
        times.ts['1002'] payment_time,
        times.ts['1003'] cancel_time,
        times.ts['1004'] finish_time,
        times.ts['1005'] refund_time,
        times.ts['1006'] refund_finish_time,
        oi.expire_time,
        feight_fee,
        feight_fee_reduce,
        activity_reduce_amount,
        coupon_reduce_amount,
        original_amount,
        final_amount,
        case
            when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
            when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='$do_date' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
            when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
            when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
            else '9999-99-99'
        end
    from
    (
        select *
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    left join
    (
        select
            order_id,
            str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
        from ${APP}.ods_order_status_log
        where dt='$do_date'
        group by order_id
    )times
    on oi.id=times.order_id;"

    dwd_order_detail="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_order_detail partition(dt)
    select
        od.id,
        od.order_id,
        oi.user_id,
        od.sku_id,
        oi.province_id,
        oda.activity_id,
        oda.activity_rule_id,
        odc.coupon_id,
        od.create_time,
        od.source_type,
        od.source_id,
        od.sku_num,
        od.order_price*od.sku_num,
        od.split_activity_amount,
        od.split_coupon_amount,
        od.split_final_amount,
        date_format(create_time,'yyyy-MM-dd')
    from
    (
        select *
        from ${APP}.ods_order_detail
        where dt='$do_date'
    )od
    left join
    (
        select
            id,
            user_id,
            province_id
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    on od.order_id=oi.id
    left join
    (
        select
            order_detail_id,
            activity_id,
            activity_rule_id
        from ${APP}.ods_order_detail_activity
        where dt='$do_date'
    )oda
    on od.id=oda.order_detail_id
    left join
    (
        select
            order_detail_id,
            coupon_id
        from ${APP}.ods_order_detail_coupon
        where dt='$do_date'
    )odc
    on od.id=odc.order_detail_id;"

    dwd_payment_info="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_payment_info partition(dt)
    select
        pi.id,
        pi.order_id,
        pi.user_id,
        oi.province_id,
        pi.trade_no,
        pi.out_trade_no,
        pi.payment_type,
        pi.payment_amount,
        pi.payment_status,
        pi.create_time,
        pi.callback_time,
        nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select * from ${APP}.ods_payment_info where dt='$do_date'
    )pi
    left join
    (
        select id,province_id from ${APP}.ods_order_info where dt='$do_date'
    )oi
    on pi.order_id=oi.id;"

    dwd_cart_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
    select
        id,
        user_id,
        sku_id,
        source_type,
        source_id,
        cart_price,
        is_ordered,
        create_time,
        operate_time,
        order_time,
        sku_num
    from ${APP}.ods_cart_info
    where dt='$do_date';"

    dwd_comment_info="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_comment_info partition(dt)
    select
        id,
        user_id,
        sku_id,
        spu_id,
        order_id,
        appraise,
        create_time,
        date_format(create_time,'yyyy-MM-dd')
    from ${APP}.ods_comment_info
    where dt='$do_date';
    "

    dwd_favor_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
    select
        id,
        user_id,
        sku_id,
        spu_id,
        is_cancel,
        create_time,
        cancel_time
    from ${APP}.ods_favor_info
    where dt='$do_date';"

    dwd_coupon_use="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_coupon_use partition(dt)
    select
        id,
        coupon_id,
        user_id,
        order_id,
        coupon_status,
        get_time,
        using_time,
        used_time,
        expire_time,
        coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
    from ${APP}.ods_coupon_use
    where dt='$do_date';"

    dwd_order_refund_info="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_order_refund_info partition(dt)
    select
        ri.id,
        ri.user_id,
        ri.order_id,
        ri.sku_id,
        oi.province_id,
        ri.refund_type,
        ri.refund_num,
        ri.refund_amount,
        ri.refund_reason_type,
        ri.create_time,
        date_format(ri.create_time,'yyyy-MM-dd')
    from
    (
        select * from ${APP}.ods_order_refund_info where dt='$do_date'
    )ri
    left join
    (
        select id,province_id from ${APP}.ods_order_info where dt='$do_date'
    )oi
    on ri.order_id=oi.id;"

    dwd_refund_payment="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_refund_payment partition(dt)
    select
        rp.id,
        user_id,
        order_id,
        sku_id,
        province_id,
        trade_no,
        out_trade_no,
        payment_type,
        refund_amount,
        refund_status,
        create_time,
        callback_time,
        nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select
            id,
            out_trade_no,
            order_id,
            sku_id,
            payment_type,
            trade_no,
            refund_amount,
            refund_status,
            create_time,
            callback_time
        from ${APP}.ods_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select
            id,
            user_id,
            province_id
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    on rp.order_id=oi.id;"

    case $1 in
        dwd_order_info )
            hive -e "$dwd_order_info"
        ;;
        dwd_order_detail )
            hive -e "$dwd_order_detail"
        ;;
        dwd_payment_info )
            hive -e "$dwd_payment_info"
        ;;
        dwd_cart_info )
            hive -e "$dwd_cart_info"
        ;;
        dwd_comment_info )
            hive -e "$dwd_comment_info"
        ;;
        dwd_favor_info )
            hive -e "$dwd_favor_info"
        ;;
        dwd_coupon_use )
            hive -e "$dwd_coupon_use"
        ;;
        dwd_order_refund_info )
            hive -e "$dwd_order_refund_info"
        ;;
        dwd_refund_payment )
            hive -e "$dwd_refund_payment"
        ;;
        all )
            hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
        ;;
    esac

Grant execute permission

    [fancy@node101 bin]$ chmod +x ods_to_dwd_db_init.sh

B. Using the script

Run the script

    [fancy@node101 bin]$ ods_to_dwd_db_init.sh all 2020-06-14

Check whether the data was loaded successfully, for example:
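One hedged way to check is to confirm that each table got its expected partitions:

    hive (gmall)>
    show partitions dwd_order_info;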

11. DWD-Layer Business Data Daily Load Script

A. Writing the script

Create the script ods_to_dwd_db.sh in the /home/fancy/bin directory

    [fancy@node101 bin]$ vim ods_to_dwd_db.sh

Add the following content to the script

    #!/bin/bash

    APP=gmall

    # If a date is passed in, use it; otherwise default to the day before today
    if [ -n "$2" ] ;then
        do_date=$2
    else
        do_date=`date -d "-1 day" +%F`
    fi

    # If, on some day, every open business record of an accumulating snapshot fact
    # table happens to complete, the insert will not overwrite the 9999-99-99
    # partition, and its stale data would be duplicated. This function uses the
    # last-modified time of the 9999-99-99 partition to judge whether it was
    # overwritten, and clears it manually if it was not.
    clear_data(){
        current_date=`date +%F`
        current_date_timestamp=`date -d "$current_date" +%s`
        last_modified_date=`hadoop fs -ls /warehouse/gmall/dwd/$1 | grep '9999-99-99' | awk '{print $6}'`
        last_modified_date_timestamp=`date -d "$last_modified_date" +%s`
        if [[ $last_modified_date_timestamp -lt $current_date_timestamp ]]; then
            echo "clear table $1 partition(dt=9999-99-99)"
            hadoop fs -rm -r -f /warehouse/gmall/dwd/$1/dt=9999-99-99/*
        fi
    }
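    # Hypothetical usage (an assumption added for illustration): before reloading
    # an accumulating snapshot table, the script would call, e.g.:
    #   clear_data dwd_order_info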
    dwd_order_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table ${APP}.dwd_order_info partition(dt)
    select
        nvl(new.id,old.id),
        nvl(new.order_status,old.order_status),
        nvl(new.user_id,old.user_id),
        nvl(new.province_id,old.province_id),
        nvl(new.payment_way,old.payment_way),
        nvl(new.delivery_address,old.delivery_address),
        nvl(new.out_trade_no,old.out_trade_no),
        nvl(new.tracking_no,old.tracking_no),
        nvl(new.create_time,old.create_time),
        nvl(new.payment_time,old.payment_time),
        nvl(new.cancel_time,old.cancel_time),
        nvl(new.finish_time,old.finish_time),
        nvl(new.refund_time,old.refund_time),
        nvl(new.refund_finish_time,old.refund_finish_time),
        nvl(new.expire_time,old.expire_time),
        nvl(new.feight_fee,old.feight_fee),
        nvl(new.feight_fee_reduce,old.feight_fee_reduce),
        nvl(new.activity_reduce_amount,old.activity_reduce_amount),
        nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
        nvl(new.original_amount,old.original_amount),
        nvl(new.final_amount,old.final_amount),
        case
            when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
            when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='$do_date' and new.refund_time is null then '$do_date'
            when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
            when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
            else '9999-99-99'
        end
    from
    (
        select
            id,
            order_status,
            user_id,
            province_id,
            payment_way,
            delivery_address,
            out_trade_no,
            tracking_no,
            create_time,
            payment_time,
            cancel_time,
            finish_time,
            refund_time,
            refund_finish_time,
            expire_time,
            feight_fee,
            feight_fee_reduce,
            activity_reduce_amount,
            coupon_reduce_amount,
            original_amount,
            final_amount
        from ${APP}.dwd_order_info
        where dt='9999-99-99'
    )old
    full outer join
    (
        select
            oi.id,
            oi.order_status,
            oi.user_id,
            oi.province_id,
            oi.payment_way,
            oi.delivery_address,
            oi.out_trade_no,
            oi.tracking_no,
            oi.create_time,
            times.ts['1002'] payment_time,
            times.ts['1003'] cancel_time,
            times.ts['1004'] finish_time,
            times.ts['1005'] refund_time,
            times.ts['1006'] refund_finish_time,
            oi.expire_time,
            feight_fee,
            feight_fee_reduce,
            activity_reduce_amount,
            coupon_reduce_amount,
            original_amount,
            final_amount
        from
        (
            select *
            from ${APP}.ods_order_info
            where dt='$do_date'
        )oi
        left join
        (
            select
                order_id,
                str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
            from ${APP}.ods_order_status_log
            where dt='$do_date'
            group by order_id
        )times
        on oi.id=times.order_id
    )new
    on old.id=new.id;"

    dwd_order_detail="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_order_detail partition(dt='$do_date')
    select
        od.id,
        od.order_id,
        oi.user_id,
        od.sku_id,
        oi.province_id,
        oda.activity_id,
        oda.activity_rule_id,
        odc.coupon_id,
        od.create_time,
        od.source_type,
        od.source_id,
        od.sku_num,
        od.order_price*od.sku_num,
        od.split_activity_amount,
        od.split_coupon_amount,
        od.split_final_amount
    from
    (
        select *
        from ${APP}.ods_order_detail
        where dt='$do_date'
    )od
    left join
    (
        select
            id,
            user_id,
            province_id
        from ${APP}.ods_order_info
        where dt='$do_date'
  158. )oi
  159. on od.order_id=oi.id
  160. left join
  161. (
  162. select
  163. order_detail_id,
  164. activity_id,
  165. activity_rule_id
  166. from ${APP}.ods_order_detail_activity
  167. where dt='$do_date'
  168. )oda
  169. on od.id=oda.order_detail_id
  170. left join
  171. (
  172. select
  173. order_detail_id,
  174. coupon_id
  175. from ${APP}.ods_order_detail_coupon
  176. where dt='$do_date'
  177. )odc
  178. on od.id=odc.order_detail_id;"
  179. dwd_payment_info="
  180. set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
  181. set hive.exec.dynamic.partition.mode=nonstrict;
  182. insert overwrite table ${APP}.dwd_payment_info partition(dt)
  183. select
  184. nvl(new.id,old.id),
  185. nvl(new.order_id,old.order_id),
  186. nvl(new.user_id,old.user_id),
  187. nvl(new.province_id,old.province_id),
  188. nvl(new.trade_no,old.trade_no),
  189. nvl(new.out_trade_no,old.out_trade_no),
  190. nvl(new.payment_type,old.payment_type),
  191. nvl(new.payment_amount,old.payment_amount),
  192. nvl(new.payment_status,old.payment_status),
  193. nvl(new.create_time,old.create_time),
  194. nvl(new.callback_time,old.callback_time),
  195. nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
  196. from
  197. (
  198. select id,
  199. order_id,
  200. user_id,
  201. province_id,
  202. trade_no,
  203. out_trade_no,
  204. payment_type,
  205. payment_amount,
  206. payment_status,
  207. create_time,
  208. callback_time
  209. from ${APP}.dwd_payment_info
  210. where dt = '9999-99-99'
  211. )old
  212. full outer join
  213. (
  214. select
  215. pi.id,
  216. pi.out_trade_no,
  217. pi.order_id,
  218. pi.user_id,
  219. oi.province_id,
  220. pi.payment_type,
  221. pi.trade_no,
  222. pi.payment_amount,
  223. pi.payment_status,
  224. pi.create_time,
  225. pi.callback_time
  226. from
  227. (
  228. select * from ${APP}.ods_payment_info where dt='$do_date'
  229. )pi
  230. left join
  231. (
  232. select id,province_id from ${APP}.ods_order_info where dt='$do_date'
  233. )oi
  234. on pi.order_id=oi.id
  235. )new
  236. on old.id=new.id;"
  237. dwd_cart_info="
  238. set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
  239. insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
  240. select
  241. id,
  242. user_id,
  243. sku_id,
  244. source_type,
  245. source_id,
  246. cart_price,
  247. is_ordered,
  248. create_time,
  249. operate_time,
  250. order_time,
  251. sku_num
  252. from ${APP}.ods_cart_info
  253. where dt='$do_date';"
  254. dwd_comment_info="
  255. set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
  256. insert overwrite table ${APP}.dwd_comment_info partition(dt='$do_date')
  257. select
  258. id,
  259. user_id,
  260. sku_id,
  261. spu_id,
  262. order_id,
  263. appraise,
  264. create_time
  265. from ${APP}.ods_comment_info where dt='$do_date';"
  266. dwd_favor_info="
  267. set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
  268. insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
  269. select
  270. id,
  271. user_id,
  272. sku_id,
  273. spu_id,
  274. is_cancel,
  275. create_time,
  276. cancel_time
  277. from ${APP}.ods_favor_info
  278. where dt='$do_date';"
  279. dwd_coupon_use="
  280. set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
  281. set hive.exec.dynamic.partition.mode=nonstrict;
  282. insert overwrite table ${APP}.dwd_coupon_use partition(dt)
  283. select
  284. nvl(new.id,old.id),
  285. nvl(new.coupon_id,old.coupon_id),
  286. nvl(new.user_id,old.user_id),
  287. nvl(new.order_id,old.order_id),
  288. nvl(new.coupon_status,old.coupon_status),
  289. nvl(new.get_time,old.get_time),
  290. nvl(new.using_time,old.using_time),
  291. nvl(new.used_time,old.used_time),
  292. nvl(new.expire_time,old.expire_time),
  293. coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
  294. from
  295. (
  296. select
  297. id,
  298. coupon_id,
  299. user_id,
  300. order_id,
  301. coupon_status,
  302. get_time,
  303. using_time,
  304. used_time,
  305. expire_time
  306. from ${APP}.dwd_coupon_use
  307. where dt='9999-99-99'
  308. )old
  309. full outer join
  310. (
  311. select
  312. id,
  313. coupon_id,
  314. user_id,
  315. order_id,
  316. coupon_status,
  317. get_time,
  318. using_time,
  319. used_time,
  320. expire_time
  321. from ${APP}.ods_coupon_use
  322. where dt='$do_date'
  323. )new
  324. on old.id=new.id;"
  325. dwd_order_refund_info="
  326. set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
  327. insert overwrite table ${APP}.dwd_order_refund_info partition(dt='$do_date')
  328. select
  329. ri.id,
  330. ri.user_id,
  331. ri.order_id,
  332. ri.sku_id,
  333. oi.province_id,
  334. ri.refund_type,
  335. ri.refund_num,
  336. ri.refund_amount,
  337. ri.refund_reason_type,
  338. ri.create_time
  339. from
  340. (
  341. select * from ${APP}.ods_order_refund_info where dt='$do_date'
  342. )ri
  343. left join
  344. (
  345. select id,province_id from ${APP}.ods_order_info where dt='$do_date'
  346. )oi
  347. on ri.order_id=oi.id;"
  348. dwd_refund_payment="
  349. set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
  350. set hive.exec.dynamic.partition.mode=nonstrict;
  351. insert overwrite table ${APP}.dwd_refund_payment partition(dt)
  352. select
  353. nvl(new.id,old.id),
  354. nvl(new.user_id,old.user_id),
  355. nvl(new.order_id,old.order_id),
  356. nvl(new.sku_id,old.sku_id),
  357. nvl(new.province_id,old.province_id),
  358. nvl(new.trade_no,old.trade_no),
  359. nvl(new.out_trade_no,old.out_trade_no),
  360. nvl(new.payment_type,old.payment_type),
  361. nvl(new.refund_amount,old.refund_amount),
  362. nvl(new.refund_status,old.refund_status),
  363. nvl(new.create_time,old.create_time),
  364. nvl(new.callback_time,old.callback_time),
  365. nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
  366. from
  367. (
  368. select
  369. id,
  370. user_id,
  371. order_id,
  372. sku_id,
  373. province_id,
  374. trade_no,
  375. out_trade_no,
  376. payment_type,
  377. refund_amount,
  378. refund_status,
  379. create_time,
  380. callback_time
  381. from ${APP}.dwd_refund_payment
  382. where dt='9999-99-99'
  383. )old
  384. full outer join
  385. (
  386. select
  387. rp.id,
  388. user_id,
  389. order_id,
  390. sku_id,
  391. province_id,
  392. trade_no,
  393. out_trade_no,
  394. payment_type,
  395. refund_amount,
  396. refund_status,
  397. create_time,
  398. callback_time
  399. from
  400. (
  401. select
  402. id,
  403. out_trade_no,
  404. order_id,
  405. sku_id,
  406. payment_type,
  407. trade_no,
  408. refund_amount,
  409. refund_status,
  410. create_time,
  411. callback_time
  412. from ${APP}.ods_refund_payment
  413. where dt='$do_date'
  414. )rp
  415. left join
  416. (
  417. select
  418. id,
  419. user_id,
  420. province_id
  421. from ${APP}.ods_order_info
  422. where dt='$do_date'
  423. )oi
  424. on rp.order_id=oi.id
  425. )new
  426. on old.id=new.id;"
  427. case $1 in
  428. dwd_order_info )
  429. hive -e "$dwd_order_info"
  430. clear_data dwd_order_info
  431. ;;
  432. dwd_order_detail )
  433. hive -e "$dwd_order_detail"
  434. ;;
  435. dwd_payment_info )
  436. hive -e "$dwd_payment_info"
  437. clear_data dwd_payment_info
  438. ;;
  439. dwd_cart_info )
  440. hive -e "$dwd_cart_info"
  441. ;;
  442. dwd_comment_info )
  443. hive -e "$dwd_comment_info"
  444. ;;
  445. dwd_favor_info )
  446. hive -e "$dwd_favor_info"
  447. ;;
  448. dwd_coupon_use )
  449. hive -e "$dwd_coupon_use"
  450. clear_data dwd_coupon_use
  451. ;;
  452. dwd_order_refund_info )
  453. hive -e "$dwd_order_refund_info"
  454. ;;
  455. dwd_refund_payment )
  456. hive -e "$dwd_refund_payment"
  457. clear_data dwd_refund_payment
  458. ;;
  459. all )
  460. hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
  461. clear_data dwd_order_info
  462. clear_data dwd_payment_info
  463. clear_data dwd_coupon_use
  464. clear_data dwd_refund_payment
  465. ;;
  466. esac
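
The dwd_order_info merge relies on str_to_map to pivot the day's rows in ods_order_status_log into one status-to-time map per order. A minimal sketch of what that expression produces, using a hypothetical order 1001 with two status changes (the date and values are illustrative):

    -- hypothetical input rows (order_id, order_status, operate_time):
    --   1001  1002  2020-06-14 10:21:30
    --   1001  1004  2020-06-14 18:03:12
    select
    order_id,
    str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
    from gmall.ods_order_status_log
    where dt='2020-06-14'
    group by order_id;
    -- yields: 1001  {"1002":"2020-06-14 10:21:30","1004":"2020-06-14 18:03:12"}
    -- ts['1002'] then reads out the payment time; a status that did not occur that day returns NULL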

Add execute permission to the script:

    [fancy@node101 bin]$ chmod +x ods_to_dwd_db.sh

B. Using the Script

Run the script:

    [fancy@node101 bin]$ ods_to_dwd_db.sh all 2020-06-14

Check that the data was loaded successfully.
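
As with the first-day load, a quick Hive spot check helps (a sketch; the date matches the run above). Besides row counts, it is worth confirming that clear_data left no stale records behind in the sentinel partition:

    -- records that closed on the load date vs. records still open
    select count(*) from gmall.dwd_order_info where dt='2020-06-14';
    select count(*) from gmall.dwd_order_info where dt='9999-99-99';
    -- an id present in both a closed partition and 9999-99-99 indicates a leftover
    select id, count(*) from gmall.dwd_order_info
    where dt in ('2020-06-14','9999-99-99')
    group by id having count(*) > 1;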
