Local Data Warehouse Project (II): Detailed Walkthrough of Building the Business Data Warehouse

客官°小女子只卖身不卖艺 · 2024-03-30 14:38

1 Overview

Building on the business data from "Local Data Warehouse Project (I): Detailed Walkthrough of Building a Local Data Warehouse", this post sets up the business data warehouse locally.
The business data is generated by the mock SQL scripts; run them in order to produce the data.

The SQL scripts are provided here:

  Link: https://pan.baidu.com/s/1AhLIuTNIyJ_GBD7M0b2RoA
  Extraction code: 1lm8

The generated data looks like this:
(screenshot omitted)

2 Importing business data into the warehouse

The overall warehouse architecture is shown below. Part (I) already covered the end-to-end flow of data collection and analysis. Here, Sqoop is used to import the business data from MySQL into HDFS.
(architecture diagram omitted)

2.1 Installing Sqoop

2.1.1 Extract and rename

  tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
  mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop-1.4.6

2.1.2 Configure the SQOOP_HOME environment variable

  SQOOP_HOME=/root/soft/sqoop-1.4.6
  PATH=$PATH:$JAVA_HOME/bin:$SHELL_HOME:$FLUME_HOME/bin:$HIVE_HOME/bin:$KAFKA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin

2.1.3 Configure sqoop-env.sh

  mv sqoop-env-template.sh sqoop-env.sh
  export HADOOP_COMMON_HOME=/root/soft/hadoop-2.7.2
  export HADOOP_MAPRED_HOME=/root/soft/hadoop-2.7.2
  export HIVE_HOME=/root/soft/hive
  export ZOOKEEPER_HOME=/root/soft/zookeeper-3.4.10
  export ZOOCFGDIR=/root/soft/zookeeper-3.4.10

2.1.4 Copy the MySQL JDBC driver into Sqoop's lib directory

2.1.5 Test the connection

  bin/sqoop list-databases --connect jdbc:mysql://192.168.2.100:3306/ --username root --password 123456

Output like the following indicates that Sqoop is installed correctly:
(screenshot omitted)

2.2 Importing data into HDFS with Sqoop

The following Sqoop script imports data into HDFS and can be run on a schedule:

  #!/bin/bash
  db_date=$2
  echo $db_date
  db_name=gmall

  import_data() {
  /root/soft/sqoop-1.4.6/bin/sqoop import \
  --connect jdbc:mysql://192.168.2.100:3306/$db_name \
  --username root \
  --password 123456 \
  --target-dir /origin_data/$db_name/db/$1/$db_date \
  --delete-target-dir \
  --num-mappers 1 \
  --fields-terminated-by "\t" \
  --query "$2"' and $CONDITIONS;' \
  --null-string '\\N' \
  --null-non-string '\\N'
  }

  import_sku_info(){
  import_data "sku_info" "select
  id, spu_id, price, sku_name, sku_desc, weight, tm_id,
  category3_id, create_time
  from sku_info where 1=1"
  }

  import_user_info(){
  import_data "user_info" "select
  id, name, birthday, gender, email, user_level,
  create_time
  from user_info where 1=1"
  }

  import_base_category1(){
  import_data "base_category1" "select
  id, name from base_category1 where 1=1"
  }

  import_base_category2(){
  import_data "base_category2" "select
  id, name, category1_id from base_category2 where 1=1"
  }

  import_base_category3(){
  import_data "base_category3" "select id, name, category2_id from base_category3 where 1=1"
  }

  import_order_detail(){
  import_data "order_detail" "select
  od.id,
  order_id,
  user_id,
  sku_id,
  sku_name,
  order_price,
  sku_num,
  o.create_time
  from order_info o, order_detail od
  where o.id=od.order_id
  and DATE_FORMAT(o.create_time,'%Y-%m-%d')='$db_date'"
  }

  import_payment_info(){
  import_data "payment_info" "select
  id,
  out_trade_no,
  order_id,
  user_id,
  alipay_trade_no,
  total_amount,
  subject,
  payment_type,
  payment_time
  from payment_info
  where DATE_FORMAT(payment_time,'%Y-%m-%d')='$db_date'"
  }

  import_order_info(){
  import_data "order_info" "select
  id,
  total_amount,
  order_status,
  user_id,
  payment_way,
  out_trade_no,
  create_time,
  operate_time
  from order_info
  where (DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date' or DATE_FORMAT(operate_time,'%Y-%m-%d')='$db_date')"
  }

  case $1 in
  "base_category1")
  import_base_category1
  ;;
  "base_category2")
  import_base_category2
  ;;
  "base_category3")
  import_base_category3
  ;;
  "order_info")
  import_order_info
  ;;
  "order_detail")
  import_order_detail
  ;;
  "sku_info")
  import_sku_info
  ;;
  "user_info")
  import_user_info
  ;;
  "payment_info")
  import_payment_info
  ;;
  "all")
  import_base_category1
  import_base_category2
  import_base_category3
  import_order_info
  import_order_detail
  import_sku_info
  import_user_info
  import_payment_info
  ;;
  esac
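The script takes the table name (or "all") as $1 and the business date as $2, so it drops straight into cron. A minimal sketch, assuming the script is saved at /root/bin/sqoop_import.sh (a hypothetical path, not from this article); the snippet only prints the command a nightly job would run:

```shell
#!/bin/bash
# Dry run: show the command a nightly cron job would execute.
# /root/bin/sqoop_import.sh is a hypothetical path.
SCRIPT=/root/bin/sqoop_import.sh
db_date=$(date -d "-1 day" +%F)   # yesterday, e.g. 2023-01-04
echo "$SCRIPT all $db_date"

# Corresponding crontab entry (daily at 00:30):
# 30 0 * * * /root/bin/sqoop_import.sh all $(date -d "-1 day" +%F)
```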

Note:
① By default, when Sqoop imports data it converts MySQL NULL values to the string 'null'.
② Hive uses \N to represent NULL.
③ To map MySQL NULLs to a value of your choice on import, use --null-string and --null-non-string:

--null-string: what to substitute when a MySQL string-typed column is NULL on import into Hive. For example, --null-string a: if a MySQL column is a string type (varchar, char) and its value is NULL, it is imported into Hive as a.
--null-non-string: what to substitute when a MySQL non-string column is NULL on import into Hive. For example, --null-non-string b: if a MySQL column is not a string type and its value is NULL, it is imported into Hive as b.

④ On export, to turn specific values back into MySQL NULLs, use --input-null-string and --input-null-non-string:

--input-null-string a: when exporting from Hive to MySQL, a string-typed column whose value is a is written to MySQL as NULL.
--input-null-non-string b: when exporting from Hive to MySQL, a non-string column whose value is b is written to MySQL as NULL.
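To see the pairing at a glance, here is a sketch that only assembles and prints the two flag sets (no cluster required; the values mirror the \N convention used in the import script above):

```shell
#!/bin/bash
# Import side: MySQL NULL is written to HDFS as \N, which Hive reads as NULL.
import_args=(--null-string '\\N' --null-non-string '\\N')
# Export side: \N in Hive columns is written back to MySQL as NULL.
export_args=(--input-null-string '\\N' --input-null-non-string '\\N')
echo "import: ${import_args[*]}"
echo "export: ${export_args[*]}"
```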

Run the script to import the data:
(screenshots omitted)

3 ODS layer

3.1 Creating the ODS tables

3.1.1 Order table

  drop table if exists ods_order_info;
  create external table ods_order_info (
    `id` string COMMENT 'order id',
    `total_amount` decimal(10,2) COMMENT 'order amount',
    `order_status` string COMMENT 'order status',
    `user_id` string COMMENT 'user id',
    `payment_way` string COMMENT 'payment method',
    `out_trade_no` string COMMENT 'payment transaction number',
    `create_time` string COMMENT 'creation time',
    `operate_time` string COMMENT 'operation time'
  ) COMMENT 'order table'
  PARTITIONED BY (`dt` string)
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ods/ods_order_info/';

3.1.2 Order detail table

  drop table if exists ods_order_detail;
  create external table ods_order_detail(
    `id` string COMMENT 'order detail id',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `sku_id` string COMMENT 'product id',
    `sku_name` string COMMENT 'product name',
    `order_price` string COMMENT 'product unit price',
    `sku_num` string COMMENT 'product quantity',
    `create_time` string COMMENT 'creation time'
  ) COMMENT 'order detail table'
  PARTITIONED BY (`dt` string)
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ods/ods_order_detail/';

3.1.3 Product (SKU) table

  drop table if exists ods_sku_info;
  create external table ods_sku_info(
    `id` string COMMENT 'skuId',
    `spu_id` string COMMENT 'spuId',
    `price` decimal(10,2) COMMENT 'price',
    `sku_name` string COMMENT 'product name',
    `sku_desc` string COMMENT 'product description',
    `weight` string COMMENT 'weight',
    `tm_id` string COMMENT 'brand id',
    `category3_id` string COMMENT 'category id',
    `create_time` string COMMENT 'creation time'
  ) COMMENT 'product table'
  PARTITIONED BY (`dt` string)
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ods/ods_sku_info/';

3.1.4 User table

  drop table if exists ods_user_info;
  create external table ods_user_info(
    `id` string COMMENT 'user id',
    `name` string COMMENT 'name',
    `birthday` string COMMENT 'birthday',
    `gender` string COMMENT 'gender',
    `email` string COMMENT 'email',
    `user_level` string COMMENT 'user level',
    `create_time` string COMMENT 'creation time'
  ) COMMENT 'user info'
  PARTITIONED BY (`dt` string)
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ods/ods_user_info/';

3.1.5 Level-1 product category table

  drop table if exists ods_base_category1;
  create external table ods_base_category1(
    `id` string COMMENT 'id',
    `name` string COMMENT 'name'
  ) COMMENT 'level-1 product category'
  PARTITIONED BY (`dt` string)
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ods/ods_base_category1/';

3.1.6 Level-2 product category table

  drop table if exists ods_base_category2;
  create external table ods_base_category2(
    `id` string COMMENT 'id',
    `name` string COMMENT 'name',
    `category1_id` string COMMENT 'level-1 category id'
  ) COMMENT 'level-2 product category'
  PARTITIONED BY (`dt` string)
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ods/ods_base_category2/';

3.1.7 Level-3 product category table

  drop table if exists ods_base_category3;
  create external table ods_base_category3(
    `id` string COMMENT 'id',
    `name` string COMMENT 'name',
    `category2_id` string COMMENT 'level-2 category id'
  ) COMMENT 'level-3 product category'
  PARTITIONED BY (`dt` string)
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ods/ods_base_category3/';

3.1.8 Payment record table

  drop table if exists ods_payment_info;
  create external table ods_payment_info(
    `id` bigint COMMENT 'id',
    `out_trade_no` string COMMENT 'external trade number',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `alipay_trade_no` string COMMENT 'Alipay transaction number',
    `total_amount` decimal(16,2) COMMENT 'payment amount',
    `subject` string COMMENT 'transaction subject',
    `payment_type` string COMMENT 'payment type',
    `payment_time` string COMMENT 'payment time'
  ) COMMENT 'payment record table'
  PARTITIONED BY (`dt` string)
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ods/ods_payment_info/';

3.2 Loading the data

  load data inpath '/origin_data/gmall/db/order_info/2023-01-04' OVERWRITE into table gmall.ods_order_info partition(dt='2023-01-04');
  load data inpath '/origin_data/gmall/db/order_info/2023-01-05' OVERWRITE into table gmall.ods_order_info partition(dt='2023-01-05');
  load data inpath '/origin_data/gmall/db/order_detail/2023-01-04' OVERWRITE into table gmall.ods_order_detail partition(dt='2023-01-04');
  load data inpath '/origin_data/gmall/db/order_detail/2023-01-05' OVERWRITE into table gmall.ods_order_detail partition(dt='2023-01-05');
  load data inpath '/origin_data/gmall/db/sku_info/2023-01-04' OVERWRITE into table gmall.ods_sku_info partition(dt='2023-01-04');
  load data inpath '/origin_data/gmall/db/sku_info/2023-01-05' OVERWRITE into table gmall.ods_sku_info partition(dt='2023-01-05');
  load data inpath '/origin_data/gmall/db/user_info/2023-01-04' OVERWRITE into table gmall.ods_user_info partition(dt='2023-01-04');
  load data inpath '/origin_data/gmall/db/user_info/2023-01-05' OVERWRITE into table gmall.ods_user_info partition(dt='2023-01-05');
  load data inpath '/origin_data/gmall/db/payment_info/2023-01-04' OVERWRITE into table gmall.ods_payment_info partition(dt='2023-01-04');
  load data inpath '/origin_data/gmall/db/payment_info/2023-01-05' OVERWRITE into table gmall.ods_payment_info partition(dt='2023-01-05');
  load data inpath '/origin_data/gmall/db/base_category1/2023-01-04' OVERWRITE into table gmall.ods_base_category1 partition(dt='2023-01-04');
  load data inpath '/origin_data/gmall/db/base_category1/2023-01-05' OVERWRITE into table gmall.ods_base_category1 partition(dt='2023-01-05');
  load data inpath '/origin_data/gmall/db/base_category2/2023-01-04' OVERWRITE into table gmall.ods_base_category2 partition(dt='2023-01-04');
  load data inpath '/origin_data/gmall/db/base_category2/2023-01-05' OVERWRITE into table gmall.ods_base_category2 partition(dt='2023-01-05');
  load data inpath '/origin_data/gmall/db/base_category3/2023-01-04' OVERWRITE into table gmall.ods_base_category3 partition(dt='2023-01-04');
  load data inpath '/origin_data/gmall/db/base_category3/2023-01-05' OVERWRITE into table gmall.ods_base_category3 partition(dt='2023-01-05');

All of the above can be wrapped in a script that takes the date as a parameter and runs on a daily schedule.
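A minimal sketch of such a wrapper (the use of hive -e at the end is an assumption, not shown in the source): it takes the date as $1 and generates one LOAD statement per table; here the statements are printed instead of executed.

```shell
#!/bin/bash
# Generate the ODS LOAD statements for one date (default: yesterday).
# In production, pipe the output to `hive -e` instead of just echoing it.
if [ -n "$1" ]; then
  do_date=$1
else
  do_date=$(date -d "-1 day" +%F)
fi
APP=gmall
tables="order_info order_detail sku_info user_info payment_info base_category1 base_category2 base_category3"
for t in $tables; do
  echo "load data inpath '/origin_data/$APP/db/$t/$do_date' OVERWRITE into table $APP.ods_$t partition(dt='$do_date');"
done
```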

4 DWD layer

4.1 Creating the DWD detail tables

4.1.1 Order table

  drop table if exists dwd_order_info;
  create external table dwd_order_info (
    `id` string COMMENT 'order id',
    `total_amount` decimal(10,2) COMMENT 'order amount',
    `order_status` string COMMENT 'order status: 1 2 3 4 5',
    `user_id` string COMMENT 'user id',
    `payment_way` string COMMENT 'payment method',
    `out_trade_no` string COMMENT 'payment transaction number',
    `create_time` string COMMENT 'creation time',
    `operate_time` string COMMENT 'operation time'
  )
  PARTITIONED BY (`dt` string)
  stored as parquet
  location '/wavehouse/gmall/dwd/dwd_order_info/'
  tblproperties ("parquet.compression"="snappy");

4.1.2 Order detail table

  drop table if exists dwd_order_detail;
  create external table dwd_order_detail(
    `id` string COMMENT 'order detail id',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `sku_id` string COMMENT 'product id',
    `sku_name` string COMMENT 'product name',
    `order_price` string COMMENT 'product unit price',
    `sku_num` string COMMENT 'product quantity',
    `create_time` string COMMENT 'creation time'
  )
  PARTITIONED BY (`dt` string)
  stored as parquet
  location '/wavehouse/gmall/dwd/dwd_order_detail/'
  tblproperties ("parquet.compression"="snappy");

4.1.3 User table

  drop table if exists dwd_user_info;
  create external table dwd_user_info(
    `id` string COMMENT 'user id',
    `name` string COMMENT 'name',
    `birthday` string COMMENT 'birthday',
    `gender` string COMMENT 'gender',
    `email` string COMMENT 'email',
    `user_level` string COMMENT 'user level',
    `create_time` string COMMENT 'creation time'
  )
  PARTITIONED BY (`dt` string)
  stored as parquet
  location '/wavehouse/gmall/dwd/dwd_user_info/'
  tblproperties ("parquet.compression"="snappy");

4.1.4 Payment record table

  drop table if exists dwd_payment_info;
  create external table dwd_payment_info(
    `id` bigint COMMENT 'id',
    `out_trade_no` string COMMENT 'external trade number',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `alipay_trade_no` string COMMENT 'Alipay transaction number',
    `total_amount` decimal(16,2) COMMENT 'payment amount',
    `subject` string COMMENT 'transaction subject',
    `payment_type` string COMMENT 'payment type',
    `payment_time` string COMMENT 'payment time'
  )
  PARTITIONED BY (`dt` string)
  stored as parquet
  location '/wavehouse/gmall/dwd/dwd_payment_info/'
  tblproperties ("parquet.compression"="snappy");

4.1.5 Product table (with category dimensions)

  drop table if exists dwd_sku_info;
  create external table dwd_sku_info(
    `id` string COMMENT 'skuId',
    `spu_id` string COMMENT 'spuId',
    `price` decimal(10,2) COMMENT 'price',
    `sku_name` string COMMENT 'product name',
    `sku_desc` string COMMENT 'product description',
    `weight` string COMMENT 'weight',
    `tm_id` string COMMENT 'brand id',
    `category3_id` string COMMENT 'level-3 category id',
    `category2_id` string COMMENT 'level-2 category id',
    `category1_id` string COMMENT 'level-1 category id',
    `category3_name` string COMMENT 'level-3 category name',
    `category2_name` string COMMENT 'level-2 category name',
    `category1_name` string COMMENT 'level-1 category name',
    `create_time` string COMMENT 'creation time'
  )
  PARTITIONED BY (`dt` string)
  stored as parquet
  location '/wavehouse/gmall/dwd/dwd_sku_info/'
  tblproperties ("parquet.compression"="snappy");

4.2 Loading the data

  #!/bin/bash
  # Variables defined up front for easy modification
  APP=gmall
  hive=/root/soft/hive/bin/hive
  # Use the date passed in if one is given; otherwise default to yesterday
  if [ -n "$1" ] ;then
  do_date=$1
  else
  do_date=`date -d "-1 day" +%F`
  fi
  sql="
  set hive.exec.dynamic.partition.mode=nonstrict;
  insert overwrite table "$APP".dwd_order_info partition(dt)
  select * from "$APP".ods_order_info
  where dt='$do_date' and id is not null;
  insert overwrite table "$APP".dwd_order_detail partition(dt)
  select * from "$APP".ods_order_detail
  where dt='$do_date' and id is not null;
  insert overwrite table "$APP".dwd_user_info partition(dt)
  select * from "$APP".ods_user_info
  where dt='$do_date' and id is not null;
  insert overwrite table "$APP".dwd_payment_info partition(dt)
  select * from "$APP".ods_payment_info
  where dt='$do_date' and id is not null;
  insert overwrite table "$APP".dwd_sku_info partition(dt)
  select
    sku.id,
    sku.spu_id,
    sku.price,
    sku.sku_name,
    sku.sku_desc,
    sku.weight,
    sku.tm_id,
    sku.category3_id,
    c2.id category2_id,
    c1.id category1_id,
    c3.name category3_name,
    c2.name category2_name,
    c1.name category1_name,
    sku.create_time,
    sku.dt
  from
    "$APP".ods_sku_info sku
  join "$APP".ods_base_category3 c3 on sku.category3_id=c3.id
  join "$APP".ods_base_category2 c2 on c3.category2_id=c2.id
  join "$APP".ods_base_category1 c1 on c2.category1_id=c1.id
  where sku.dt='$do_date' and c2.dt='$do_date'
  and c3.dt='$do_date' and c1.dt='$do_date'
  and sku.id is not null;
  "
  $hive -e "$sql"
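The if/else at the top defaults do_date to yesterday when no argument is given; the same fallback can be expressed with shell parameter expansion (a stylistic sketch, equivalent behavior):

```shell
#!/bin/bash
# ${1:-default}: use $1 if set and non-empty, otherwise yesterday's date.
pick_date() { echo "${1:-$(date -d '-1 day' +%F)}"; }
pick_date 2023-01-04   # prints 2023-01-04
pick_date              # prints yesterday's date
```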

(screenshot omitted)

5 DWS layer

5.1 User behavior wide table

Goal: aggregate each user's single-day behavior into one wide, multi-column row, so that after joining in user dimension attributes it can be analyzed from different angles.

5.1.1 Create the user behavior wide table

  drop table if exists dws_user_action;
  create external table dws_user_action
  (
    user_id string comment 'user id',
    order_count bigint comment 'number of orders',
    order_amount decimal(16,2) comment 'order amount',
    payment_count bigint comment 'number of payments',
    payment_amount decimal(16,2) comment 'payment amount',
    comment_count bigint comment 'number of comments'
  ) COMMENT 'daily user behavior wide table'
  PARTITIONED BY (`dt` string)
  stored as parquet
  location '/wavehouse/gmall/dws/dws_user_action/';

5.1.2 Loading the data

  with
  tmp_order as
  (
  select
    user_id,
    count(*) order_count,
    sum(oi.total_amount) order_amount
  from dwd_order_info oi
  where date_format(oi.create_time,'yyyy-MM-dd')='2023-01-04'
  group by user_id
  ),
  tmp_payment as
  (
  select
    user_id,
    sum(pi.total_amount) payment_amount,
    count(*) payment_count
  from dwd_payment_info pi
  where date_format(pi.payment_time,'yyyy-MM-dd')='2023-01-04'
  group by user_id
  ),
  tmp_comment as
  (
  select
    user_id,
    count(*) comment_count
  from dwd_comment_log c
  where date_format(c.dt,'yyyy-MM-dd')='2023-01-04'
  group by user_id
  )
  insert overwrite table dws_user_action partition(dt='2023-01-04')
  select
    user_actions.user_id,
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(user_actions.payment_amount),
    sum(user_actions.comment_count)
  from
  (
  select
    user_id,
    order_count,
    order_amount,
    0 payment_count,
    0 payment_amount,
    0 comment_count
  from tmp_order
  union all
  select
    user_id,
    0,
    0,
    payment_count,
    payment_amount,
    0
  from tmp_payment
  union all
  select
    user_id,
    0,
    0,
    0,
    0,
    comment_count
  from tmp_comment
  ) user_actions
  group by user_id;
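The UNION ALL blocks pad the columns each source lacks with zeros so that a single GROUP BY can sum everything into one row per user. The same merge pattern, sketched locally with awk over three made-up per-user files:

```shell
#!/bin/bash
# Toy inputs (columns: user, count[, amount]) standing in for the three CTEs.
printf 'u1 2 100\nu2 1 50\n' > /tmp/orders.txt
printf 'u1 1 80\n'           > /tmp/payments.txt
printf 'u2 3\n'              > /tmp/comments.txt
# Sum each metric per user; missing metrics default to 0, as in the SQL.
awk '
  FILENAME ~ /orders/   { oc[$1]+=$2; oa[$1]+=$3 }
  FILENAME ~ /payments/ { pc[$1]+=$2; pa[$1]+=$3 }
  FILENAME ~ /comments/ { cc[$1]+=$2 }
  { seen[$1]=1 }
  END { for (u in seen) print u, oc[u]+0, oa[u]+0, pc[u]+0, pa[u]+0, cc[u]+0 }
' /tmp/orders.txt /tmp/payments.txt /tmp/comments.txt | sort
```

Each output row is one user's wide record: order count, order amount, payment count, payment amount, comment count.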

6 Requirements

6.1 Requirement 1

Compute total GMV, the gross merchandise volume transacted over a given period (a day, a week, a month).
Create the table:

  drop table if exists ads_gmv_sum_day;
  create external table ads_gmv_sum_day(
    `dt` string COMMENT 'statistics date',
    `gmv_count` bigint COMMENT 'number of GMV orders for the day',
    `gmv_amount` decimal(16,2) COMMENT 'total GMV order amount for the day',
    `gmv_payment` decimal(16,2) COMMENT 'payment amount for the day'
  ) COMMENT 'GMV'
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ads/ads_gmv_sum_day/';

Insert the data:

  INSERT INTO TABLE ads_gmv_sum_day
  SELECT
    '2023-01-04' dt,
    sum(order_count) gmv_count,
    sum(order_amount) gmv_amount,
    sum(payment_amount) gmv_payment
  FROM dws_user_action
  WHERE dt='2023-01-04'
  GROUP BY dt;

6.2 Requirement 2

Conversion rates: user freshness and funnel analysis.
(diagram omitted)

6.2.1 ADS layer: share of new users among daily active users (user freshness)

Create the table:

  drop table if exists ads_user_convert_day;
  create external table ads_user_convert_day(
    `dt` string COMMENT 'statistics date',
    `uv_m_count` bigint COMMENT 'active devices for the day',
    `new_m_count` bigint COMMENT 'new devices for the day',
    `new_m_ratio` decimal(10,2) COMMENT 'ratio of new devices to daily active devices'
  ) COMMENT 'conversion rate'
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ads/ads_user_convert_day/';

Load the data:

  insert into table ads_user_convert_day
  select
    '2023-01-04' dt,
    sum(uc.dc) sum_dc,
    sum(uc.nmc) sum_nmc,
    cast(sum(uc.nmc)/sum(uc.dc)*100 as decimal(10,2)) new_m_ratio
  from
  (
  select
    day_count dc,
    0 nmc
  from ads_uv_count
  where dt='2023-01-04'
  union all
  select
    0 dc,
    new_mid_count nmc
  from ads_new_mid_count
  where create_date='2023-01-04'
  ) uc;

(screenshot omitted)

6.2.2 ADS layer: user behavior funnel analysis

(diagram omitted)
Create the table:

  drop table if exists ads_user_action_convert_day;
  create external table ads_user_action_convert_day(
    `dt` string COMMENT 'statistics date',
    `total_visitor_m_count` bigint COMMENT 'total visitors',
    `order_u_count` bigint COMMENT 'users who placed orders',
    `visitor2order_convert_ratio` decimal(10,2) COMMENT 'visit-to-order conversion rate',
    `payment_u_count` bigint COMMENT 'users who paid',
    `order2payment_convert_ratio` decimal(10,2) COMMENT 'order-to-payment conversion rate'
  ) COMMENT 'user behavior funnel analysis'
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ads/ads_user_action_convert_day/';

Insert the data:

  insert into table ads_user_action_convert_day
  select
    '2023-01-04',
    uv.day_count,
    ua.order_count,
    cast(ua.order_count/uv.day_count as decimal(10,2)) visitor2order_convert_ratio,
    ua.payment_count,
    cast(ua.payment_count/ua.order_count as decimal(10,2)) order2payment_convert_ratio
  from
  (
  select
    dt,
    sum(if(order_count>0,1,0)) order_count,
    sum(if(payment_count>0,1,0)) payment_count
  from dws_user_action
  where dt='2023-01-04'
  group by dt
  ) ua join ads_uv_count uv on uv.dt=ua.dt;

(screenshot omitted)

6.3 Requirement 3

Brand repurchase rate.
Requirement: on a monthly basis, count users who bought a product two or more times.

6.3.1 DWS layer table

  drop table if exists dws_sale_detail_daycount;
  create external table dws_sale_detail_daycount
  (
    user_id string comment 'user id',
    sku_id string comment 'product id',
    user_gender string comment 'user gender',
    user_age string comment 'user age',
    user_level string comment 'user level',
    order_price decimal(10,2) comment 'product price',
    sku_name string comment 'product name',
    sku_tm_id string comment 'brand id',
    sku_category3_id string comment 'level-3 category id',
    sku_category2_id string comment 'level-2 category id',
    sku_category1_id string comment 'level-1 category id',
    sku_category3_name string comment 'level-3 category name',
    sku_category2_name string comment 'level-2 category name',
    sku_category1_name string comment 'level-1 category name',
    spu_id string comment 'product spu id',
    sku_num int comment 'quantity purchased',
    order_count string comment 'orders placed that day',
    order_amount string comment 'order amount that day'
  ) COMMENT 'user purchase detail table'
  PARTITIONED BY (`dt` string)
  stored as parquet
  location '/wavehouse/gmall/dws/dws_user_sale_detail_daycount/'
  tblproperties ("parquet.compression"="snappy");

Load the data:

  with
  tmp_detail as
  (
  select
    user_id,
    sku_id,
    sum(sku_num) sku_num,
    count(*) order_count,
    sum(od.order_price*sku_num) order_amount
  from dwd_order_detail od
  where od.dt='2023-01-05'
  group by user_id, sku_id
  )
  insert overwrite table dws_sale_detail_daycount partition(dt='2023-01-05')
  select
    tmp_detail.user_id,
    tmp_detail.sku_id,
    u.gender,
    months_between('2023-01-05', u.birthday)/12 age,
    u.user_level,
    price,
    sku_name,
    tm_id,
    category3_id,
    category2_id,
    category1_id,
    category3_name,
    category2_name,
    category1_name,
    spu_id,
    tmp_detail.sku_num,
    tmp_detail.order_count,
    tmp_detail.order_amount
  from tmp_detail
  left join dwd_user_info u on tmp_detail.user_id=u.id and u.dt='2023-01-05'
  left join dwd_sku_info s on tmp_detail.sku_id=s.id and s.dt='2023-01-05';

6.3.2 ADS layer

Create the table:

  drop table if exists ads_sale_tm_category1_stat_mn;
  create external table ads_sale_tm_category1_stat_mn
  (
    tm_id string comment 'brand id',
    category1_id string comment 'level-1 category id',
    category1_name string comment 'level-1 category name',
    buycount bigint comment 'number of buyers',
    buy_twice_last bigint comment 'buyers with two or more purchases',
    buy_twice_last_ratio decimal(10,2) comment 'repurchase rate (2+)',
    buy_3times_last bigint comment 'buyers with three or more purchases',
    buy_3times_last_ratio decimal(10,2) comment 'repurchase rate (3+)',
    stat_mn string comment 'statistics month',
    stat_date string comment 'statistics date'
  ) COMMENT 'repurchase rate statistics'
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ads/ads_sale_tm_category1_stat_mn/';

Insert the data:

  insert into table ads_sale_tm_category1_stat_mn
  select
    mn.sku_tm_id,
    mn.sku_category1_id,
    mn.sku_category1_name,
    sum(if(mn.order_count>=1,1,0)) buycount,
    sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
    sum(if(mn.order_count>=2,1,0))/sum(if(mn.order_count>=1,1,0)) buyTwiceLastRatio,
    sum(if(mn.order_count>=3,1,0)) buy3timeLast,
    sum(if(mn.order_count>=3,1,0))/sum(if(mn.order_count>=1,1,0)) buy3timeLastRatio,
    date_format('2023-01-04','yyyy-MM') stat_mn,
    '2023-01-04' stat_date
  from
  (
  select
    user_id,
    sd.sku_tm_id,
    sd.sku_category1_id,
    sd.sku_category1_name,
    sum(order_count) order_count
  from dws_sale_detail_daycount sd
  where date_format(dt,'yyyy-MM')=date_format('2023-01-04','yyyy-MM')
  group by user_id, sd.sku_tm_id, sd.sku_category1_id, sd.sku_category1_name
  ) mn
  group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
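The core arithmetic is: repurchase rate = buyers with two or more orders divided by buyers with at least one order, per brand and month. A quick local check on invented per-user counts:

```shell
#!/bin/bash
# columns: user brand monthly_order_count (toy data)
printf 'u1 tm1 3\nu2 tm1 1\nu3 tm1 2\nu4 tm1 1\n' > /tmp/brand_orders.txt
awk '{
  buy[$2]   += ($3 >= 1)   # buyers with at least one order
  twice[$2] += ($3 >= 2)   # buyers with two or more orders
}
END { for (b in buy) printf "%s %.2f\n", b, twice[b] / buy[b] }' /tmp/brand_orders.txt
```

Two of the four tm1 buyers ordered at least twice, so this prints a repurchase rate of 0.50.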

(screenshot omitted)

6.4 Requirement 4

Top 10 products by repurchase rate within each user level.
Create the table:

  drop table if exists ads_ul_rep_ratio;
  create table ads_ul_rep_ratio(
    user_level string comment 'user level',
    sku_id string comment 'product id',
    buy_count bigint comment 'total buyers',
    buy_twice_count bigint comment 'buyers with two or more purchases',
    buy_twice_rate decimal(10,2) comment 'repurchase rate (2+)',
    rank string comment 'rank',
    state_date string comment 'statistics date'
  ) COMMENT 'repurchase rate statistics'
  row format delimited fields terminated by '\t'
  location '/wavehouse/gmall/ads/ads_ul_rep_ratio/';

Insert the data:

  with
  tmp_count as(
  select -- per level, each user's order count for each product
    user_level,
    user_id,
    sku_id,
    sum(order_count) order_count
  from dws_sale_detail_daycount
  where dt<='2023-01-04'
  group by user_level, user_id, sku_id
  )
  insert overwrite table ads_ul_rep_ratio
  select
    *
  from(
  select
    user_level,
    sku_id,
    sum(if(order_count >=1, 1, 0)) buy_count,
    sum(if(order_count >=2, 1, 0)) buy_twice_count,
    sum(if(order_count >=2, 1, 0)) / sum(if(order_count >=1, 1, 0)) * 100 buy_twice_rate,
    row_number() over(partition by user_level order by sum(if(order_count >=2, 1, 0)) / sum(if(order_count >=1, 1, 0)) desc) rn,
    '2023-01-04'
  from tmp_count
  group by user_level, sku_id
  ) t1
  where rn<=10;
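row_number() over(partition by user_level order by ... desc) combined with rn<=10 is the standard per-group top-N idiom. The same effect can be sketched locally with sort and awk (toy data, top 2 per level):

```shell
#!/bin/bash
# columns: user_level sku_id repurchase_rate (toy data)
printf '1 s1 0.9\n1 s2 0.7\n1 s3 0.8\n2 s4 0.5\n2 s5 0.6\n' > /tmp/rates.txt
# Sort by level, then by rate descending; keep the first N rows per level,
# which is what row_number() ... where rn<=N does in the SQL above.
sort -k1,1 -k3,3nr /tmp/rates.txt |
awk -v N=2 '{ if (++rn[$1] <= N) print $1, $2, $3 }'
```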

Next up for this project: data visualization and job scheduling, covered in "Local Data Warehouse Project (III): Data Visualization and Job Scheduling".
