离线数仓 (十六) --------- ADS 层搭建

你的名字 2024-04-01 11:54 141阅读 0赞

目录

  • 一、建表说明
  • 二、访客主题
      1. 访客统计
      1. 路径分析
  • 三、用户主题
      1. 用户统计
      1. 用户变动统计
      1. 用户行为漏斗分析
      1. 用户留存率
  • 四、商品主题
      1. 商品统计
      1. 品牌复购率
  • 五、订单主题
      1. 订单统计
      1. 各地区订单统计
  • 六、优惠券主题
      1. 优惠券统计
  • 七、活动主题
      1. 活动统计
  • 八、ADS 层业务数据导入脚本

一、建表说明

ADS 层不涉及建模,建表根据具体需求而定。

二、访客主题

1. 访客统计

该需求为访客综合统计,其中包含若干指标,以下为对每个指标的解释说明。



















































指标 说明 对应字段
访客数 统计访问人数 uv_count
页面停留时长 统计所有页面访问记录总时长,以秒为单位 duration_sec
平均页面停留时长 统计每个会话平均停留时长,以秒为单位 avg_duration_sec
页面浏览总数 统计所有页面访问记录总数 page_count
平均页面浏览数 统计每个会话平均浏览页面数 avg_page_count
会话总数 统计会话总数 sv_count
跳出数 统计只浏览一个页面的会话个数 bounce_count
跳出率 只有一个页面的会话的比例 bounce_rate

A、建表语句

  1. DROP TABLE IF EXISTS ads_visit_stats;
  2. CREATE EXTERNAL TABLE ads_visit_stats (
  3. `dt` STRING COMMENT '统计日期',
  4. `is_new` STRING COMMENT '新老标识,1:新,0:老',
  5. `recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
  6. `channel` STRING COMMENT '渠道',
  7. `uv_count` BIGINT COMMENT '日活(访问人数)',
  8. `duration_sec` BIGINT COMMENT '页面停留总时长',
  9. `avg_duration_sec` BIGINT COMMENT '一次会话,页面停留平均时长,单位为描述',
  10. `page_count` BIGINT COMMENT '页面总浏览数',
  11. `avg_page_count` BIGINT COMMENT '一次会话,页面平均浏览数',
  12. `sv_count` BIGINT COMMENT '会话次数',
  13. `bounce_count` BIGINT COMMENT '跳出数',
  14. `bounce_rate` DECIMAL(16,2) COMMENT '跳出率'
  15. ) COMMENT '访客统计'
  16. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  17. LOCATION '/warehouse/gmall/ads/ads_visit_stats/';

B、数据装载

思路分析:该需求的关键点为会话的划分,总体实现思路可分为以下几步:

  • 第一步:对所有页面访问记录进行会话的划分。
  • 第二步:统计每个会话的浏览时长和浏览页面数。
  • 第三步:统计上述各指标。

    insert overwrite table ads_visit_stats
    select * from ads_visit_stats
    union
    select

    1. '2020-06-14' dt,
    2. is_new,
    3. recent_days,
    4. channel,
    5. count(distinct(mid_id)) uv_count,
    6. cast(sum(duration)/1000 as bigint) duration_sec,
    7. cast(avg(duration)/1000 as bigint) avg_duration_sec,
    8. sum(page_count) page_count,
    9. cast(avg(page_count) as bigint) avg_page_count,
    10. count(*) sv_count,
    11. sum(if(page_count=1,1,0)) bounce_count,
    12. cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate

    from
    (

    1. select
    2. session_id,
    3. mid_id,
    4. is_new,
    5. recent_days,
    6. channel,
    7. count(*) page_count,
    8. sum(during_time) duration
    9. from
    10. (
    11. select
    12. mid_id,
    13. channel,
    14. recent_days,
    15. is_new,
    16. last_page_id,
    17. page_id,
    18. during_time,
    19. concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
    20. from
    21. (
    22. select
    23. mid_id,
    24. channel,
    25. last_page_id,
    26. page_id,
    27. during_time,
    28. ts,
    29. recent_days,
    30. if(visit_date_first>=date_add('2020-06-14',-recent_days+1),'1','0') is_new
    31. from
    32. (
    33. select
    34. t1.mid_id,
    35. t1.channel,
    36. t1.last_page_id,
    37. t1.page_id,
    38. t1.during_time,
    39. t1.dt,
    40. t1.ts,
    41. t2.visit_date_first
    42. from
    43. (
    44. select
    45. mid_id,
    46. channel,
    47. last_page_id,
    48. page_id,
    49. during_time,
    50. dt,
    51. ts
    52. from dwd_page_log
    53. where dt>=date_add('2020-06-14',-30)
    54. )t1
    55. left join
    56. (
    57. select
    58. mid_id,
    59. visit_date_first
    60. from dwt_visitor_topic
    61. where dt='2020-06-14'
    62. )t2
    63. on t1.mid_id=t2.mid_id
    64. )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
    65. where dt>=date_add('2020-06-14',-recent_days+1)
    66. )t4
    67. )t5
    68. group by session_id,mid_id,is_new,recent_days,channel

    )t6
    group by is_new,recent_days,channel;

2. 路径分析

用户路径分析,顾名思义,就是指用户在 APP 或网站中的访问路径。为了衡量网站优化的效果或营销推广的效果,以及了解用户行为偏好,时常要对访问路径进行分析。

用户访问路径的可视化通常使用桑基图。如下图所示,该图可真实还原用户的访问路径,包括页面跳转和页面访问次序。

桑基图需要我们提供每种页面跳转的次数,每个跳转由 source/target 表示,source 指跳转起始页面,target 表示跳转终到页面。

在这里插入图片描述

A、建表语句

  1. DROP TABLE IF EXISTS ads_page_path;
  2. CREATE EXTERNAL TABLE ads_page_path
  3. (
  4. `dt` STRING COMMENT '统计日期',
  5. `recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
  6. `source` STRING COMMENT '跳转起始页面ID',
  7. `target` STRING COMMENT '跳转终到页面ID',
  8. `path_count` BIGINT COMMENT '跳转次数'
  9. ) COMMENT '页面浏览路径'
  10. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  11. LOCATION '/warehouse/gmall/ads/ads_page_path/';

B、数据装载

思路分析:该需求要统计的就是每种跳转的次数,故理论上对 source/target 进行分组 count() 即可。统计时需注意以下两点:

  • 第一点:桑基图的 source 不允许为空,但 target 可为空。
  • 第二点:桑基图所展示的流程不允许存在环。

    insert overwrite table ads_page_path
    select * from ads_page_path
    union
    select

    1. '2020-06-14',
    2. recent_days,
    3. source,
    4. target,
    5. count(*)

    from
    (

    1. select
    2. recent_days,
    3. concat('step-',step,':',source) source,
    4. concat('step-',step+1,':',target) target
    5. from
    6. (
    7. select
    8. recent_days,
    9. page_id source,
    10. lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
    11. row_number() over (partition by recent_days,session_id order by ts) step
    12. from
    13. (
    14. select
    15. recent_days,
    16. last_page_id,
    17. page_id,
    18. ts,
    19. concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
    20. from dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
    21. where dt>=date_add('2020-06-14',-30)
    22. and dt>=date_add('2020-06-14',-recent_days+1)
    23. )t2
    24. )t3

    )t4
    group by recent_days,source,target;

三、用户主题

1. 用户统计

该需求为用户综合统计,其中包含若干指标,以下为对每个指标的解释说明。




































指标 说明 对应字段
新增用户数 统计新增注册用户人数 new_user_count
新增下单用户数 统计新增下单用户人数 new_order_user_count
下单总金额 统计所有订单总额 order_final_amount
下单用户数 统计下单用户总数 order_user_count
未下单用户数 统计活跃但未下单用户数 no_order_user_count

A、建表语句

  1. DROP TABLE IF EXISTS ads_user_total;
  2. CREATE EXTERNAL TABLE `ads_user_total` (
  3. `dt` STRING COMMENT '统计日期',
  4. `recent_days` BIGINT COMMENT '最近天数,0:累积值,1:最近1天,7:最近7天,30:最近30天',
  5. `new_user_count` BIGINT COMMENT '新注册用户数',
  6. `new_order_user_count` BIGINT COMMENT '新增下单用户数',
  7. `order_final_amount` DECIMAL(16,2) COMMENT '下单总金额',
  8. `order_user_count` BIGINT COMMENT '下单用户数',
  9. `no_order_user_count` BIGINT COMMENT '未下单用户数(具体指活跃用户中未下单用户)'
  10. ) COMMENT '用户统计'
  11. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  12. LOCATION '/warehouse/gmall/ads/ads_user_total/';

B、数据装载

  1. insert overwrite table ads_user_total
  2. select * from ads_user_total
  3. union
  4. select
  5. '2020-06-14',
  6. recent_days,
  7. sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
  8. sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
  9. sum(order_final_amount) order_final_amount,
  10. sum(if(order_final_amount>0,1,0)) order_user_count,
  11. sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
  12. from
  13. (
  14. select
  15. recent_days,
  16. user_id,
  17. login_date_first,
  18. login_date_last,
  19. order_date_first,
  20. case when recent_days=0 then order_final_amount
  21. when recent_days=1 then order_last_1d_final_amount
  22. when recent_days=7 then order_last_7d_final_amount
  23. when recent_days=30 then order_last_30d_final_amount
  24. end order_final_amount,
  25. if(recent_days=0,'1970-01-01',date_add('2020-06-14',-recent_days+1)) recent_days_ago
  26. from dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
  27. where dt='2020-06-14'
  28. )t1
  29. group by recent_days;

2. 用户变动统计

该需求包括两个指标,分别为流失用户数和回流用户数,以下为对两个指标的解释说明。





















指标 说明 对应字段
流失用户数 之前活跃过的用户,最近一段时间未活跃,就称为流失用户。此处要求统计7日前 (只包含7日前当天) 活跃,但最近7日未活跃的用户总数。 user_churn_count
回流用户数 之前的活跃用户,一段时间未活跃 (流失),今日又活跃了,就称为回流用户。此处要求统计回流用户总数。 new_order_user_count

A、建表语句

  1. DROP TABLE IF EXISTS ads_user_change;
  2. CREATE EXTERNAL TABLE `ads_user_change` (
  3. `dt` STRING COMMENT '统计日期',
  4. `user_churn_count` BIGINT COMMENT '流失用户数',
  5. `user_back_count` BIGINT COMMENT '回流用户数'
  6. ) COMMENT '用户变动统计'
  7. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  8. LOCATION '/warehouse/gmall/ads/ads_user_change/';

B、数据装载

思路分析:

  • 流失用户:末次活跃时间为7日前的用户即为流失用户。
  • 回流用户:末次活跃时间为今日,上次活跃时间在8日前的用户即为回流用户。

    insert overwrite table ads_user_change
    select * from ads_user_change
    union
    select

    1. churn.dt,
    2. user_churn_count,
    3. user_back_count

    from
    (

    1. select
    2. '2020-06-14' dt,
    3. count(*) user_churn_count
    4. from dwt_user_topic
    5. where dt='2020-06-14'
    6. and login_date_last=date_add('2020-06-14',-7)

    )churn
    join
    (

    1. select
    2. '2020-06-14' dt,
    3. count(*) user_back_count
    4. from
    5. (
    6. select
    7. user_id,
    8. login_date_last
    9. from dwt_user_topic
    10. where dt='2020-06-14'
    11. and login_date_last='2020-06-14'
    12. )t1
    13. join
    14. (
    15. select
    16. user_id,
    17. login_date_last login_date_previous
    18. from dwt_user_topic
    19. where dt=date_add('2020-06-14',-1)
    20. )t2
    21. on t1.user_id=t2.user_id
    22. where datediff(login_date_last,login_date_previous)>=8

    )back
    on churn.dt=back.dt;

3. 用户行为漏斗分析

漏斗分析是一个数据分析模型,它能够科学反映一个业务过程从起点到终点各阶段用户转化情况。由于其能将各阶段环节都展示出来,故哪个阶段存在问题,就能一目了然。

在这里插入图片描述

该需求要求统计一个完整的购物流程各个阶段的人数。

A、建表语句

  1. DROP TABLE IF EXISTS ads_user_action;
  2. CREATE EXTERNAL TABLE `ads_user_action` (
  3. `dt` STRING COMMENT '统计日期',
  4. `recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
  5. `home_count` BIGINT COMMENT '浏览首页人数',
  6. `good_detail_count` BIGINT COMMENT '浏览商品详情页人数',
  7. `cart_count` BIGINT COMMENT '加入购物车人数',
  8. `order_count` BIGINT COMMENT '下单人数',
  9. `payment_count` BIGINT COMMENT '支付人数'
  10. ) COMMENT '漏斗分析'
  11. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  12. LOCATION '/warehouse/gmall/ads/ads_user_action/';

B、数据装载

  1. with
  2. tmp_page as
  3. (
  4. select
  5. '2020-06-14' dt,
  6. recent_days,
  7. sum(if(array_contains(pages,'home'),1,0)) home_count,
  8. sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
  9. from
  10. (
  11. select
  12. recent_days,
  13. mid_id,
  14. collect_set(page_id) pages
  15. from
  16. (
  17. select
  18. dt,
  19. mid_id,
  20. page.page_id
  21. from dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
  22. where dt>=date_add('2020-06-14',-29)
  23. and page.page_id in('home','good_detail')
  24. )t1 lateral view explode(Array(1,7,30)) tmp as recent_days
  25. where dt>=date_add('2020-06-14',-recent_days+1)
  26. group by recent_days,mid_id
  27. )t2
  28. group by recent_days
  29. ),
  30. tmp_cop as
  31. (
  32. select
  33. '2020-06-14' dt,
  34. recent_days,
  35. sum(if(cart_count>0,1,0)) cart_count,
  36. sum(if(order_count>0,1,0)) order_count,
  37. sum(if(payment_count>0,1,0)) payment_count
  38. from
  39. (
  40. select
  41. recent_days,
  42. user_id,
  43. case
  44. when recent_days=1 then cart_last_1d_count
  45. when recent_days=7 then cart_last_7d_count
  46. when recent_days=30 then cart_last_30d_count
  47. end cart_count,
  48. case
  49. when recent_days=1 then order_last_1d_count
  50. when recent_days=7 then order_last_7d_count
  51. when recent_days=30 then order_last_30d_count
  52. end order_count,
  53. case
  54. when recent_days=1 then payment_last_1d_count
  55. when recent_days=7 then payment_last_7d_count
  56. when recent_days=30 then payment_last_30d_count
  57. end payment_count
  58. from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
  59. where dt='2020-06-14'
  60. )t1
  61. group by recent_days
  62. )
  63. insert overwrite table ads_user_action
  64. select * from ads_user_action
  65. union
  66. select
  67. tmp_page.dt,
  68. tmp_page.recent_days,
  69. home_count,
  70. good_detail_count,
  71. cart_count,
  72. order_count,
  73. payment_count
  74. from tmp_page
  75. join tmp_cop
  76. on tmp_page.recent_days=tmp_cop.recent_days;

4. 用户留存率

留存分析一般包含新增留存和活跃留存分析。

新增留存分析是分析某天的新增用户中,有多少人有后续的活跃行为。活跃留存分析是分析某天的活跃用户中,有多少人有后续的活跃行为。

留存分析是衡量产品对用户价值高低的重要指标。

此处要求统计新增留存率,新增留存率具体是指留存用户数与新增用户数的比值,例如2020-06-14新增100个用户,1日之后(2020-06-15)这100人中有80个人活跃了,那2020-06-14的1日留存数则为80,2020-06-14的1日留存率则为80%。

要求统计每天的1至7日留存率,如下图所示。

在这里插入图片描述

A、建表语句

  1. DROP TABLE IF EXISTS ads_user_retention;
  2. CREATE EXTERNAL TABLE ads_user_retention (
  3. `dt` STRING COMMENT '统计日期',
  4. `create_date` STRING COMMENT '用户新增日期',
  5. `retention_day` BIGINT COMMENT '截至当前日期留存天数',
  6. `retention_count` BIGINT COMMENT '留存用户数量',
  7. `new_user_count` BIGINT COMMENT '新增用户数量',
  8. `retention_rate` DECIMAL(16,2) COMMENT '留存率'
  9. ) COMMENT '用户留存率'
  10. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  11. LOCATION '/warehouse/gmall/ads/ads_user_retention/';

B、数据装载

  1. insert overwrite table ads_user_retention
  2. select * from ads_user_retention
  3. union
  4. select
  5. '2020-06-14',
  6. login_date_first create_date,
  7. datediff('2020-06-14',login_date_first) retention_day,
  8. sum(if(login_date_last='2020-06-14',1,0)) retention_count,
  9. count(*) new_user_count,
  10. cast(sum(if(login_date_last='2020-06-14',1,0))/count(*)*100 as decimal(16,2)) retention_rate
  11. from dwt_user_topic
  12. where dt='2020-06-14'
  13. and login_date_first>=date_add('2020-06-14',-7)
  14. and login_date_first<'2020-06-14'
  15. group by login_date_first;

四、商品主题

1. 商品统计

该指标为商品综合统计,包含每个spu被下单总次数和被下单总金额。

A、建表语句

  1. DROP TABLE IF EXISTS ads_order_spu_stats;
  2. CREATE EXTERNAL TABLE `ads_order_spu_stats` (
  3. `dt` STRING COMMENT '统计日期',
  4. `recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
  5. `spu_id` STRING COMMENT '商品ID',
  6. `spu_name` STRING COMMENT '商品名称',
  7. `tm_id` STRING COMMENT '品牌ID',
  8. `tm_name` STRING COMMENT '品牌名称',
  9. `category3_id` STRING COMMENT '三级品类ID',
  10. `category3_name` STRING COMMENT '三级品类名称',
  11. `category2_id` STRING COMMENT '二级品类ID',
  12. `category2_name` STRING COMMENT '二级品类名称',
  13. `category1_id` STRING COMMENT '一级品类ID',
  14. `category1_name` STRING COMMENT '一级品类名称',
  15. `order_count` BIGINT COMMENT '订单数',
  16. `order_amount` DECIMAL(16,2) COMMENT '订单金额'
  17. ) COMMENT '商品销售统计'
  18. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  19. LOCATION '/warehouse/gmall/ads/ads_order_spu_stats/';

B、数据装载

  1. insert overwrite table ads_order_spu_stats
  2. select * from ads_order_spu_stats
  3. union
  4. select
  5. '2020-06-14' dt,
  6. recent_days,
  7. spu_id,
  8. spu_name,
  9. tm_id,
  10. tm_name,
  11. category3_id,
  12. category3_name,
  13. category2_id,
  14. category2_name,
  15. category1_id,
  16. category1_name,
  17. sum(order_count),
  18. sum(order_amount)
  19. from
  20. (
  21. select
  22. recent_days,
  23. sku_id,
  24. case
  25. when recent_days=1 then order_last_1d_count
  26. when recent_days=7 then order_last_7d_count
  27. when recent_days=30 then order_last_30d_count
  28. end order_count,
  29. case
  30. when recent_days=1 then order_last_1d_final_amount
  31. when recent_days=7 then order_last_7d_final_amount
  32. when recent_days=30 then order_last_30d_final_amount
  33. end order_amount
  34. from dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
  35. where dt='2020-06-14'
  36. )t1
  37. left join
  38. (
  39. select
  40. id,
  41. spu_id,
  42. spu_name,
  43. tm_id,
  44. tm_name,
  45. category3_id,
  46. category3_name,
  47. category2_id,
  48. category2_name,
  49. category1_id,
  50. category1_name
  51. from dim_sku_info
  52. where dt='2020-06-14'
  53. )t2
  54. on t1.sku_id=t2.id
  55. group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;

2. 品牌复购率

品牌复购率是指一段时间内重复购买某品牌的人数与购买过该品牌的人数的比值。重复购买即购买次数大于等于2,购买过即购买次数大于1。

此处要求统计最近1,7,30天的各品牌复购率。

A、建表语句

  1. DROP TABLE IF EXISTS ads_repeat_purchase;
  2. CREATE EXTERNAL TABLE `ads_repeat_purchase` (
  3. `dt` STRING COMMENT '统计日期',
  4. `recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
  5. `tm_id` STRING COMMENT '品牌ID',
  6. `tm_name` STRING COMMENT '品牌名称',
  7. `order_repeat_rate` DECIMAL(16,2) COMMENT '复购率'
  8. ) COMMENT '品牌复购率'
  9. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  10. LOCATION '/warehouse/gmall/ads/ads_repeat_purchase/';

B、数据装载

思路分析:该需求可分两步实现:

  • 第一步:统计每个用户购买每个品牌的次数。
  • 第二步:分别统计购买次数大于1的人数和大于2的人数。

    insert overwrite table ads_repeat_purchase
    select * from ads_repeat_purchase
    union
    select

    1. '2020-06-14' dt,
    2. recent_days,
    3. tm_id,
    4. tm_name,
    5. cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))

    from
    (

    1. select
    2. recent_days,
    3. user_id,
    4. tm_id,
    5. tm_name,
    6. sum(order_count) order_count
    7. from
    8. (
    9. select
    10. recent_days,
    11. user_id,
    12. sku_id,
    13. count(*) order_count
    14. from dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
    15. where dt>=date_add('2020-06-14',-29)
    16. and dt>=date_add('2020-06-14',-recent_days+1)
    17. group by recent_days, user_id,sku_id
    18. )t1
    19. left join
    20. (
    21. select
    22. id,
    23. tm_id,
    24. tm_name
    25. from dim_sku_info
    26. where dt='2020-06-14'
    27. )t2
    28. on t1.sku_id=t2.id
    29. group by recent_days,user_id,tm_id,tm_name

    )t3
    group by recent_days,tm_id,tm_name;

五、订单主题

1. 订单统计

该需求包含订单总数,订单总金额和下单总人数。

A、建表语句

  1. DROP TABLE IF EXISTS ads_order_total;
  2. CREATE EXTERNAL TABLE `ads_order_total` (
  3. `dt` STRING COMMENT '统计日期',
  4. `recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
  5. `order_count` BIGINT COMMENT '订单数',
  6. `order_amount` DECIMAL(16,2) COMMENT '订单金额',
  7. `order_user_count` BIGINT COMMENT '下单人数'
  8. ) COMMENT '订单统计'
  9. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  10. LOCATION '/warehouse/gmall/ads/ads_order_total/';

B、数据装载

  1. insert overwrite table ads_order_total
  2. select * from ads_order_total
  3. union
  4. select
  5. '2020-06-14',
  6. recent_days,
  7. sum(order_count),
  8. sum(order_final_amount) order_final_amount,
  9. sum(if(order_final_amount>0,1,0)) order_user_count
  10. from
  11. (
  12. select
  13. recent_days,
  14. user_id,
  15. case when recent_days=0 then order_count
  16. when recent_days=1 then order_last_1d_count
  17. when recent_days=7 then order_last_7d_count
  18. when recent_days=30 then order_last_30d_count
  19. end order_count,
  20. case when recent_days=0 then order_final_amount
  21. when recent_days=1 then order_last_1d_final_amount
  22. when recent_days=7 then order_last_7d_final_amount
  23. when recent_days=30 then order_last_30d_final_amount
  24. end order_final_amount
  25. from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
  26. where dt='2020-06-14'
  27. )t1
  28. group by recent_days;

2. 各地区订单统计

该需求包含各省份订单总数和订单总金额。

A、建表语句

  1. DROP TABLE IF EXISTS ads_order_by_province;
  2. CREATE EXTERNAL TABLE `ads_order_by_province` (
  3. `dt` STRING COMMENT '统计日期',
  4. `recent_days` BIGINT COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
  5. `province_id` STRING COMMENT '省份ID',
  6. `province_name` STRING COMMENT '省份名称',
  7. `area_code` STRING COMMENT '地区编码',
  8. `iso_code` STRING COMMENT '国际标准地区编码',
  9. `iso_code_3166_2` STRING COMMENT '国际标准地区编码',
  10. `order_count` BIGINT COMMENT '订单数',
  11. `order_amount` DECIMAL(16,2) COMMENT '订单金额'
  12. ) COMMENT '各地区订单统计'
  13. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  14. LOCATION '/warehouse/gmall/ads/ads_order_by_province/';

B、数据装载

  1. insert overwrite table ads_order_by_province
  2. select * from ads_order_by_province
  3. union
  4. select
  5. dt,
  6. recent_days,
  7. province_id,
  8. province_name,
  9. area_code,
  10. iso_code,
  11. iso_3166_2,
  12. order_count,
  13. order_amount
  14. from
  15. (
  16. select
  17. '2020-06-14' dt,
  18. recent_days,
  19. province_id,
  20. sum(order_count) order_count,
  21. sum(order_amount) order_amount
  22. from
  23. (
  24. select
  25. recent_days,
  26. province_id,
  27. case
  28. when recent_days=1 then order_last_1d_count
  29. when recent_days=7 then order_last_7d_count
  30. when recent_days=30 then order_last_30d_count
  31. end order_count,
  32. case
  33. when recent_days=1 then order_last_1d_final_amount
  34. when recent_days=7 then order_last_7d_final_amount
  35. when recent_days=30 then order_last_30d_final_amount
  36. end order_amount
  37. from dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
  38. where dt='2020-06-14'
  39. )t1
  40. group by recent_days,province_id
  41. )t2
  42. join dim_base_province t3
  43. on t2.province_id=t3.id;

六、优惠券主题

1. 优惠券统计

该需求要求统计最近30日发布的所有优惠券的领用情况和补贴率,补贴率是指,优惠金额与使用优惠券的订单的原价金额的比值。

A、建表语句

  1. DROP TABLE IF EXISTS ads_coupon_stats;
  2. CREATE EXTERNAL TABLE ads_coupon_stats (
  3. `dt` STRING COMMENT '统计日期',
  4. `coupon_id` STRING COMMENT '优惠券ID',
  5. `coupon_name` STRING COMMENT '优惠券名称',
  6. `start_date` STRING COMMENT '发布日期',
  7. `rule_name` STRING COMMENT '优惠规则,例如满100元减10元',
  8. `get_count` BIGINT COMMENT '领取次数',
  9. `order_count` BIGINT COMMENT '使用(下单)次数',
  10. `expire_count` BIGINT COMMENT '过期次数',
  11. `order_original_amount` DECIMAL(16,2) COMMENT '使用优惠券订单原始金额',
  12. `order_final_amount` DECIMAL(16,2) COMMENT '使用优惠券订单最终金额',
  13. `reduce_amount` DECIMAL(16,2) COMMENT '优惠金额',
  14. `reduce_rate` DECIMAL(16,2) COMMENT '补贴率'
  15. ) COMMENT '商品销售统计'
  16. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  17. LOCATION '/warehouse/gmall/ads/ads_coupon_stats/';

B、数据装载

  1. insert overwrite table ads_coupon_stats
  2. select * from ads_coupon_stats
  3. union
  4. select
  5. '2020-06-14' dt,
  6. t1.id,
  7. coupon_name,
  8. start_date,
  9. rule_name,
  10. get_count,
  11. order_count,
  12. expire_count,
  13. order_original_amount,
  14. order_final_amount,
  15. reduce_amount,
  16. reduce_rate
  17. from
  18. (
  19. select
  20. id,
  21. coupon_name,
  22. date_format(start_time,'yyyy-MM-dd') start_date,
  23. case
  24. when coupon_type='3201' then concat('满',condition_amount,'元减',benefit_amount,'元')
  25. when coupon_type='3202' then concat('满',condition_num,'件打', (1-benefit_discount)*10,'折')
  26. when coupon_type='3203' then concat('减',benefit_amount,'元')
  27. end rule_name
  28. from dim_coupon_info
  29. where dt='2020-06-14'
  30. and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
  31. )t1
  32. left join
  33. (
  34. select
  35. coupon_id,
  36. get_count,
  37. order_count,
  38. expire_count,
  39. order_original_amount,
  40. order_final_amount,
  41. order_reduce_amount reduce_amount,
  42. cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
  43. from dwt_coupon_topic
  44. where dt='2020-06-14'
  45. )t2
  46. on t1.id=t2.coupon_id;

七、活动主题

1. 活动统计

该需求要求统计最近30日发布的所有活动的参与情况和补贴率,补贴率是指,优惠金额与参与活动的订单原价金额的比值。

A、建表语句

  1. DROP TABLE IF EXISTS ads_activity_stats;
  2. CREATE EXTERNAL TABLE `ads_activity_stats` (
  3. `dt` STRING COMMENT '统计日期',
  4. `activity_id` STRING COMMENT '活动ID',
  5. `activity_name` STRING COMMENT '活动名称',
  6. `start_date` STRING COMMENT '活动开始日期',
  7. `order_count` BIGINT COMMENT '参与活动订单数',
  8. `order_original_amount` DECIMAL(16,2) COMMENT '参与活动订单原始金额',
  9. `order_final_amount` DECIMAL(16,2) COMMENT '参与活动订单最终金额',
  10. `reduce_amount` DECIMAL(16,2) COMMENT '优惠金额',
  11. `reduce_rate` DECIMAL(16,2) COMMENT '补贴率'
  12. ) COMMENT '商品销售统计'
  13. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  14. LOCATION '/warehouse/gmall/ads/ads_activity_stats/';

B、数据装载

  1. insert overwrite table ads_activity_stats
  2. select * from ads_activity_stats
  3. union
  4. select
  5. '2020-06-14' dt,
  6. t4.activity_id,
  7. activity_name,
  8. start_date,
  9. order_count,
  10. order_original_amount,
  11. order_final_amount,
  12. reduce_amount,
  13. reduce_rate
  14. from
  15. (
  16. select
  17. activity_id,
  18. activity_name,
  19. date_format(start_time,'yyyy-MM-dd') start_date
  20. from dim_activity_rule_info
  21. where dt='2020-06-14'
  22. and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
  23. group by activity_id,activity_name,start_time
  24. )t4
  25. left join
  26. (
  27. select
  28. activity_id,
  29. sum(order_count) order_count,
  30. sum(order_original_amount) order_original_amount,
  31. sum(order_final_amount) order_final_amount,
  32. sum(order_reduce_amount) reduce_amount,
  33. cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
  34. from dwt_activity_topic
  35. where dt='2020-06-14'
  36. group by activity_id
  37. )t5
  38. on t4.activity_id=t5.activity_id;

八、ADS 层业务数据导入脚本

A、编写脚本

在 /home/fancy/bin 目录下创建脚本 dwt_to_ads.sh

  1. [fancy@node101 bin]$ vim dwt_to_ads.sh

在脚本中填写如下内容

  1. #!/bin/bash
  2. APP=gmall
  3. # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
  4. if [ -n "$2" ] ;then
  5. do_date=$2
  6. else
  7. do_date=`date -d "-1 day" +%F`
  8. fi
  9. ads_activity_stats="
  10. insert overwrite table ${APP}.ads_activity_stats
  11. select * from ${APP}.ads_activity_stats
  12. union
  13. select
  14. '$do_date' dt,
  15. t4.activity_id,
  16. activity_name,
  17. start_date,
  18. order_count,
  19. order_original_amount,
  20. order_final_amount,
  21. reduce_amount,
  22. reduce_rate
  23. from
  24. (
  25. select
  26. activity_id,
  27. activity_name,
  28. date_format(start_time,'yyyy-MM-dd') start_date
  29. from ${APP}.dim_activity_rule_info
  30. where dt='$do_date'
  31. and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
  32. group by activity_id,activity_name,start_time
  33. )t4
  34. left join
  35. (
  36. select
  37. activity_id,
  38. sum(order_count) order_count,
  39. sum(order_original_amount) order_original_amount,
  40. sum(order_final_amount) order_final_amount,
  41. sum(order_reduce_amount) reduce_amount,
  42. cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
  43. from ${APP}.dwt_activity_topic
  44. where dt='$do_date'
  45. group by activity_id
  46. )t5
  47. on t4.activity_id=t5.activity_id;
  48. "
  49. ads_coupon_stats="
  50. insert overwrite table ${APP}.ads_coupon_stats
  51. select * from ${APP}.ads_coupon_stats
  52. union
  53. select
  54. '$do_date' dt,
  55. t1.id,
  56. coupon_name,
  57. start_date,
  58. rule_name,
  59. get_count,
  60. order_count,
  61. expire_count,
  62. order_original_amount,
  63. order_final_amount,
  64. reduce_amount,
  65. reduce_rate
  66. from
  67. (
  68. select
  69. id,
  70. coupon_name,
  71. date_format(start_time,'yyyy-MM-dd') start_date,
  72. case
  73. when coupon_type='3201' then concat('满',condition_amount,'元减',benefit_amount,'元')
  74. when coupon_type='3202' then concat('满',condition_num,'件打', (1-benefit_discount)*10,'折')
  75. when coupon_type='3203' then concat('减',benefit_amount,'元')
  76. end rule_name
  77. from ${APP}.dim_coupon_info
  78. where dt='$do_date'
  79. and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
  80. )t1
  81. left join
  82. (
  83. select
  84. coupon_id,
  85. get_count,
  86. order_count,
  87. expire_count,
  88. order_original_amount,
  89. order_final_amount,
  90. order_reduce_amount reduce_amount,
  91. cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
  92. from ${APP}.dwt_coupon_topic
  93. where dt='$do_date'
  94. )t2
  95. on t1.id=t2.coupon_id;
  96. "
  97. ads_order_by_province="
  98. insert overwrite table ${APP}.ads_order_by_province
  99. select * from ${APP}.ads_order_by_province
  100. union
  101. select
  102. dt,
  103. recent_days,
  104. province_id,
  105. province_name,
  106. area_code,
  107. iso_code,
  108. iso_3166_2,
  109. order_count,
  110. order_amount
  111. from
  112. (
  113. select
  114. '$do_date' dt,
  115. recent_days,
  116. province_id,
  117. sum(order_count) order_count,
  118. sum(order_amount) order_amount
  119. from
  120. (
  121. select
  122. recent_days,
  123. province_id,
  124. case
  125. when recent_days=1 then order_last_1d_count
  126. when recent_days=7 then order_last_7d_count
  127. when recent_days=30 then order_last_30d_count
  128. end order_count,
  129. case
  130. when recent_days=1 then order_last_1d_final_amount
  131. when recent_days=7 then order_last_7d_final_amount
  132. when recent_days=30 then order_last_30d_final_amount
  133. end order_amount
  134. from ${APP}.dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
  135. where dt='$do_date'
  136. )t1
  137. group by recent_days,province_id
  138. )t2
  139. join ${APP}.dim_base_province t3
  140. on t2.province_id=t3.id;
  141. "
  142. ads_order_spu_stats="
  143. insert overwrite table ${APP}.ads_order_spu_stats
  144. select * from ${APP}.ads_order_spu_stats
  145. union
  146. select
  147. '$do_date' dt,
  148. recent_days,
  149. spu_id,
  150. spu_name,
  151. tm_id,
  152. tm_name,
  153. category3_id,
  154. category3_name,
  155. category2_id,
  156. category2_name,
  157. category1_id,
  158. category1_name,
  159. sum(order_count),
  160. sum(order_amount)
  161. from
  162. (
  163. select
  164. recent_days,
  165. sku_id,
  166. case
  167. when recent_days=1 then order_last_1d_count
  168. when recent_days=7 then order_last_7d_count
  169. when recent_days=30 then order_last_30d_count
  170. end order_count,
  171. case
  172. when recent_days=1 then order_last_1d_final_amount
  173. when recent_days=7 then order_last_7d_final_amount
  174. when recent_days=30 then order_last_30d_final_amount
  175. end order_amount
  176. from ${APP}.dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
  177. where dt='$do_date'
  178. )t1
  179. left join
  180. (
  181. select
  182. id,
  183. spu_id,
  184. spu_name,
  185. tm_id,
  186. tm_name,
  187. category3_id,
  188. category3_name,
  189. category2_id,
  190. category2_name,
  191. category1_id,
  192. category1_name
  193. from ${APP}.dim_sku_info
  194. where dt='$do_date'
  195. )t2
  196. on t1.sku_id=t2.id
  197. group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;
  198. "
  199. ads_order_total="
  200. insert overwrite table ${APP}.ads_order_total
  201. select * from ${APP}.ads_order_total
  202. union
  203. select
  204. '$do_date',
  205. recent_days,
  206. sum(order_count),
  207. sum(order_final_amount) order_final_amount,
  208. sum(if(order_final_amount>0,1,0)) order_user_count
  209. from
  210. (
  211. select
  212. recent_days,
  213. user_id,
  214. case when recent_days=0 then order_count
  215. when recent_days=1 then order_last_1d_count
  216. when recent_days=7 then order_last_7d_count
  217. when recent_days=30 then order_last_30d_count
  218. end order_count,
  219. case when recent_days=0 then order_final_amount
  220. when recent_days=1 then order_last_1d_final_amount
  221. when recent_days=7 then order_last_7d_final_amount
  222. when recent_days=30 then order_last_30d_final_amount
  223. end order_final_amount
  224. from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
  225. where dt='$do_date'
  226. )t1
  227. group by recent_days;
  228. "
  229. ads_page_path="
  230. insert overwrite table ${APP}.ads_page_path
  231. select * from ${APP}.ads_page_path
  232. union
  233. select
  234. '$do_date',
  235. recent_days,
  236. source,
  237. target,
  238. count(*)
  239. from
  240. (
  241. select
  242. recent_days,
  243. concat('step-',step,':',source) source,
  244. concat('step-',step+1,':',target) target
  245. from
  246. (
  247. select
  248. recent_days,
  249. page_id source,
  250. lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
  251. row_number() over (partition by recent_days,session_id order by ts) step
  252. from
  253. (
  254. select
  255. recent_days,
  256. last_page_id,
  257. page_id,
  258. ts,
  259. concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
  260. from ${APP}.dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
  261. where dt>=date_add('$do_date',-30)
  262. and dt>=date_add('$do_date',-recent_days+1)
  263. )t2
  264. )t3
  265. )t4
  266. group by recent_days,source,target;
  267. "
  268. ads_repeat_purchase="
  269. insert overwrite table ${APP}.ads_repeat_purchase
  270. select * from ${APP}.ads_repeat_purchase
  271. union
  272. select
  273. '$do_date' dt,
  274. recent_days,
  275. tm_id,
  276. tm_name,
  277. cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
  278. from
  279. (
  280. select
  281. recent_days,
  282. user_id,
  283. tm_id,
  284. tm_name,
  285. sum(order_count) order_count
  286. from
  287. (
  288. select
  289. recent_days,
  290. user_id,
  291. sku_id,
  292. count(*) order_count
  293. from ${APP}.dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
  294. where dt>=date_add('$do_date',-29)
  295. and dt>=date_add('$do_date',-recent_days+1)
  296. group by recent_days, user_id,sku_id
  297. )t1
  298. left join
  299. (
  300. select
  301. id,
  302. tm_id,
  303. tm_name
  304. from ${APP}.dim_sku_info
  305. where dt='$do_date'
  306. )t2
  307. on t1.sku_id=t2.id
  308. group by recent_days,user_id,tm_id,tm_name
  309. )t3
  310. group by recent_days,tm_id,tm_name;
  311. "
  312. ads_user_action="
  313. with
  314. tmp_page as
  315. (
  316. select
  317. '$do_date' dt,
  318. recent_days,
  319. sum(if(array_contains(pages,'home'),1,0)) home_count,
  320. sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
  321. from
  322. (
  323. select
  324. recent_days,
  325. mid_id,
  326. collect_set(page_id) pages
  327. from
  328. (
  329. select
  330. dt,
  331. mid_id,
  332. page.page_id
  333. from ${APP}.dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
  334. where dt>=date_add('$do_date',-29)
  335. and page.page_id in('home','good_detail')
  336. )t1 lateral view explode(Array(1,7,30)) tmp as recent_days
  337. where dt>=date_add('$do_date',-recent_days+1)
  338. group by recent_days,mid_id
  339. )t2
  340. group by recent_days
  341. ),
  342. tmp_cop as
  343. (
  344. select
  345. '$do_date' dt,
  346. recent_days,
  347. sum(if(cart_count>0,1,0)) cart_count,
  348. sum(if(order_count>0,1,0)) order_count,
  349. sum(if(payment_count>0,1,0)) payment_count
  350. from
  351. (
  352. select
  353. recent_days,
  354. user_id,
  355. case
  356. when recent_days=1 then cart_last_1d_count
  357. when recent_days=7 then cart_last_7d_count
  358. when recent_days=30 then cart_last_30d_count
  359. end cart_count,
  360. case
  361. when recent_days=1 then order_last_1d_count
  362. when recent_days=7 then order_last_7d_count
  363. when recent_days=30 then order_last_30d_count
  364. end order_count,
  365. case
  366. when recent_days=1 then payment_last_1d_count
  367. when recent_days=7 then payment_last_7d_count
  368. when recent_days=30 then payment_last_30d_count
  369. end payment_count
  370. from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
  371. where dt='$do_date'
  372. )t1
  373. group by recent_days
  374. )
  375. insert overwrite table ${APP}.ads_user_action
  376. select * from ${APP}.ads_user_action
  377. union
  378. select
  379. tmp_page.dt,
  380. tmp_page.recent_days,
  381. home_count,
  382. good_detail_count,
  383. cart_count,
  384. order_count,
  385. payment_count
  386. from tmp_page
  387. join tmp_cop
  388. on tmp_page.recent_days=tmp_cop.recent_days;
  389. "
  390. ads_user_change="
  391. insert overwrite table ${APP}.ads_user_change
  392. select * from ${APP}.ads_user_change
  393. union
  394. select
  395. churn.dt,
  396. user_churn_count,
  397. user_back_count
  398. from
  399. (
  400. select
  401. '$do_date' dt,
  402. count(*) user_churn_count
  403. from ${APP}.dwt_user_topic
  404. where dt='$do_date'
  405. and login_date_last=date_add('$do_date',-7)
  406. )churn
  407. join
  408. (
  409. select
  410. '$do_date' dt,
  411. count(*) user_back_count
  412. from
  413. (
  414. select
  415. user_id,
  416. login_date_last
  417. from ${APP}.dwt_user_topic
  418. where dt='$do_date'
  419. and login_date_last='$do_date'
  420. )t1
  421. join
  422. (
  423. select
  424. user_id,
  425. login_date_last login_date_previous
  426. from ${APP}.dwt_user_topic
  427. where dt=date_add('$do_date',-1)
  428. )t2
  429. on t1.user_id=t2.user_id
  430. where datediff(login_date_last,login_date_previous)>=8
  431. )back
  432. on churn.dt=back.dt;
  433. "
  434. ads_user_retention="
  435. insert overwrite table ${APP}.ads_user_retention
  436. select * from ${APP}.ads_user_retention
  437. union
  438. select
  439. '$do_date',
  440. login_date_first create_date,
  441. datediff('$do_date',login_date_first) retention_day,
  442. sum(if(login_date_last='$do_date',1,0)) retention_count,
  443. count(*) new_user_count,
  444. cast(sum(if(login_date_last='$do_date',1,0))/count(*)*100 as decimal(16,2)) retention_rate
  445. from ${APP}.dwt_user_topic
  446. where dt='$do_date'
  447. and login_date_first>=date_add('$do_date',-7)
  448. and login_date_first<'$do_date'
  449. group by login_date_first;
  450. "
  451. ads_user_total="
  452. insert overwrite table ${APP}.ads_user_total
  453. select * from ${APP}.ads_user_total
  454. union
  455. select
  456. '$do_date',
  457. recent_days,
  458. sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
  459. sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
  460. sum(order_final_amount) order_final_amount,
  461. sum(if(order_final_amount>0,1,0)) order_user_count,
  462. sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
  463. from
  464. (
  465. select
  466. recent_days,
  467. user_id,
  468. login_date_first,
  469. login_date_last,
  470. order_date_first,
  471. case when recent_days=0 then order_final_amount
  472. when recent_days=1 then order_last_1d_final_amount
  473. when recent_days=7 then order_last_7d_final_amount
  474. when recent_days=30 then order_last_30d_final_amount
  475. end order_final_amount,
  476. if(recent_days=0,'1970-01-01',date_add('$do_date',-recent_days+1)) recent_days_ago
  477. from ${APP}.dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
  478. where dt='$do_date'
  479. )t1
  480. group by recent_days;
  481. "
  482. ads_visit_stats="
  483. insert overwrite table ${APP}.ads_visit_stats
  484. select * from ${APP}.ads_visit_stats
  485. union
  486. select
  487. '$do_date' dt,
  488. is_new,
  489. recent_days,
  490. channel,
  491. count(distinct(mid_id)) uv_count,
  492. cast(sum(duration)/1000 as bigint) duration_sec,
  493. cast(avg(duration)/1000 as bigint) avg_duration_sec,
  494. sum(page_count) page_count,
  495. cast(avg(page_count) as bigint) avg_page_count,
  496. count(*) sv_count,
  497. sum(if(page_count=1,1,0)) bounce_count,
  498. cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
  499. from
  500. (
  501. select
  502. session_id,
  503. mid_id,
  504. is_new,
  505. recent_days,
  506. channel,
  507. count(*) page_count,
  508. sum(during_time) duration
  509. from
  510. (
  511. select
  512. mid_id,
  513. channel,
  514. recent_days,
  515. is_new,
  516. last_page_id,
  517. page_id,
  518. during_time,
  519. concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
  520. from
  521. (
  522. select
  523. mid_id,
  524. channel,
  525. last_page_id,
  526. page_id,
  527. during_time,
  528. ts,
  529. recent_days,
  530. if(visit_date_first>=date_add('$do_date',-recent_days+1),'1','0') is_new
  531. from
  532. (
  533. select
  534. t1.mid_id,
  535. t1.channel,
  536. t1.last_page_id,
  537. t1.page_id,
  538. t1.during_time,
  539. t1.dt,
  540. t1.ts,
  541. t2.visit_date_first
  542. from
  543. (
  544. select
  545. mid_id,
  546. channel,
  547. last_page_id,
  548. page_id,
  549. during_time,
  550. dt,
  551. ts
  552. from ${APP}.dwd_page_log
  553. where dt>=date_add('$do_date',-30)
  554. )t1
  555. left join
  556. (
  557. select
  558. mid_id,
  559. visit_date_first
  560. from ${APP}.dwt_visitor_topic
  561. where dt='$do_date'
  562. )t2
  563. on t1.mid_id=t2.mid_id
  564. )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
  565. where dt>=date_add('$do_date',-recent_days+1)
  566. )t4
  567. )t5
  568. group by session_id,mid_id,is_new,recent_days,channel
  569. )t6
  570. group by is_new,recent_days,channel;
  571. "
  572. case $1 in
  573. "ads_activity_stats" )
  574. hive -e "$ads_activity_stats"
  575. ;;
  576. "ads_coupon_stats" )
  577. hive -e "$ads_coupon_stats"
  578. ;;
  579. "ads_order_by_province" )
  580. hive -e "$ads_order_by_province"
  581. ;;
  582. "ads_order_spu_stats" )
  583. hive -e "$ads_order_spu_stats"
  584. ;;
  585. "ads_order_total" )
  586. hive -e "$ads_order_total"
  587. ;;
  588. "ads_page_path" )
  589. hive -e "$ads_page_path"
  590. ;;
  591. "ads_repeat_purchase" )
  592. hive -e "$ads_repeat_purchase"
  593. ;;
  594. "ads_user_action" )
  595. hive -e "$ads_user_action"
  596. ;;
  597. "ads_user_change" )
  598. hive -e "$ads_user_change"
  599. ;;
  600. "ads_user_retention" )
  601. hive -e "$ads_user_retention"
  602. ;;
  603. "ads_user_total" )
  604. hive -e "$ads_user_total"
  605. ;;
  606. "ads_visit_stats" )
  607. hive -e "$ads_visit_stats"
  608. ;;
  609. "all" )
  610. hive -e "$ads_activity_stats$ads_coupon_stats$ads_order_by_province$ads_order_spu_stats$ads_order_total$ads_page_path$ads_repeat_purchase$ads_user_action$ads_user_change$ads_user_retention$ads_user_total$ads_visit_stats"
  611. ;;
  612. esac

增加脚本执行权限

  1. [fancy@node101 bin]$ chmod 777 dwt_to_ads.sh

B、脚本使用

执行脚本

  1. [fancy@node101 bin]$ dwt_to_ads.sh all 2020-06-14

查看数据是否导入

发表评论

表情:
评论列表 (有 0 条评论,141人围观)

还没有评论,来说两句吧...

相关阅读