hive 求最大最小值均值及对应的一个key键(对应行)

偏执的太偏执、 2023-02-16 04:12 84阅读 0赞

hive 求最大最小均值就不多说了。
此次在业务上碰到一个问题,是要求最大值、最小值、平均值的同时,还要求最大值、最小值对应行的key字段(以找到对应时间)。并且key中的时间戳还是0时区,在查询时要注意对时区进行转换。
表结构大致如下:
key double
id+时间 对应的double数值
方法一:join (最常见)
比如取最小值和最小值所在行对应的key:

  1. select *
  2. from
  3. (
  4. select min(d) as min
  5. from
  6. (
  7. select key,d
  8. from default.test_202006
  9. where key>=50104160000000 and key<50104240000000
  10. union all
  11. select key,d
  12. from default.test_202006
  13. where key>=50105000000000 and key<50105160000000
  14. )
  15. )t1
  16. left join
  17. (
  18. select key,d
  19. from
  20. (
  21. select key,d
  22. from default.test_202006
  23. where key>=50104160000000 and key<50104240000000
  24. union all
  25. select key,d
  26. from default.test_202006
  27. where key>=50105000000000 and key<50105160000000
  28. )
  29. )t2
  30. on t1.min=t2.d

这样要达到目的,min +1 join, max +1 join, + avg

方法二:
select * from table where double in (select min(double) from table)

  1. select *
  2. from
  3. (
  4. select *
  5. from default.test_202006
  6. where key>=50104160000000 and key<50104240000000
  7. union all
  8. select *
  9. from default.test_202006
  10. where key>=50105000000000 and key<50105160000000
  11. ) where d in ( select min(d) as min
  12. from
  13. (
  14. select *
  15. from default.test_202006
  16. where key>=50104160000000 and key<50104240000000
  17. union all
  18. select *
  19. from default.test_202006
  20. where key>=50105000000000 and key<50105160000000
  21. ))

方法三: row_number()

下面的语句取最大最小值:

  1. select tt1.d as min, tt1.key as minkey, tt2.d as max, tt2.key as maxkey from
  2. (
  3. select *
  4. from
  5. (
  6. select row_number() over(partition by s order by d) as ord, key, d
  7. from
  8. (
  9. select *
  10. from default.test_202006
  11. where key>=50104160000000 and key<50104240000000
  12. union all
  13. select *
  14. from default.test_202006
  15. where key>=50105000000000 and key<50105160000000
  16. )
  17. )t1 where t1.ord=1
  18. )tt1,
  19. (
  20. select *
  21. from
  22. (
  23. select row_number() over(partition by s order by d desc) as ord, key, d
  24. from
  25. (
  26. select *
  27. from default.test_202006
  28. where key>=50104160000000 and key<50104240000000
  29. union all
  30. select *
  31. from default.test_202006
  32. where key>=50105000000000 and key<50105160000000
  33. )
  34. )t2 where t2.ord=1
  35. )tt2

方法四:在某同学建议下搞的骚操作

row_number两次,一次正序一次倒序。取出来再用collect处理。这个效率在测试中是最快的。最后业务上也是采用了这个方式,比其他同学join的处理快出20+s。

  1. select t3.d[0] as min, t3.key[0] as minkey, t3.d[1] as max, t3.key[1] as maxkey, t4.avg
  2. from
  3. (
  4. select collect_set(key) as key, collect_set(d) as d
  5. from
  6. (
  7. select '999' as num,
  8. row_number() over(partition by s order by d ) as ord1,
  9. row_number() over(partition by i order by d desc) as ord2,
  10. key, d
  11. from
  12. (
  13. select *
  14. from default.test_202006
  15. where key>=50104160000000 and key<50104240000000
  16. union all
  17. select *
  18. from default.test_202006
  19. where key>=50105000000000 and key<50105160000000
  20. )
  21. )t2 where t2.ord1=1 or t2.ord2=1 group by num
  22. )t3,
  23. (
  24. select avg(d) as avg
  25. from
  26. (
  27. select *
  28. from default.test_202006
  29. where key>=50104160000000 and key<50104240000000
  30. union all
  31. select *
  32. from default.test_202006
  33. where key>=50105000000000 and key<50105160000000
  34. )
  35. )t4

以上方法分组也都适用。另外还有开窗函数配合FIRST_VALUE(),LAST_VALUE() 也可以尝试。

发表评论

表情:
评论列表 (有 0 条评论,84人围观)

还没有评论,来说两句吧...

相关阅读

    相关 分治法

    分治法是一种递归的问题解决方法,它将一个大问题划分为多个小问题,然后逐个解决这些小问题,最后将结果合并得到最终的解决方案。对于求最大最小值的问题,可以使用分治法来解决。 以下