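Introduction to Hive Partitions
1. Overview
- Hive partitions make data easier to manage. The most common schemes are time-based partitions (e.g. by day) and business-based partitions (e.g. by province).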
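Each partition is stored as its own directory. One practical consequence is that a query filtering on partition columns (for example WHERE day_id = '20201220') only needs to read the matching partitions' directories. Hive calls this partition pruning. The sketch below is a simplified model under stated assumptions: the partition list and the prune helper are hypothetical, and only equality predicates are modeled.

```python
# Hypothetical partitions of a table partitioned by (provid, day_id):
# one business dimension (province) and one time dimension (day).
partitions = [
    {"provid": "140000", "day_id": "20201220"},
    {"provid": "140000", "day_id": "20201221"},
    {"provid": "530000", "day_id": "20201220"},
]


def prune(partitions, **predicate):
    """Keep the partitions whose values satisfy every equality predicate.

    Simplified model of partition pruning: only the directories of the
    returned partitions would need to be scanned.
    """
    return [
        p for p in partitions
        if all(p.get(col) == val for col, val in predicate.items())
    ]


# Roughly: ... WHERE day_id = '20201220'
print(prune(partitions, day_id="20201220"))
# Roughly: ... WHERE provid = '140000' AND day_id = '20201221'
print(prune(partitions, provid="140000", day_id="20201221"))
```

With no predicate, every partition is returned, which corresponds to a full-table scan.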
2. How Hive Partitions Work
The following example shows how Hive partitions work:
(I) Working with multiple partition columns
Create the partitioned table:
CREATE TABLE default.DWA_LBS_FUSE_SCIC_XXX (
  time string comment 'timestamp',
  mdn string comment 'mobile number',
  lon string comment 'longitude',
  lat string comment 'latitude',
  cityCode string comment 'city',
  areaCode string comment 'district',
  dateType string comment 'data type',
  type string comment 'filter method'
)
PARTITIONED BY (provId string comment 'province', day_id string comment 'billing period')
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://Bdnameservice/domain/ns36/wznlxt_yxt/dwa_db.db/dwa_lbs_fuse_scic_xxx';
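[Note ⚠️]
A partition column must not also be declared as a regular table column. The following statement, which lists provId and day_id in both places, fails: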
CREATE TABLE default.DWA_LBS_FUSE_SCIC_XXX (
  time string comment 'timestamp',
  mdn string comment 'mobile number',
  lon string comment 'longitude',
  lat string comment 'latitude',
  cityCode string comment 'city',
  areaCode string comment 'district',
  dateType string comment 'data type',
  type string comment 'filter method',
  provId string comment 'province',
  day_id string comment 'billing period'
)
PARTITIONED BY (provId string comment 'province', day_id string comment 'billing period')
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://Bdnameservice/domain/ns36/wznlxt_yxt/dwa_db.db/dwa_lbs_fuse_scic_xxx';
----------------------------------------------
Note: partition columns must not repeat any of the table's regular columns.
Error message: FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
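The rule behind Error 10035 amounts to requiring that the partition column names and the regular column names be disjoint. Hive column names are case-insensitive, so the comparison should be too. The hypothetical helper below (repeated_partition_columns is an illustrative name, not a Hive API) sketches a pre-check one could run on the column lists before issuing the DDL.

```python
def repeated_partition_columns(table_columns, partition_columns):
    """Return partition columns that clash with a regular column.

    Hypothetical pre-check mirroring Hive's Error 10035. Hive treats column
    names case-insensitively, so names are compared in lowercase.
    """
    regular = {c.lower() for c in table_columns}
    return [c for c in partition_columns if c.lower() in regular]


# Column lists from the failing statement above.
table_cols = ["time", "mdn", "lon", "lat", "cityCode", "areaCode",
              "dateType", "type", "provId", "day_id"]
partition_cols = ["provId", "day_id"]

print(repeated_partition_columns(table_cols, partition_cols))
```

Dropping provId and day_id from the regular column list, as in the first CREATE TABLE, makes the helper return an empty list.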
Load data
A sample line of the input file:
12,20210126,705249,6,530000,530100,530103,102.71192,25.04458,2021-01-28 12:52:28,853,20210126
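The sample line has 12 comma-separated fields, while the table declares only 8 data columns. Hive's default text SerDe (LazySimpleSerDe) maps fields to columns by position: surplus trailing fields are ignored and missing ones are read as NULL. The partition columns provId and day_id are not read from the file at all. The sketch below is a simplified model of that mapping. It ignores escaping, the \N null marker and type conversion, and DATA_COLUMNS and parse_line are illustrative names rather than Hive APIs.

```python
# Data columns in declared order, lowercased as Hive stores column names.
DATA_COLUMNS = ["time", "mdn", "lon", "lat", "citycode", "areacode", "datetype", "type"]


def parse_line(line, columns=DATA_COLUMNS, delimiter=","):
    """Simplified model of LazySimpleSerDe mapping one text line to columns."""
    fields = line.split(delimiter)
    # Match fields to columns purely by position: surplus trailing fields
    # are dropped and missing ones become None (NULL in Hive).
    return {col: fields[i] if i < len(fields) else None
            for i, col in enumerate(columns)}


sample = ("12,20210126,705249,6,530000,530100,530103,"
          "102.71192,25.04458,2021-01-28 12:52:28,853,20210126")
row = parse_line(sample)
print(row)
```

With this schema, only the first 8 fields (12 through 102.71192) end up in columns, so it is worth checking the file layout against the DDL before loading.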
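Run LOAD DATA. Without the LOCAL keyword, INPATH refers to an HDFS path, and the file is moved (not copied) into the directory of the partition named in the PARTITION clause: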
LOAD DATA INPATH '/xxxxxx' INTO TABLE default.dwa_lbs_fuse_scic_xxx PARTITION (provId='140000', DAY_ID='20201220');
Query the data and partitions
(1) View the data
select * from dwa_lbs_fuse_scic_xxx;
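SELECT * on a partitioned table returns the data columns first, followed by the partition columns. The partition values come from the partition the file was loaded into (the PARTITION clause of LOAD DATA), not from the file contents. The sketch below models one result row under that assumption. select_star_row is a hypothetical helper, and data_values are the first 8 fields of the sample line.

```python
# Column order as SELECT * lists them: data columns, then partition columns.
DATA_COLUMNS = ["time", "mdn", "lon", "lat", "citycode", "areacode", "datetype", "type"]
PARTITION_COLUMNS = ["provid", "day_id"]


def select_star_row(data_values, partition_spec):
    """Hypothetical model of one SELECT * result row."""
    # Data columns come from the file, in declared order.
    row = dict(zip(DATA_COLUMNS, data_values))
    # Partition columns are appended last; their values come from the
    # partition the file belongs to, not from the file itself.
    for col in PARTITION_COLUMNS:
        row[col] = partition_spec[col]
    return row


data_values = ["12", "20210126", "705249", "6", "530000", "530100", "530103", "102.71192"]
row = select_star_row(data_values, {"provid": "140000", "day_id": "20201220"})
print(list(row))
print(row["provid"], row["day_id"])
```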
(2) View the partitions
show partitions dwa_lbs_fuse_scic_xxx;
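SHOW PARTITIONS prints one line per partition in the form col=value/col=value. Column names appear in lowercase and follow the PARTITIONED BY order. The same string is the partition's subdirectory under the table LOCATION, so after the LOAD DATA above the output should include provid=140000/day_id=20201220. The sketch below builds and parses such names. partition_name and parse_partition_name are illustrative helpers, and Hive's escaping of special characters in values (for example '/' becoming %2F) is deliberately omitted.

```python
TABLE_LOCATION = "hdfs://Bdnameservice/domain/ns36/wznlxt_yxt/dwa_db.db/dwa_lbs_fuse_scic_xxx"


def partition_name(spec):
    """Build a partition name from (column, value) pairs in PARTITIONED BY order.

    Keys are lowercased, as Hive does; value escaping is not modeled.
    """
    return "/".join(f"{col.lower()}={val}" for col, val in spec)


def parse_partition_name(name):
    """Split a SHOW PARTITIONS line back into {column: value}."""
    return dict(segment.split("=", 1) for segment in name.split("/"))


# Partition spec as written in the LOAD DATA statement.
spec = [("provId", "140000"), ("DAY_ID", "20201220")]
name = partition_name(spec)
full_path = f"{TABLE_LOCATION}/{name}"

print(name)       # provid=140000/day_id=20201220
print(full_path)
print(parse_partition_name(name))
```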