Notes on Using Hive Partitioned Tables

青旅半醒 2021-06-24 15:57

Reading data from Hive external tables:

1. Non-partitioned external table:
1) Create the external table:

CREATE EXTERNAL TABLE `test_liu`(
  `a` string, 
  `b` string, 
  `c` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\t' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/data/qytt/test/testhive'

2) Upload a file to the HDFS directory:
$ hadoop fs -put test1 /data/qytt/test/testhive/
3) Query:
hive> select * from test_liu;
OK
1 2 3

2. Partitioned external table:
1) Create the external table:

CREATE EXTERNAL TABLE `test_liu`(
  `a` string, 
  `b` string, 
  `c` string)
PARTITIONED BY ( 
  `dt` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\t' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/data/qytt/test/testhive'

2) Upload a file to the HDFS directory:
A. Create the partition directory
$ hadoop fs -mkdir /data/qytt/test/testhive/dt=1
B. Upload the file
$ hadoop fs -put test1 /data/qytt/test/testhive/dt=1

3) Query:
A. The partition must be registered in Hive first; otherwise the query returns no data:
hive> alter table test_liu add partition (dt='1');
hive> show partitions test_liu;
OK
dt=1
B. Then query:
hive> select * from test_liu;
OK
1 2 3
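Registering partitions by hand gets tedious once there are many directories. As a rough sketch, the hypothetical helper below (gen_add_partitions is an illustrative name, not part of Hive or Hadoop) reads partition directory paths on stdin and prints one ADD PARTITION statement per path. It assumes the directories follow the key=value layout used above and that only innermost partition directories are passed in. Hive's built-in MSCK REPAIR TABLE statement is another way to register directories that already exist under the table location.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hive or Hadoop.
# Usage: gen_add_partitions TABLE LOCATION < partition_dir_paths
# Example input line: /data/qytt/test/testhive/dt=1
gen_add_partitions() {
  table="$1"
  location="${2%/}"                      # table LOCATION without a trailing slash
  while IFS= read -r path; do
    path="${path%/}"
    case "$path" in
      "$location"/*) rel="${path#"$location"/}" ;;
      *) continue ;;                     # not under the table location
    esac
    # Skip anything that is not a pure chain of key=value directories.
    printf '%s\n' "$rel" | grep -Eq '^[^/=]+=[^/=]+(/[^/=]+=[^/=]+)*$' || continue
    # Turn "dt=1/hour=0" into "dt='1', hour='0'".
    spec=$(printf '%s\n' "$rel" | sed -e "s/=\([^/]*\)/='\1'/g" -e "s#/#, #g")
    printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s);\n" "$table" "$spec"
  done
}
```

The generated statements can be reviewed and then pasted into the hive prompt. Partition values are always quoted as strings, which matches the string-typed dt column in this example.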
4) Now create an empty directory under /data/qytt/test/testhive/dt=1:
$ hadoop fs -mkdir /data/qytt/test/testhive/dt=1/hour=0
Running the query again then fails with the following error:
Failed with exception java.io.IOException:java.io.IOException: Not a file: hdfs://hadoop-jy-namenode/data/qytt/test/testhive/dt=1/hour=0
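A quick way to spot this situation before querying is to look for directories nested deeper than the table's partition columns. The sketch below is a hypothetical helper (find_nested_dirs is an illustrative name) that filters a recursive listing read from stdin. It assumes the usual `hadoop fs -ls -R` layout, where directory entries start with `d` and the path is the last field, and that paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop.
# Usage: hadoop fs -ls -R LOCATION | find_nested_dirs LOCATION NUM_PARTITION_COLUMNS
find_nested_dirs() {
  awk -v loc="${1%/}" -v ncols="$2" '
    /^d/ {                                   # directory entries only
      path = $NF
      if (index(path, loc "/") != 1) next    # ignore paths outside the table
      rel = substr(path, length(loc) + 2)    # e.g. "dt=1/hour=0"
      depth = split(rel, parts, "/")
      if (depth > ncols + 0) print path      # deeper than the innermost partition
    }'
}
```

For the single-column table above, `find_nested_dirs /data/qytt/test/testhive 1` would report the dt=1/hour=0 directory that triggers the "Not a file" error.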

3. External table with multiple partition columns:
1) Create the external table:

CREATE EXTERNAL TABLE `test_liu`(
  `a` string, 
  `b` string, 
  `c` string)
PARTITIONED BY ( 
  `dt` string,
  `hour` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\t' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/data/qytt/test/testhive'

2) Upload a file to the HDFS directory:
A. Create the first-level partition directory
$ hadoop fs -mkdir /data/qytt/test/testhive/dt=1
B. Create the second-level partition directory
$ hadoop fs -mkdir /data/qytt/test/testhive/dt=1/hour=0
C. Upload the file (note that it goes into dt=1, not dt=1/hour=0)
$ hadoop fs -put test1 /data/qytt/test/testhive/dt=1

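3) Query:
A. Register the partition:
hive> alter table test_liu add partition (dt='1');
FAILED: SemanticException partition spec {dt=1} doesn't contain all (2) partition columns
When a table has multiple partition columns, all of them must be specified together when adding a partition; otherwise the statement fails.
hive> alter table test_liu add partition (dt='1',hour='0');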
OK
hive> show partitions test_liu;
OK
dt=1/hour=0

B. Query:
hive> select * from test_liu;
OK
Still no data. The file is already under /dt=1, so why can't Hive read it? Try uploading it one level deeper:
$ hadoop fs -put test1 /data/qytt/test/testhive/dt=1/hour=0
Once test1 is in the hour=0 directory, Hive can read it.
hive> select * from test_liu;
OK
1 2 3
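Deleting the test1 file from the /dt=1 directory at this point causes no problem either. (From this we can conclude that, for a partitioned table, files only take effect when they sit in the innermost partition directory; data placed anywhere else is not read. In addition, the innermost partition directory must not contain any subdirectories, otherwise the query fails.)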
hive> select * from test_liu;
OK
1 2 3
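The first conclusion above can be turned into a simple check. The following sketch is a hypothetical helper (find_misplaced_files is an illustrative name) that reads a recursive listing from stdin and prints files that are not located exactly in an innermost partition directory. It assumes the usual `hadoop fs -ls -R` layout, where file entries start with `-` and the path is the last field, and that paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop.
# Usage: hadoop fs -ls -R LOCATION | find_misplaced_files LOCATION NUM_PARTITION_COLUMNS
find_misplaced_files() {
  awk -v loc="${1%/}" -v ncols="$2" '
    /^-/ {                                        # regular file entries only
      path = $NF
      if (index(path, loc "/") != 1) next         # ignore paths outside the table
      rel = substr(path, length(loc) + 2)         # e.g. "dt=1/test1"
      dir_depth = split(rel, parts, "/") - 1      # directories above the file
      if (dir_depth != ncols + 0) print path      # not in an innermost partition dir
    }'
}
```

For the two-column table in this section, `find_misplaced_files /data/qytt/test/testhive 2` would list /data/qytt/test/testhive/dt=1/test1, the copy that Hive never reads, and stay silent about the copy under dt=1/hour=0.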