Installing and trying out hive-1.2

待我称王封你为后i 2022-12-21 15:56

Welcome to my GitHub

All of 欣宸's original posts (with companion source code) are categorized and indexed here: https://github.com/zq2599/blog_demos

About Hive

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides SQL-like query capabilities over them.

Environment

The environment used in this article is as follows:

  1. CentOS Linux release 7.5.1804
  2. JDK: 1.8.0_191
  3. Hadoop: 2.7.7
  4. Hive: 1.2.2

Deploying and starting Hadoop

  1. For deploying and starting the Hadoop environment, see 《Linux部署hadoop2.7.7集群》 (deploying a Hadoop 2.7.7 cluster on Linux);
  2. Note: make sure the HADOOP_HOME environment variable is set (a quick check follows this list);
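
A quick way to confirm this prerequisite (a minimal sketch; it assumes Hadoop was installed as in the article above):

    # Confirm HADOOP_HOME is set and the hadoop binary responds.
    echo $HADOOP_HOME
    $HADOOP_HOME/bin/hadoop version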

Installing and configuring MySQL (version 5.7.27)

  1. MySQL stores Hive's metadata. To keep things simple I deployed it under Docker; a single command does it:

    docker run --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=888888 -idt mysql:5.7.27

  2. Enter the MySQL container:

    docker exec -it mysql /bin/bash

  3. Once inside the container, connect to MySQL; the password is 888888:

    mysql -h127.0.0.1 -uroot -p

  4. Create a MySQL account named hive:

    CREATE USER 'hive' IDENTIFIED BY '888888';

  5. Grant the hive account access to everything (the hive account can also grant privileges to other accounts):

    GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;

  6. Flush the privilege tables:

    flush privileges;

  7. From a terminal on the host, restart the MySQL service:

    docker exec mysql service mysql restart

  8. Enter the MySQL container again and log in as the hive account:

    mysql -uhive -p

  9. Create a database named hive (a quick sanity check follows this list):

    CREATE DATABASE hive;
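
Before moving on, you can verify the new account and database from the host in one shot (a minimal sketch; it assumes the container name mysql and the password 888888 used above):

    # Confirm the hive account can log in and the hive database exists.
    docker exec mysql mysql -uhive -p888888 -e "SHOW DATABASES LIKE 'hive'; SHOW GRANTS FOR 'hive'@'%';"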

Installing Hive

  1. Download Hive from the official mirror at http://mirror.bit.edu.cn/apache/hive/ , choosing the appropriate version. Note: all of the following operations are performed as the hadoop account, not root.
  2. In the hadoop account's home directory, extract the downloaded apache-hive-1.2.2-bin.tar.gz; this produces a directory named apache-hive-1.2.2-bin;
  3. Edit the hadoop account's .bash_profile and add an environment variable holding the full path of the apache-hive-1.2.2-bin directory you just extracted:

    export HIVE_HOME=/home/hadoop/apache-hive-1.2.2-bin

  4. After saving, open a new ssh session, or run source ~/.bash_profile, so the variable takes effect;

  5. Enter the directory apache-hive-1.2.2-bin/conf/ and make a copy of the template configuration file:

    cp hive-default.xml.template hive-default.xml

  6. In the same directory, create a file named hive-site.xml with the following content:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>888888</value>
      </property>
    </configuration>

  7. Put the MySQL JDBC driver jar into /home/hadoop/apache-hive-1.2.2-bin/lib/. I used mysql-connector-java-5.1.47.jar, which you can download here: https://download.csdn.net/download/boling_cavalry/11834367 (a verification sketch follows this list)
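
At this point it is worth double-checking the two things Hive needs at startup (a minimal sketch; the paths are the ones used in the steps above):

    # Verify HIVE_HOME resolves and the JDBC driver jar is in place.
    echo $HIVE_HOME
    ls $HIVE_HOME/lib/mysql-connector-java-5.1.47.jar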

Setup is complete; next come initialization and startup.

Initializing and starting Hive

  1. Enter the directory apache-hive-1.2.2-bin/bin and run the following command to initialize the metastore schema:

    ./schematool -initSchema -dbType mysql

On success, the console prints:

    [hadoop@node0 bin]$ ./schematool -initSchema -dbType mysql
    Metastore connection URL: jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true
    Metastore Connection Driver : com.mysql.jdbc.Driver
    Metastore connection User: hive
    Starting metastore schema initialization to 1.2.0
    Initialization script hive-schema-1.2.0.mysql.sql
    Initialization script completed
    schemaTool completed
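
You can also ask schematool itself what it just installed; its -info option reports the schema version recorded in the metastore (run from the same bin directory):

    # Optional: report the metastore schema version just written.
    ./schematool -dbType mysql -info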
  2. Take a look in MySQL: a number of tables have been created in the hive database:

    mysql> show tables;
    +---------------------------+
    | Tables_in_hive            |
    +---------------------------+
    | BUCKETING_COLS |
    | CDS |
    | COLUMNS_V2 |
    | COMPACTION_QUEUE |
    | COMPLETED_TXN_COMPONENTS |
    | DATABASE_PARAMS |
    | DBS |
    | DB_PRIVS |
    | DELEGATION_TOKENS |
    | FUNCS |
    | FUNC_RU |
    | GLOBAL_PRIVS |
    | HIVE_LOCKS |
    | IDXS |
    | INDEX_PARAMS |
    | MASTER_KEYS |
    | NEXT_COMPACTION_QUEUE_ID |
    | NEXT_LOCK_ID |
    | NEXT_TXN_ID |
    | NOTIFICATION_LOG |
    | NOTIFICATION_SEQUENCE |
    | NUCLEUS_TABLES |
    | PARTITIONS |
    | PARTITION_EVENTS |
    | PARTITION_KEYS |
    | PARTITION_KEY_VALS |
    | PARTITION_PARAMS |
    | PART_COL_PRIVS |
    | PART_COL_STATS |
    | PART_PRIVS |
    | ROLES |
    | ROLE_MAP |
    | SDS |
    | SD_PARAMS |
    | SEQUENCE_TABLE |
    | SERDES |
    | SERDE_PARAMS |
    | SKEWED_COL_NAMES |
    | SKEWED_COL_VALUE_LOC_MAP |
    | SKEWED_STRING_LIST |
    | SKEWED_STRING_LIST_VALUES |
    | SKEWED_VALUES |
    | SORT_COLS |
    | TABLE_PARAMS |
    | TAB_COL_STATS |
    | TBLS |
    | TBL_COL_PRIVS |
    | TBL_PRIVS |
    | TXNS |
    | TXN_COMPONENTS |
    | TYPES |
    | TYPE_FIELDS |
    | VERSION |
    +---------------------------+
    53 rows in set (0.00 sec)
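
Among these, the VERSION table is where schematool recorded the schema version; you can peek at it from the host (a minimal sketch; it reuses the container and credentials from earlier):

    # Peek at the metastore schema version row.
    docker exec mysql mysql -uhive -p888888 hive -e "SELECT * FROM VERSION;"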

  3. In the directory /home/hadoop/apache-hive-1.2.2-bin/bin, run ./hive to start Hive (a non-interactive smoke test follows);
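
If you want a quick smoke test first, hive's -e flag runs a single statement and exits without entering the shell (a minimal sketch; it assumes the HIVE_HOME variable set earlier):

    # Run one statement non-interactively.
    $HIVE_HOME/bin/hive -e "show databases;"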

Initialization and startup are done; next, verify Hive.

Verification

  1. Running ./hive above dropped us into the interactive shell. Enter the following command to create a database named test001:

    CREATE database test001;

  2. Switch to that database:

    use test001;

  3. Create a table named test_table:

    create table test_table(
    id INT,
    word STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

  4. Open a new ssh session and create a text file named hive_test.txt with the content below; the two columns must be separated by literal Tab characters to match the table definition (see the note after the load output below):

    1 aaa
    2 bbb
    3 ccc
    4 ddd
    5 eee
    6 fff

  5. Back in the Hive shell, run the following command to load the file's contents into the test001.test_table table:

    LOAD DATA LOCAL INPATH '/home/hadoop/hive_test.txt' INTO TABLE test001.test_table;

The console prints:

    hive> LOAD DATA LOCAL INPATH '/home/hadoop/hive_test.txt' INTO TABLE test001.test_table;
    Loading data to table test001.test_table
    Table test001.test_table stats: [numFiles=1, totalSize=36]
    OK
    Time taken: 0.264 seconds
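
A note on the file format: test_table was declared with FIELDS TERMINATED BY '\t', so the columns in hive_test.txt must be separated by real tabs; if they are not, the load still succeeds but the SELECT below shows NULLs. One reliable way to generate the file (a sketch; the path matches the LOAD command above):

    # Write the six rows with literal tab separators.
    printf '1\taaa\n2\tbbb\n3\tccc\n4\tddd\n5\teee\n6\tfff\n' > /home/hadoop/hive_test.txt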
  6. Run a SELECT; you can see that all the rows were loaded:

    hive> select * from test_table;
    OK
    1 aaa
    2 bbb
    3 ccc
    4 ddd
    5 eee
    6 fff
    Time taken: 0.453 seconds, Fetched: 6 row(s)

  7. Run a GROUP BY query:

    select word,count(word) from test_table GROUP BY word;

This launches a MapReduce job to execute the query; the console output is as follows:

    hive> select word,count(word) from test_table GROUP BY word;
    Query ID = hadoop_20191007190528_3bd50401-267b-4d75-8b08-17ead5f0d790
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1570427946161_0002, Tracking URL = http://node0:8088/proxy/application_1570427946161_0002/
    Kill Command = /home/hadoop/hadoop-2.7.7/bin/hadoop job -kill job_1570427946161_0002
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2019-10-07 19:05:34,812 Stage-1 map = 0%, reduce = 0%
    2019-10-07 19:05:39,991 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.29 sec
    2019-10-07 19:05:46,201 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.23 sec
    MapReduce Total cumulative CPU time: 3 seconds 230 msec
    Ended Job = job_1570427946161_0002
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.23 sec HDFS Read: 7000 HDFS Write: 36 SUCCESS
    Total MapReduce CPU Time Spent: 3 seconds 230 msec
    OK
    aaa 1
    bbb 1
    ccc 1
    ddd 1
    eee 1
    fff 1
    Time taken: 18.614 seconds, Fetched: 6 row(s)
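
As an aside, the earlier LOAD DATA LOCAL copied hive_test.txt into the table's directory on HDFS, which is where this job read its input; you can inspect the raw file from a shell (a minimal sketch; it assumes the default hive.metastore.warehouse.dir of /user/hive/warehouse):

    # Inspect the table's backing file on HDFS.
    $HADOOP_HOME/bin/hdfs dfs -ls /user/hive/warehouse/test001.db/test_table
    $HADOOP_HOME/bin/hdfs dfs -cat /user/hive/warehouse/test001.db/test_table/hive_test.txt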

That completes this hands-on installation and first look at Hive; I hope it offers a useful reference to readers who are learning Hive as well.
