[Hadoop] Hadoop 2.8.1 Pseudo-Distributed Deployment

约定不等于承诺 · 2022-05-24

For a pseudo-distributed installation of Hadoop 2.8.1, first build the Hadoop source into a tar package, or download a pre-built tar package directly, and then deploy from it.

In pseudo-distributed deployment, the HDFS NameNode, DataNode, and Secondary NameNode, as well as YARN's ResourceManager and NodeManager, all run on the same host, so only one machine is needed for the whole process.

Operating system: CentOS 7.3

1. Prepare the Environment

1.1 Set the hostname and add a hostname-to-IP mapping

    [root@localhost ~]# hostnamectl set-hostname hadoop01
    [root@localhost ~]# vi /etc/hosts
    # append
    192.168.1.8 hadoop01

Reboot the host

    [root@localhost ~]# reboot
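
After the reboot, a quick sanity check (my addition, assuming the IP used above) confirms the hostname and the mapping took effect:

    [root@hadoop01 ~]# hostname             # should print hadoop01
    [root@hadoop01 ~]# ping -c 1 hadoop01   # should resolve to 192.168.1.8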

1.2 Install the JDK

Uninstall the JDK 7 that ships with the system

    [root@hadoop01 ~]# rpm -qa | grep -i java
    tzdata-java-2016g-2.el7.noarch
    java-1.7.0-openjdk-1.7.0.111-2.6.7.8.el7.x86_64
    python-javapackages-3.4.1-11.el7.noarch
    javamail-1.4.6-8.el7.noarch
    javapackages-tools-3.4.1-11.el7.noarch
    java-1.7.0-openjdk-headless-1.7.0.111-2.6.7.8.el7.x86_64
    [root@hadoop01 ~]# cat /etc/redhat-release
    CentOS Linux release 7.3.1611 (Core)
    [root@hadoop01 ~]# rpm -e --nodeps tzdata-java-2016g-2.el7.noarch
    [root@hadoop01 ~]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.111-2.6.7.8.el7.x86_64
    [root@hadoop01 ~]# rpm -e --nodeps python-javapackages-3.4.1-11.el7.noarch
    [root@hadoop01 ~]# rpm -e --nodeps javamail-1.4.6-8.el7.noarch
    [root@hadoop01 ~]# rpm -e --nodeps javapackages-tools-3.4.1-11.el7.noarch
    [root@hadoop01 ~]# rpm -e --nodeps java-1.7.0-openjdk-headless-1.7.0.111-2.6.7.8.el7.x86_64
    [root@hadoop01 ~]# rpm -qa | grep -i java
    [root@hadoop01 ~]# java -version
    -bash: java: command not found
    [root@hadoop01 ~]#
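
For reference, the per-package removals above can be collapsed into a one-liner with the same effect (a sketch, assuming GNU xargs as found on CentOS):

    [root@hadoop01 ~]# rpm -qa | grep -i java | xargs -r rpm -e --nodeps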

Install JDK 8

    [root@hadoop01 software]# cd
    [root@hadoop01 ~]# mkdir /usr/java
    [root@hadoop01 ~]# cp /opt/software/jdk-8u45-linux-x64.gz /usr/java/
    [root@hadoop01 ~]# cd /usr/java/
    [root@hadoop01 java]# tar -zxvf jdk-8u45-linux-x64.gz
    [root@hadoop01 java]# chown -R root:root jdk1.8.0_45/ # mind the owner of directories unpacked on Linux

Configure environment variables

    [root@hadoop01 jdk1.8.0_45]# vi /etc/profile
    # append
    export JAVA_HOME=/usr/java/jdk1.8.0_45/
    export PATH=$PATH:$JAVA_HOME/bin
    # apply the changes
    [root@hadoop01 jdk1.8.0_45]# source /etc/profile
    [root@hadoop01 jdk1.8.0_45]# java -version
    java version "1.8.0_45"
    Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
    Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
    [root@hadoop01 jdk1.8.0_45]#
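
As a small extra check (not in the original walkthrough), confirm the variables resolve to the JDK just installed:

    [root@hadoop01 jdk1.8.0_45]# echo $JAVA_HOME   # should print /usr/java/jdk1.8.0_45/
    [root@hadoop01 jdk1.8.0_45]# which java        # should point into $JAVA_HOME/bin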

1.3 Create the hadoop group and user

    [root@hadoop01 ~]# groupadd hadoop
    [root@hadoop01 ~]# useradd -d /home/hadoop -g hadoop hadoop
    [root@hadoop01 ~]# passwd hadoop
    [root@hadoop01 ~]# id hadoop
    uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
    [root@hadoop01 ~]#

Configure passwordless SSH for the hadoop user to the local host

    [root@hadoop01 ~]# su - hadoop
    Last login: Tue May 22 09:21:15 EDT 2018 on pts/0
    [hadoop@hadoop01 ~]$ ssh-keygen
    [hadoop@hadoop01 ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
    [hadoop@hadoop01 ~]$ chmod 600 .ssh/authorized_keys
    [hadoop@hadoop01 ~]$ ssh hadoop01 date
    The authenticity of host 'hadoop01 (192.168.1.8)' can't be established.
    ECDSA key fingerprint is c4:8b:d9:92:fe:e2:85:dd:1e:06:dd:d7:e5:9e:a5:c4.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'hadoop01,192.168.1.8' (ECDSA) to the list of known hosts.
    Tue May 22 05:33:18 EDT 2018
    [hadoop@hadoop01 ~]$ exit
    logout
    [root@hadoop01 ~]#
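
For scripted setups, the interactive key generation above can be replaced with a non-interactive sketch (assumes an RSA key with an empty passphrase is acceptable):

    [hadoop@hadoop01 ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    [hadoop@hadoop01 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    [hadoop@hadoop01 ~]$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys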

2. Hadoop Pseudo-Distributed Deployment

2.1 Unpack Hadoop

Copy the compiled Hadoop tar package (or a directly downloaded pre-built one) to the target path and unpack it

    [root@hadoop01 software]# tar -zxvf hadoop-2.8.1.tar.gz

Configure environment variables

    [root@hadoop01 hadoop-2.8.1]# vi /etc/profile
    # append
    export HADOOP_HOME=/opt/software/hadoop-2.8.1
    export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
    # apply the changes
    [root@hadoop01 hadoop-2.8.1]# source /etc/profile
    [root@hadoop01 hadoop-2.8.1]# hadoop version
    Hadoop 2.8.1
    Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 20fe5304904fc2f5a18053c389e43cd26f7a70fe
    Compiled by vinodkv on 2017-06-02T06:14Z
    Compiled with protoc 2.5.0
    From source with checksum 60125541c2b3e266cbf3becc5bda666
    This command was run using /opt/software/hadoop-2.8.1/share/hadoop/common/hadoop-common-2.8.1.jar
    [root@hadoop01 hadoop-2.8.1]#

2.2 Edit the Hadoop configuration files

Configure core-site.xml

    [root@hadoop01 hadoop-2.8.1]# pwd
    /opt/software/hadoop-2.8.1
    [root@hadoop01 hadoop-2.8.1]# vi etc/hadoop/core-site.xml
    # add the following configuration
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <!-- the value copied from the official docs is localhost; change it to the host's own IP so the NameNode serves external clients -->
            <value>hdfs://192.168.1.8:9000</value>
        </property>
    </configuration>
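
Optionally (my addition, not part of the original walkthrough), you can also set hadoop.tmp.dir in core-site.xml so HDFS metadata and blocks do not land under /tmp. The default is /tmp/hadoop-${user.name}, which is why note 5 at the end mentions cleaning /tmp before re-formatting:

    <!-- optional sketch: /home/hadoop/tmp is a hypothetical path; any directory writable by the hadoop user works -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>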

Configure hdfs-site.xml

    [root@hadoop01 hadoop-2.8.1]# vi etc/hadoop/hdfs-site.xml
    # add the following configuration
    <configuration>
        <property>
            <name>dfs.replication</name>
            <!-- production clusters keep the default of 3; pseudo-distributed mode has a single host, so use 1 -->
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <!-- expose the Secondary NameNode's HTTP and HTTPS endpoints -->
            <value>192.168.1.8:50090</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.https-address</name>
            <value>192.168.1.8:50091</value>
        </property>
    </configuration>
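
To double-check that the settings were picked up, the hdfs getconf tool can print the effective value of a key (a quick sanity check; the output assumes the configuration above):

    [root@hadoop01 hadoop-2.8.1]# hdfs getconf -confKey dfs.replication
    1
    [root@hadoop01 hadoop-2.8.1]# hdfs getconf -confKey fs.defaultFS
    hdfs://192.168.1.8:9000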

Configure slaves (where the DataNode runs)

    [root@hadoop01 hadoop-2.8.1]# vi etc/hadoop/slaves
    # replace localhost with the host's own IP address
    192.168.1.8

Configure hadoop-env.sh (set JAVA_HOME)

    [root@hadoop01 hadoop-2.8.1]# vi etc/hadoop/hadoop-env.sh
    # add the following
    export JAVA_HOME=/usr/java/jdk1.8.0_45

Ultimately the Hadoop services will run as the hadoop user (not root), so first change the owner of the Hadoop install path to the hadoop user, then start HDFS as hadoop.

2.3 Change the owner of the Hadoop install path to hadoop

    [root@hadoop01 software]# cd /opt/software/
    [root@hadoop01 software]# chown -R hadoop:hadoop hadoop-2.8.1
    [root@hadoop01 software]# ls -ld hadoop-2.8.1
    drwxrwxr-x. 9 hadoop hadoop 149 Jun 2 2017 hadoop-2.8.1
    [root@hadoop01 software]#

2.4 Format HDFS

Switch to the hadoop user

    [root@hadoop01 ~]# su - hadoop
    Last login: Tue May 22 09:22:54 EDT 2018 on pts/0
    [hadoop@hadoop01 ~]$

Format HDFS

    [hadoop@hadoop01 ~]$ hdfs namenode -format
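
If formatting succeeds, the output should contain a line similar to the following (the exact path depends on hadoop.tmp.dir; this sketch assumes the /tmp default):

    ... INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.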

2.5 Start HDFS

    [hadoop@hadoop01 ~]$ /opt/software/hadoop-2.8.1/sbin/start-dfs.sh
    Starting namenodes on [hadoop01]
    hadoop01: starting namenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-hadoop01.out
    192.168.1.8: starting datanode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-datanode-hadoop01.out
    Starting secondary namenodes [hadoop01]
    hadoop01: starting secondarynamenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-hadoop01.out
    [hadoop@hadoop01 ~]$ jps
    3683 Jps
    3206 NameNode # after a normal HDFS start, jps shows the NN, SNN, and DN processes
    3513 SecondaryNameNode
    3341 DataNode
    [hadoop@hadoop01 ~]$

You can also reach the HDFS Web UI in a browser on port 50070: http://192.168.1.8:50070/


At this point, starting HDFS as the hadoop user is complete.
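
As an optional smoke test (my addition, not part of the original walkthrough), confirm that the DataNode registered and that reads and writes work:

    [hadoop@hadoop01 ~]$ hdfs dfsadmin -report                  # should show 1 live datanode
    [hadoop@hadoop01 ~]$ hdfs dfs -mkdir -p /user/hadoop
    [hadoop@hadoop01 ~]$ echo hello | hdfs dfs -put - /user/hadoop/hello.txt
    [hadoop@hadoop01 ~]$ hdfs dfs -cat /user/hadoop/hello.txt
    hello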

2.6 YARN Pseudo-Distributed Deployment

Configure mapred-site.xml

    [hadoop@hadoop01 hadoop-2.8.1]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    [hadoop@hadoop01 hadoop-2.8.1]$ vi etc/hadoop/mapred-site.xml
    # add the following configuration
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

Configure yarn-site.xml

    [hadoop@hadoop01 hadoop-2.8.1]$ vi etc/hadoop/yarn-site.xml
    # add the following configuration
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>

Start YARN

    [hadoop@hadoop01 hadoop-2.8.1]$ sbin/start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /opt/software/hadoop-2.8.1/logs/yarn-hadoop-resourcemanager-hadoop01.out
    192.168.1.8: starting nodemanager, logging to /opt/software/hadoop-2.8.1/logs/yarn-hadoop-nodemanager-hadoop01.out
    [hadoop@hadoop01 hadoop-2.8.1]$ jps
    4371 Jps
    3206 NameNode
    4071 NodeManager
    3513 SecondaryNameNode
    3341 DataNode
    3967 ResourceManager # after YARN starts, two new processes appear: ResourceManager and NodeManager
    [hadoop@hadoop01 hadoop-2.8.1]$

Likewise, the YARN service can be checked through its Web UI page: http://192.168.1.8:8088/
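
Besides the Web UI, a command-line check (assuming the setup above) confirms the NodeManager registered with the ResourceManager:

    [hadoop@hadoop01 hadoop-2.8.1]$ yarn node -list   # should list one RUNNING node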


2.7 Run a MapReduce Job

MapReduce itself runs no resident Java processes; task processes appear only when a jar job is submitted to YARN.

Launch a MapReduce job

    [hadoop@hadoop01 hadoop-2.8.1]$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar pi 5 20
    Number of Maps = 5
    Samples per Map = 20
    Wrote input for Map #0
    Wrote input for Map #1
    Wrote input for Map #2
    Wrote input for Map #3
    Wrote input for Map #4
    Starting Job
    18/05/22 09:40:31 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    18/05/22 09:40:32 INFO input.FileInputFormat: Total input files to process : 5
    18/05/22 09:40:32 INFO mapreduce.JobSubmitter: number of splits:5
    18/05/22 09:40:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526996199577_0001
    18/05/22 09:40:33 INFO impl.YarnClientImpl: Submitted application application_1526996199577_0001
    18/05/22 09:40:33 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1526996199577_0001/
    18/05/22 09:40:33 INFO mapreduce.Job: Running job: job_1526996199577_0001
    18/05/22 09:40:41 INFO mapreduce.Job: Job job_1526996199577_0001 running in uber mode : false
    18/05/22 09:40:41 INFO mapreduce.Job: map 0% reduce 0%
    18/05/22 09:40:51 INFO mapreduce.Job: map 100% reduce 0%
    18/05/22 09:40:57 INFO mapreduce.Job: map 100% reduce 100%
    18/05/22 09:40:58 INFO mapreduce.Job: Job job_1526996199577_0001 completed successfully
    18/05/22 09:40:58 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=116
            FILE: Number of bytes written=819909
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=1345
            HDFS: Number of bytes written=215
            HDFS: Number of read operations=23
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=3
        Job Counters
            Launched map tasks=5
            Launched reduce tasks=1
            Data-local map tasks=5
            Total time spent by all maps in occupied slots (ms)=34336
            Total time spent by all reduces in occupied slots (ms)=3669
            Total time spent by all map tasks (ms)=34336
            Total time spent by all reduce tasks (ms)=3669
            Total vcore-milliseconds taken by all map tasks=34336
            Total vcore-milliseconds taken by all reduce tasks=3669
            Total megabyte-milliseconds taken by all map tasks=35160064
            Total megabyte-milliseconds taken by all reduce tasks=3757056
        Map-Reduce Framework
            Map input records=5
            Map output records=10
            Map output bytes=90
            Map output materialized bytes=140
            Input split bytes=755
            Combine input records=0
            Combine output records=0
            Reduce input groups=2
            Reduce shuffle bytes=140
            Reduce input records=10
            Reduce output records=0
            Spilled Records=20
            Shuffled Maps =5
            Failed Shuffles=0
            Merged Map outputs=5
            GC time elapsed (ms)=1073
            CPU time spent (ms)=3480
            Physical memory (bytes) snapshot=1536045056
            Virtual memory (bytes) snapshot=12665720832
            Total committed heap usage (bytes)=1157627904
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=590
        File Output Format Counters
            Bytes Written=97
    Job Finished in 26.73 seconds
    Estimated value of Pi is 3.20000000000000000000
    [hadoop@hadoop01 hadoop-2.8.1]$

The corresponding MapReduce job's execution status can be seen on the YARN Web UI page.


At this point, the entire Hadoop pseudo-distributed environment is deployed.

Notes:

  1. In standalone (local) mode, Hadoop runs no daemon processes; processes appear only when a user submits a job.
  2. In pseudo-distributed mode, each Hadoop daemon runs in its own Java process.
  3. While learning, work as the fully privileged root user to avoid unnecessary permission problems.
  4. When a standalone host or a cluster must serve outside clients, replace localhost in the configs with the host's IP address.
  5. When redeploying HDFS, first delete the temporary files generated under /tmp/, then format HDFS, then start it (see the sketch below).
  6. If the Web UI pages cannot be reached remotely, check the firewall (see the sketch below).
  7. Consult the official documentation for the core configuration files.
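
A minimal sketch for notes 5 and 6, assuming the paths and ports used above (the exact temp directory depends on hadoop.tmp.dir; /tmp/hadoop-hadoop is the default for the hadoop user):

    # note 5: redeploy HDFS — stop services, clear old data, re-format, restart
    [hadoop@hadoop01 ~]$ /opt/software/hadoop-2.8.1/sbin/stop-dfs.sh
    [hadoop@hadoop01 ~]$ rm -rf /tmp/hadoop-hadoop    # default hadoop.tmp.dir location
    [hadoop@hadoop01 ~]$ hdfs namenode -format
    [hadoop@hadoop01 ~]$ /opt/software/hadoop-2.8.1/sbin/start-dfs.sh

    # note 6: open the Web UI ports rather than disabling the firewall outright
    [root@hadoop01 ~]# firewall-cmd --permanent --add-port=50070/tcp --add-port=8088/tcp
    [root@hadoop01 ~]# firewall-cmd --reload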
