Hadoop: HA Setup
Preparation
1. Stop the entire cluster
Do the following on all three nodes (right-click -> Send Input To -> All Sessions) =====================
2. Create the ha directory under /opt
sudo mkdir /opt/ha
3. Set the owner and group
sudo chown atguigu:atguigu /opt/ha
==================== Switch back: right-click -> Send Input To -> Current Session
= On hadoop102
4. Copy the existing Hadoop installation to /opt/ha (on hadoop102 only; it will be distributed later)
cp -r /opt/module/hadoop-3.1.3 /opt/ha
5. Delete the data and logs directories inside the copied Hadoop, plus everything under /tmp
(on hadoop102 only, since hadoop103 and hadoop104 have not received the distribution yet)
cd /opt/ha/hadoop-3.1.3 && rm -rf data logs
sudo rm -rf /tmp/*
6. Configure environment variables (change HADOOP_HOME from /opt/module/hadoop-3.1.3 to /opt/ha/hadoop-3.1.3; a reference sketch of the full file follows)
sudo vim /etc/profile.d/my_env.sh
Modify:
# Define the HADOOP_HOME variable
export HADOOP_HOME=/opt/ha/hadoop-3.1.3
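For reference, a minimal my_env.sh after the change might look like the lines below; only the HADOOP_HOME line comes from these notes, while the JAVA_HOME path and the PATH exports are assumptions based on a typical setup:
# Define the JAVA_HOME variable (assumed path)
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
# Define the HADOOP_HOME variable, now pointing at the HA copy
export HADOOP_HOME=/opt/ha/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin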
7. Distribute the environment variable file
sudo scp -r /etc/profile.d/my_env.sh hadoop103:/etc/profile.d/my_env.sh
sudo scp -r /etc/profile.d/my_env.sh hadoop104:/etc/profile.d/my_env.sh
8. Make the environment variables take effect on hadoop102, hadoop103, and hadoop104
Disconnect all sessions and reconnect.
Be sure to verify on every node: echo $HADOOP_HOME (an alternative without reconnecting is sketched below)
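If you prefer not to reconnect each session, sourcing the profile script achieves the same thing (an alternative to reconnecting, not from the original notes):
source /etc/profile.d/my_env.sh
echo $HADOOP_HOME    # should print /opt/ha/hadoop-3.1.3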
9. Distribute the Hadoop directory
xsync /opt/ha/hadoop-3.1.3
NameNode HA Configuration
1. Complete the preparation steps above
2. Configure core-site.xml
<!-- Default file system: point clients at the logical nameservice -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- Directory where Hadoop stores its runtime data -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/ha/hadoop-3.1.3/data</value>
</property>
3. Configure hdfs-site.xml
<!-- NameNode data storage directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file://${hadoop.tmp.dir}/name</value>
</property>
<!-- DataNode data storage directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file://${hadoop.tmp.dir}/data</value>
</property>
<!-- JournalNode edits storage directory -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>${hadoop.tmp.dir}/jn</value>
</property>
<!-- Name of the fully distributed cluster (nameservice ID) -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- NameNodes that make up the cluster -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2,nn3</value>
</property>
<!-- RPC addresses of the NameNodes -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop102:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop103:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn3</name>
<value>hadoop104:8020</value>
</property>
<!-- HTTP addresses of the NameNodes -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop102:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop103:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn3</name>
<value>hadoop104:9870</value>
</property>
<!-- Location of the shared NameNode edits on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
</property>
<!-- Failover proxy provider: how clients determine which NameNode is Active -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method, so only one NameNode serves requests at a time -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- SSH private key needed by the sshfence fencing method -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/atguigu/.ssh/id_rsa</value>
</property>
Distribute the configuration =======
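Assuming the same xsync script used in the preparation steps, distributing just the changed configuration directory to hadoop103 and hadoop104 is enough:
xsync /opt/ha/hadoop-3.1.3/etc/hadoop/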
3. Start the JournalNodes on hadoop102, hadoop103, and hadoop104
hdfs --daemon start journalnode
4. On [hadoop102], format the NameNode and start it
hdfs namenode -format
hdfs --daemon start namenode
5. On hadoop103 and hadoop104, sync the metadata and start the NameNodes
hdfs namenode -bootstrapStandby
hdfs --daemon start namenode
6. Start the DataNodes on all three nodes
hdfs --daemon start datanode
Manual failover =============================
7. Transition [nn1, on hadoop102] to Active
hdfs haadmin -transitionToActive nn1
8. Check whether it is Active (a loop to check all three at once is sketched below)
hdfs haadmin -getServiceState nn1
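To see the state of all three NameNodes at once, a small loop works (nn1/nn2/nn3 are the IDs defined in hdfs-site.xml above):
for nn in nn1 nn2 nn3; do echo -n "$nn: "; hdfs haadmin -getServiceState $nn; done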
Automatic failover ===============================
1. Configuration files. Note: if you only want manual failover, do NOT add the settings below
(1) Add to hdfs-site.xml
<!-- Enable automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
(2) Add to core-site.xml
<!-- ZooKeeper quorum used by the failover controller -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>
Distribute the configuration =======
(1) Stop all HDFS services:
stop-dfs.sh
(2) Start the Zookeeper cluster:
zkCluster.sh start
(3) Initialize the HA state in Zookeeper:
hdfs zkfc -formatZK
(4) Start HDFS:
start-dfs.sh
(5) Verify
Stop the active NameNode and check whether one of the other NameNodes becomes active (a step-by-step sketch follows below)
hdfs --daemon stop namenode
For reference, jps on a node with all services running shows processes like these (PIDs will vary):
13074 JournalNode
12275 QuorumPeerMain
13300 DFSZKFailoverController
12700 NameNode
12829 DataNode
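A minimal end-to-end check of automatic failover, assuming nn1 on hadoop102 is currently active (the node IDs come from hdfs-site.xml above):
hdfs haadmin -getServiceState nn1                    # expect: active
ssh hadoop102 "hdfs --daemon stop namenode"          # kill the active NameNode
hdfs haadmin -getServiceState nn2                    # one of nn2/nn3 should now report active
hdfs haadmin -getServiceState nn3
ssh hadoop102 "hdfs --daemon start namenode"         # restart nn1; it should come back as standby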
ResourceManager HA Configuration
1. Prerequisite: NameNode HA is already configured
2. Configure yarn-site.xml
<!-- Shuffle service required by MapReduce -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Logical cluster ID for the ResourceManagers -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster-yarn1</value>
</property>
<!-- Logical list of ResourceManager IDs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2,rm3</value>
</property>
<!-- Hostname of rm1 -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop102</value>
</property>
<!-- Web UI address of rm1 -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop102:8088</value>
</property>
<!-- Internal (client RPC) address of rm1 -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>hadoop102:8032</value>
</property>
<!-- Address AMs use to request resources from rm1 -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>hadoop102:8030</value>
</property>
<!-- Address NodeManagers use to connect to rm1 -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>hadoop102:8031</value>
</property>
<!-- Hostname of rm2 -->
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop103</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop103:8088</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>hadoop103:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>hadoop103:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>hadoop103:8031</value>
</property>
<!-- Hostname of rm3 -->
<property>
<name>yarn.resourcemanager.hostname.rm3</name>
<value>hadoop104</value>
</property>
<!-- Web UI address of rm3 -->
<property>
<name>yarn.resourcemanager.webapp.address.rm3</name>
<value>hadoop104:8088</value>
</property>
<!-- Internal (client RPC) address of rm3 -->
<property>
<name>yarn.resourcemanager.address.rm3</name>
<value>hadoop104:8032</value>
</property>
<!-- Address AMs use to request resources from rm3 -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm3</name>
<value>hadoop104:8030</value>
</property>
<!-- Address NodeManagers use to connect to rm3 -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm3</name>
<value>hadoop104:8031</value>
</property>
<!-- Zookeeper cluster address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>
<!-- Enable automatic recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Store ResourceManager state in the Zookeeper cluster -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- Environment variable inheritance -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
3. Distribute the files
4. On hadoop102, run:
start-yarn.sh
5. Check the service state
yarn rmadmin -getServiceState rm1
6. Verify failover: stop the active ResourceManager and confirm another one takes over
yarn --daemon stop resourcemanager
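A minimal verification sketch, assuming rm1 on hadoop102 was the active ResourceManager (the ssh wrapper and the loop are conveniences, not from the original notes):
ssh hadoop102 "yarn --daemon stop resourcemanager"
for rm in rm1 rm2 rm3; do echo -n "$rm: "; yarn rmadmin -getServiceState $rm; done   # the stopped RM reports a connection error
ssh hadoop102 "yarn --daemon start resourcemanager"   # bring rm1 back; it rejoins as standby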