Setting Up an Apache Hadoop 3.1.1 Cluster on CentOS 7

ゝ一纸荒年。 2022-05-15 01:30

I. Cluster Planning

  1. The cluster consists of 1 NameNode, 1 SecondaryNameNode, and 3 DataNodes, as shown in the tables below.
  2. NameNode / SecondaryNameNode (192.168.0.199):

| Component | Version | Path |
| --- | --- | --- |
| jdk | 1.8.0_181 | /usr/local/java/ |
| hadoop | 3.1.1 | /usr/local/hadoop/ |
  3. DataNode (192.168.0.111, 192.168.0.133, 192.168.0.155):

| Component | Version | Path |
| --- | --- | --- |
| jdk | 1.8.0_181 | /usr/local/java/ |
| hadoop | 3.1.1 | /usr/local/hadoop/ |

II. Package Downloads

  1. JDK: download from [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html][http_www.oracle.com_technetwork_java_javase_downloads_jdk8-downloads-2133151.html]; use version 1.8.0\_181.
  2. Hadoop: download from [http://hadoop.apache.org/releases.html][http_hadoop.apache.org_releases.html]; use version 3.1.1.

III. Server Setup

1. Changing the Hostnames

Run the corresponding command on each of the four hosts:

hostnamectl --static set-hostname cloud-ix
hostnamectl --static set-hostname cloud-vi
hostnamectl --static set-hostname cloud-vii
hostnamectl --static set-hostname cloud-viii

  1. Also edit the /etc/hosts file on all four hosts and add the following four entries (a quick resolution check is sketched after them):

192.168.0.111 cloud-vi
192.168.0.133 cloud-vii
192.168.0.155 cloud-viii
192.168.0.199 cloud-ix
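
A quick check that the new names resolve on each host:

# Each hostname should resolve to the address added above.
for h in cloud-ix cloud-vi cloud-vii cloud-viii; do
    getent hosts $h
done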

2. Firewall Settings

  1. If iptables is not installed, install it on all four hosts with:
  2. **yum install iptables-services**
  3. Run iptables -L -n -v to inspect the iptables configuration, then run the following command to permanently disable iptables on all four hosts:
  4. **chkconfig iptables off**
  5. Also stop both iptables and firewalld on all four hosts and disable them at boot:

systemctl stop iptables
systemctl disable iptables
systemctl stop firewalld
systemctl disable firewalld

  1. Run systemctl status iptables and systemctl status firewalld to confirm that both firewalls have been stopped.

3. Clock Synchronization

  1. Install ntpdate with the following command:
  2. **yum install ntpdate**
  3. Synchronize the system time:
  4. **ntpdate us.pool.ntp.org**
  5. Add a cron job for periodic time synchronization:
  6. **crontab -e**
  7. Enter the following line to synchronize the time at 5:00 AM every day (a quick check is sketched after this list):
  8. **0 5 \* \* \* /usr/sbin/ntpdate cn.pool.ntp.org**
  9. Restart the cron service and enable it at boot:
  10. **service crond restart
  11. systemctl enable crond.service**
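
An optional check, assuming the cron entry above was saved:

# Confirm the cron entry is registered and the system clock looks correct.
crontab -l
date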

4. Passwordless SSH Login

  1. On each of the four hosts, run the following two commands (in /root/.ssh/) to generate a key pair and append the public key to the authentication file:
  2. **ssh-keygen -t rsa
  3. cat id\_rsa.pub >> authorized\_keys**
  4. Merge the four hosts' authentication files: on cloud-ix, cloud-vi, and cloud-vii in turn, run the following commands so that all four hosts' keys accumulate in the file on cloud-viii (each host that receives the file should re-append its own id\_rsa.pub to it before forwarding it on, so that no key is lost):
  5. **cloud-ix: scp authorized\_keys root@cloud-vi:/root/.ssh/authorized\_keys
  6. cloud-vi: scp authorized\_keys root@cloud-vii:/root/.ssh/authorized\_keys
  7. cloud-vii: scp authorized\_keys root@cloud-viii:/root/.ssh/authorized\_keys**
  8. Copy the merged authentication file on cloud-viii back to the other three hosts:

scp authorized_keys root@cloud-ix:/root/.ssh/authorized_keys
scp authorized_keys root@cloud-vi:/root/.ssh/authorized_keys
scp authorized_keys root@cloud-vii:/root/.ssh/authorized_keys

  1. From any of the four hosts, you can now log in to the others with any of the following commands:

ssh root@cloud-ix
ssh root@cloud-vi
ssh root@cloud-vii
ssh root@cloud-viii

  1. After logging in to another host, run exit to return to the original host. A one-shot check across all hosts is sketched below.
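
A quick way to verify passwordless login to every node at once, assuming the key exchange above succeeded:

# BatchMode=yes makes ssh fail instead of prompting if key authentication is broken;
# StrictHostKeyChecking=no skips the first-connection host-key prompt.
for h in cloud-ix cloud-vi cloud-vii cloud-viii; do
    ssh -o BatchMode=yes -o StrictHostKeyChecking=no root@$h hostname
done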

IV. Installing and Configuring the JDK

  1. Download jdk1.8.0\_181 and extract it to /usr/local/java/ with the following commands:
  2. **mkdir -p /usr/local/java
  3. tar -zxvf jdk-8u181-linux-x64.tar.gz -C /usr/local/java/**
  4. Add the environment variables by editing /etc/profile:

export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

  1. After editing, run source /etc/profile to apply the configuration.
  2. Run java -version to check the JDK version and confirm the configuration took effect; a short sanity check is sketched below.
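
A short sanity check, assuming the exports above were added to /etc/profile:

# Reload the profile and confirm the JDK is visible.
source /etc/profile
echo "$JAVA_HOME"    # should print /usr/local/java/jdk1.8.0_181
java -version        # should report version 1.8.0_181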

V. Installing and Configuring Hadoop

  1. Download Hadoop 3.1.1 and extract it to /usr/local/hadoop with the following commands:

mkdir -p /usr/local/hadoop
tar -zxvf hadoop-3.1.1.tar.gz -C /usr/local/hadoop/

  1. Also create the Hadoop data and log directories:
  2. **mkdir -p /data/hadoop/hdfs/name /data/hadoop/hdfs/data /var/log/hadoop/tmp**
  3. Add the environment variables by editing /etc/profile:

export HADOOP_HOME=/usr/local/hadoop/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

  1. After editing, run source /etc/profile to apply the configuration.
  2. Next, edit the Hadoop configuration files:

1. /usr/local/hadoop/hadoop-3.1.1/etc/hadoop/hadoop-env.sh

Set the JDK and Hadoop root directories:

export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export HADOOP_HOME=/usr/local/hadoop/hadoop-3.1.1

2. /usr/local/hadoop/hadoop-3.1.1/etc/hadoop/yarn-env.sh

Set the JDK root directory:

export JAVA_HOME=/usr/local/java/jdk1.8.0_181

3. /usr/local/hadoop/hadoop-3.1.1/etc/hadoop/workers

List the DataNode hosts:

cloud-vi
cloud-vii
cloud-viii

4. /usr/local/hadoop/hadoop-3.1.1/etc/hadoop/core-site.xml

Set the default HDFS filesystem (the NameNode address) and the Hadoop temporary directory:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://cloud-ix:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/log/hadoop/tmp</value>
    </property>
</configuration>

5. /usr/local/hadoop/hadoop-3.1.1/etc/hadoop/hdfs-site.xml

Configure the NameNode and SecondaryNameNode addresses, the storage directories, and the HDFS replication factor:

<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>cloud-ix:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>cloud-ix:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

6. /usr/local/hadoop/hadoop-3.1.1/etc/hadoop/mapred-site.xml

Run MapReduce on YARN, set the MapReduce classpath, and configure the JobHistory server addresses:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            $HADOOP_HOME/etc/hadoop,
            $HADOOP_HOME/share/hadoop/common/*,
            $HADOOP_HOME/share/hadoop/common/lib/*,
            $HADOOP_HOME/share/hadoop/hdfs/*,
            $HADOOP_HOME/share/hadoop/hdfs/lib/*,
            $HADOOP_HOME/share/hadoop/mapreduce/*,
            $HADOOP_HOME/share/hadoop/mapreduce/lib/*,
            $HADOOP_HOME/share/hadoop/yarn/*,
            $HADOOP_HOME/share/hadoop/yarn/lib/*
        </value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>cloud-ix:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>cloud-ix:19888</value>
    </property>
</configuration>

7. /usr/local/hadoop/hadoop-3.1.1/etc/hadoop/yarn-site.xml

Configure the ResourceManager host and ports, the NodeManager shuffle service and local/log directories, and enable log aggregation:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>cloud-ix</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>
    <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
        <description>The https address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>${yarn.resourcemanager.hostname}:8090</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>
    <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/data/hadoop/yarn/local</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/data/hadoop/data/tmp/logs</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://cloud-ix:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>

Next, modify the Hadoop startup scripts:

1. /usr/local/hadoop/hadoop-3.1.1/sbin/start-dfs.sh and /usr/local/hadoop/hadoop-3.1.1/sbin/stop-dfs.sh

  1. Add the following settings to both scripts:

HADOOP_SECURE_DN_USER=hdfs
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

2. /usr/local/hadoop/hadoop-3.1.1/sbin/start-yarn.sh and /usr/local/hadoop/hadoop-3.1.1/sbin/stop-yarn.sh

  1. Add the following settings to both scripts:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

3. /usr/local/hadoop/hadoop-3.1.1/sbin/start-all.sh and /usr/local/hadoop/hadoop-3.1.1/sbin/stop-all.sh

  1. Add the following settings to both scripts:

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
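
The steps above are written as if they are run on cloud-ix. If the JDK, the Hadoop directory, and the data directories were only created there, the three DataNodes need the same layout before the cluster is started. A minimal sketch of copying everything over (assuming identical paths on every host; the /etc/profile additions from sections IV and V must also be made on each DataNode):

for h in cloud-vi cloud-vii cloud-viii; do
    # copy the JDK and Hadoop installs, including the edited configuration files
    scp -r /usr/local/java /usr/local/hadoop root@$h:/usr/local/
    # create the HDFS and temporary directories used in the configuration
    ssh root@$h "mkdir -p /data/hadoop/hdfs/name /data/hadoop/hdfs/data /var/log/hadoop/tmp"
done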

VI. Starting Hadoop

  1. Format HDFS by running the following command on the NameNode host (cloud-ix):

hdfs namenode -format

  1. Start Hadoop:
  2. **start-all.sh**
  3. Run jps on each node to check which daemons (NameNode, SecondaryNameNode, DataNode) are running; a typical layout is sketched below.
  4. Run the stop-all.sh script to stop Hadoop.
  5. At this point, the Hadoop cluster has been set up and started.
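
Roughly what jps reports on each role with this configuration (process IDs omitted; the exact set may vary):

# On cloud-ix (NameNode / SecondaryNameNode / ResourceManager):
#   NameNode
#   SecondaryNameNode
#   ResourceManager
# On cloud-vi, cloud-vii, cloud-viii (DataNodes):
#   DataNode
#   NodeManager
jps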

VII. Verifying Hadoop

1. Web UIs

  1. Open the YARN ResourceManager web UI at [http://192.168.0.199:8088][http_192.168.0.199_8088].


  2. Open the HDFS NameNode web UI at [http://192.168.0.199:50070][http_192.168.0.199_50070].


2. HDFS Operations

  1. Common command-line operations:
  2. List an HDFS directory: **hadoop fs -ls /**
  3. Create an HDFS directory: **hadoop fs -mkdir -p /hdfs**
  4. Upload a file to HDFS: **hadoop fs -put temp.txt /hdfs**
  5. Rename an HDFS directory: **hadoop fs -mv /hdfs /dfs**
  6. Delete an HDFS directory: **hadoop fs -rm -r /dfs**
  7. View an HDFS file: **hadoop fs -cat /hdfs/temp.txt**
  8. HDFS can also be browsed from the web UI at [http://192.168.0.199:50070/explorer.html\#/][http_192.168.0.199_50070_explorer.html]. A short end-to-end walk-through of these commands is sketched below.
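
A minimal end-to-end sketch of the commands above, assuming a local file named temp.txt:

# Create a small local file, upload it to HDFS, and inspect it.
# /hdfs/temp.txt is reused by the wordcount example in the next subsection.
echo "hello hadoop hello hdfs" > temp.txt
hadoop fs -mkdir -p /hdfs
hadoop fs -put temp.txt /hdfs
hadoop fs -ls /hdfs
hadoop fs -cat /hdfs/temp.txt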

3. Running the Built-in WordCount Example

  1. The example jar is located in /usr/local/hadoop/hadoop-3.1.1/share/hadoop/mapreduce. Change to that directory and run the following command:

yarn jar hadoop-mapreduce-examples-3.1.1.jar wordcount /hdfs/temp.txt /hdfs/temp-wordcount-out

When the job finishes, the results are written to /hdfs/temp-wordcount-out/part-r-00000; view that file to see the word counts, for example as shown below.
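
A sketch of checking the job output (the output directory must not exist before the job is submitted):

# List the output directory and print the word counts.
hadoop fs -ls /hdfs/temp-wordcount-out
hadoop fs -cat /hdfs/temp-wordcount-out/part-r-00000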
