Big Data Introduction, Part 2


1. The three HDFS processes should all start on hadoop002

The configuration directory differs by version: older releases kept it in conf/, while current releases use etc/hadoop. Most big data components keep their configuration one directory level down like this.

    [hadoop@hadoop002 hadoop]$ ll
    total 140
    -rw-r--r-- 1 hadoop hadoop   884 Feb 13 22:34 core-site.xml             # core settings shared by the three components: HDFS, MapReduce, YARN
    -rw-r--r-- 1 hadoop hadoop  4294 Feb 13 22:30 hadoop-env.sh             # JDK path and Hadoop home directory
    -rw-r--r-- 1 hadoop hadoop   867 Feb 13 22:34 hdfs-site.xml
    -rw-r--r-- 1 hadoop hadoop 11291 Mar 24  2016 log4j.properties
    -rw-r--r-- 1 hadoop hadoop   758 Mar 24  2016 mapred-site.xml.template
    -rw-r--r-- 1 hadoop hadoop    10 Mar 24  2016 slaves
    -rw-r--r-- 1 hadoop hadoop   690 Mar 24  2016 yarn-site.xml
    [hadoop@hadoop002 hadoop]$

Hadoop has three components: HDFS, MapReduce, and YARN. HDFS needs to be deployed, MapReduce does not (jobs are just jars submitted to YARN), and YARN needs to be deployed.

Whether in production or for learning: do not deploy by IP; deploy uniformly by hostname. Only /etc/hosts needs to be edited (do not delete its first and second lines, or problems will appear). Add a mapping with vi /etc/hosts, for example: 172.31.236.240 hadoop002. Later, if the network segment changes, you only need to change the IP in front; an example file is sketched below.
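A minimal sketch of what /etc/hosts might look like after the edit (the first two lines are assumed to be the stock localhost entries your system ships with; keep them as they are):

    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    172.31.236.240 hadoop002

If the machine later moves to another network segment, only the IP on the last line changes; everything configured against the hostname hadoop002 keeps working.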

The first thing we change is for the NameNode process (a quick verification sketch follows the snippet):

    [hadoop@hadoop002 hadoop]$ vi core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop002:9000</value>
        </property>
    </configuration>
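A quick way to double-check that the setting is picked up once HDFS is installed (a sketch; hdfs getconf is part of the standard HDFS command set):

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ bin/hdfs getconf -confKey fs.defaultFS
    hdfs://hadoop002:9000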

DataNode process:

    [hadoop@hadoop002 hadoop]$ vi slaves    # this file lists the NameNode's workers (the DataNodes)
    hadoop002

Change the content to the machine's hostname; with multiple machines, put one hostname per line.

SecondaryNameNode process (configured in hdfs-site.xml; a full-file sketch follows this snippet):

    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop002:50090</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>hadoop002:50091</value>
    </property>
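For reference, a minimal sketch of hdfs-site.xml with these two properties wrapped in the usual <configuration> root (50090/50091 are the default SecondaryNameNode HTTP/HTTPS ports in this release line):

    [hadoop@hadoop002 hadoop]$ vi hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop002:50090</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.https-address</name>
            <value>hadoop002:50091</value>
        </property>
    </configuration>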

Where do you find these settings? In the official documentation: search (for example on Baidu) for "cdh tar", pick the CDH release that matches your Hadoop version, and the configuration documentation is linked at the bottom of that page.

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ cd
    [hadoop@hadoop002 ~]$ ll
    total 8
    drwxrwxr-x 3 hadoop hadoop 4096 Feb 13 22:21 app
    drwxrwxr-x 8 hadoop hadoop 4096 Feb 13 20:46 d5
    [hadoop@hadoop002 ~]$ ll -a
    total 68
    drwx------  7 hadoop hadoop  4096 Feb 16 20:25 .
    drwxr-xr-x. 5 root   root    4096 Oct 22 10:45 ..
    drwxrwxr-x  3 hadoop hadoop  4096 Feb 13 22:21 app
    -rw-------  1 hadoop hadoop 15167 Feb 13 23:29 .bash_history
    -rw-r--r--  1 hadoop hadoop    18 Mar 23  2017 .bash_logout
    -rw-r--r--  1 hadoop hadoop   293 Sep 19 23:22 .bash_profile
    -rw-r--r--  1 hadoop hadoop   124 Mar 23  2017 .bashrc
    drwxrwxr-x  8 hadoop hadoop  4096 Feb 13 20:46 d5
    drwxrw----  3 hadoop hadoop  4096 Sep 19 17:01 .pki
    drwx------  2 hadoop hadoop  4096 Feb 13 22:39 .ssh
    drwxr-xr-x  2 hadoop hadoop  4096 Oct 14 20:57 .vim
    -rw-------  1 hadoop hadoop  8995 Feb 16 20:25 .viminfo
    [hadoop@hadoop002 ~]$ ll .ssh
    total 16
    -rw------- 1 hadoop hadoop  398 Feb 13 22:37 authorized_keys
    -rw------- 1 hadoop hadoop 1675 Feb 13 22:36 id_rsa
    -rw-r--r-- 1 hadoop hadoop  398 Feb 13 22:36 id_rsa.pub
    -rw-r--r-- 1 hadoop hadoop  780 Feb 13 22:49 known_hosts

Rebuild the SSH trust relationship. The previous setup would also have worked, but to keep things consistent we configure it again from scratch.

    [hadoop@hadoop002 ~]$ rm -rf .ssh
    [hadoop@hadoop002 ~]$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
    Created directory '/home/hadoop/.ssh'.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    The key fingerprint is:
    ca:e4:a1:fc:9f:e2:86:e7:9c:ab:f2:19:7a:70:c5:3d hadoop@hadoop002
    The key's randomart image is:
    +--[ RSA 2048]----+
    |                 |
    |                 |
    |  .   .          |
    |   o E           |
    |   .o S.         |
    |   ...= o        |
    |    o+.+         |
    |   ..o=+. .      |
    |   .++*B+o       |
    +-----------------+
    [hadoop@hadoop002 ~]$ cd .ssh
    [hadoop@hadoop002 .ssh]$ ll
    total 8
    -rw------- 1 hadoop hadoop 1675 Feb 16 20:27 id_rsa
    -rw-r--r-- 1 hadoop hadoop  398 Feb 16 20:27 id_rsa.pub
    [hadoop@hadoop002 .ssh]$ cat id_rsa.pub >> authorized_keys
    [hadoop@hadoop002 .ssh]$ ll
    total 12
    -rw-rw-r-- 1 hadoop hadoop  398 Feb 16 20:27 authorized_keys
    -rw------- 1 hadoop hadoop 1675 Feb 16 20:27 id_rsa
    -rw-r--r-- 1 hadoop hadoop  398 Feb 16 20:27 id_rsa.pub
    [hadoop@hadoop002 .ssh]$ cd
    [hadoop@hadoop002 ~]$ cd app/hadoop-2.6.0-cdh5.7.0
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ sbin/start-dfs.sh
    19/02/16 20:28:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [hadoop002]
    The authenticity of host 'hadoop002 (172.31.236.240)' can't be established.
    RSA key fingerprint is b1:94:33:ec:95:89:bf:06:3b:ef:30:2f:d7:8e:d2:4c.
    Are you sure you want to continue connecting (yes/no)? yes
    hadoop002: Warning: Permanently added 'hadoop002,172.31.236.240' (RSA) to the list of known hosts.
    hadoop@hadoop002's password:
    [2]+  Stopped                 sbin/start-dfs.sh
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ ^C
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ jps
    962 Jps
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ ps -ef|grep start-dfs.sh
    hadoop   790  349  0 20:25 pts/0    00:00:00 bash sbin/start-dfs.sh
    hadoop   887  349  0 20:28 pts/0    00:00:00 bash sbin/start-dfs.sh
    hadoop   977  349  0 20:28 pts/0    00:00:00 grep start-dfs.sh
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ kill -9 790 887
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ cd
    [1]-  Killed                  sbin/start-dfs.sh  (wd: ~/app/hadoop-2.6.0-cdh5.7.0)
    (wd now: ~)
    [2]+  Killed                  sbin/start-dfs.sh  (wd: ~/app/hadoop-2.6.0-cdh5.7.0)
    (wd now: ~)
    [hadoop@hadoop002 ~]$ cd .ssh
    [hadoop@hadoop002 .ssh]$ ll
    total 16
    -rw-rw-r-- 1 hadoop hadoop  398 Feb 16 20:27 authorized_keys
    -rw------- 1 hadoop hadoop 1675 Feb 16 20:27 id_rsa
    -rw-r--r-- 1 hadoop hadoop  398 Feb 16 20:27 id_rsa.pub
    -rw-r--r-- 1 hadoop hadoop  406 Feb 16 20:28 known_hosts
    [hadoop@hadoop002 .ssh]$ chmod 600 authorized_keys
    [hadoop@hadoop002 .ssh]$ cd -
    /home/hadoop
    [hadoop@hadoop002 ~]$ cd app/hadoop-2.6.0-cdh5.7.0

Start HDFS; all three processes now start on hadoop002 (a jps check is sketched after the output):

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ sbin/start-dfs.sh
    19/02/16 20:29:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [hadoop002]
    hadoop002: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hadoop002.out
    hadoop002: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hadoop002.out
    Starting secondary namenodes [hadoop002]
    hadoop002: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-secondarynamenode-hadoop002.out
    19/02/16 20:29:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$
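After startup, a jps check should show all three daemons running as the hadoop user (a sketch; the PIDs below are illustrative):

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ jps
    1086 NameNode
    1210 DataNode
    1378 SecondaryNameNode
    1520 Jps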

2. The truth about the jps command

2.1 Where is it located?

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ which jps
    /usr/java/jdk1.8.0_45/bin/jps

2.2 Where are the corresponding process identification files? In /tmp/hsperfdata_<process user name>:

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ cd /tmp

Run ll there and you will find the hsperfdata_hadoop directory.

    [hadoop@hadoop002 hsperfdata_hadoop]$ pwd
    /tmp/hsperfdata_hadoop
    [hadoop@hadoop002 hsperfdata_hadoop]$ ll
    total 96
    -rw------- 1 hadoop hadoop 32768 Feb 16 20:35 1086
    -rw------- 1 hadoop hadoop 32768 Feb 16 20:35 1210
    -rw------- 1 hadoop hadoop 32768 Feb 16 20:35 1378
    [hadoop@hadoop002 hsperfdata_hadoop]$

2.3 The root user can see the jps results of all users; an ordinary user can only see the jps processes belonging to its own user.

2.4 "process information unavailable"
To tell whether this is real, use ps -ef|grep namenode, which genuinely shows whether the process is alive (see the sketch after the jps output below). If the process shows up, it really is running; it is better to grep for namenode than for a PID such as 1378.

    [root@hadoop002 ~]# jps
    1520 Jps
    1378 -- process information unavailable
    1210 -- process information unavailable
    1086 -- process information unavailable
    [root@hadoop002 ~]#
In production: the HDFS component of Hadoop usually runs as the hdfs user, so use the root user or a user with sudo privileges to get the full picture.

kill: a process may be killed by a person, or Linux may kill it for you automatically if it decides the process is the biggest memory consumer. Also note that deleting the hsperfdata directory only blinds jps; it does not stop the daemons, as the sketch after the next listing shows.

    [root@hadoop002 tmp]# rm -rf hsperfdata_hadoop
    [root@hadoop002 tmp]# jps
    1906 Jps
    [root@hadoop002 tmp]#
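After deleting hsperfdata_hadoop, jps can no longer see the daemons, but they are still running; a hedged way to confirm:

    [root@hadoop002 tmp]# ps -ef | grep -E "NameNode|DataNode" | grep -v grep
    # the HDFS processes are still listed here even though jps now shows nothing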

3. pid files: the files the cluster needs in order to start and stop its processes

    -rw-rw-r-- 1 hadoop hadoop 5 Feb 16 20:56 hadoop-hadoop-datanode.pid
    -rw-rw-r-- 1 hadoop hadoop 5 Feb 16 20:56 hadoop-hadoop-namenode.pid
    -rw-rw-r-- 1 hadoop hadoop 5 Feb 16 20:57 hadoop-hadoop-secondarynamenode.pid

Linux periodically cleans up files and directories under /tmp (roughly a 30-day cycle), so pid files kept there can silently disappear.

Solution:

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ vi etc/hadoop/hadoop-env.sh
    # The directory where pid files are stored. /tmp by default.
    # NOTE: this should be set to a directory that can only be written to by
    #       the user that will run the hadoop daemons.  Otherwise there is the
    #       potential for a symlink attack.
    export HADOOP_PID_DIR=${HADOOP_PID_DIR}

Create a dedicated directory and point HADOOP_PID_DIR at it (a verification sketch follows):

    mkdir /data/tmp
    chmod -R 777 /data/tmp

and in hadoop-env.sh change the line to:

    export HADOOP_PID_DIR=/data/tmp
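A quick confirmation sketch, assuming /data/tmp was created as above: restart HDFS and check that the pid files now land in the new directory instead of /tmp.

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ sbin/stop-dfs.sh
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ sbin/start-dfs.sh
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ ls /data/tmp
    # expect: hadoop-hadoop-datanode.pid  hadoop-hadoop-namenode.pid  hadoop-hadoop-secondarynamenode.pid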

4. MapReduce and YARN

MapReduce: the compute layer; jobs are submitted to YARN as jar packages, so MapReduce itself does not need to be deployed.
YARN: manages resources (CPU and memory) and job scheduling; it does need to be deployed.

MapReduce on YARN. Configure parameters as follows (mapred-site.xml has to be created from its template first; see the cp sketch after this block):

    etc/hadoop/mapred-site.xml:
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

This setting means MapReduce runs on YARN.

    etc/hadoop/yarn-site.xml:
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>
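mapred-site.xml is not shipped directly, only its template (see the ll listing at the top); a minimal sketch of creating it, run inside etc/hadoop:

    [hadoop@hadoop002 hadoop]$ cp mapred-site.xml.template mapred-site.xml
    [hadoop@hadoop002 hadoop]$ vi mapred-site.xml    # then add the mapreduce.framework.name property shown above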

Start the YARN daemons:
ResourceManager daemon: the boss, the resource manager.
NodeManager daemon: the worker, the node manager.

    $ sbin/start-yarn.sh    # start YARN

    Browse the web interface for the ResourceManager; by default it is available at:
    ResourceManager - http://localhost:8088/
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ jps
    4001 NodeManager
    3254 SecondaryNameNode
    3910 ResourceManager
    3563 NameNode
    4317 Jps
    3087 DataNode
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$

http://47.75.249.8:8088/ — open the YARN web UI in a browser (a quick connectivity check from the server itself is sketched below).
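If the page does not open from your own machine, a hedged check from the server itself (assuming curl is installed, and remembering that port 8088 must be open in the cloud security group / firewall):

    [hadoop@hadoop002 ~]$ curl -s -o /dev/null -w "%{http_code}\n" http://hadoop002:8088/
    200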

hadoop-hadoop-datanode-hadoop002.log is a log file; the naming convention is hadoop-<user>-<process name>-<hostname>.log, and the HDFS daemon logs all follow it.

Troubleshooting errors:
For small log files, open them in vi and search with :/pattern.
tail -200f xxx.log tails the last 200 lines and keeps following the file (only a lowercase f works for this).
Restart the process in another window in order to reproduce the error.

You can also pull the log down with sz, then open it in EditPlus on Windows to locate the problem and keep a backup.
Your error is always the most recent entry, so just scroll to the bottom of the log; a tail sketch follows.
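A sketch of tailing the NameNode log, using the naming pattern above (adjust user and hostname to your own environment):

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ tail -200f logs/hadoop-hadoop-namenode-hadoop002.log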

5. Running MR (MapReduce)
map: mapping
reduce: reduction
While a job runs, reduce does not wait for every map to finish: maps start first, and the reduce phase can already be running while maps are still in progress.

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ hadoop jar ./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 5 10

(in the pi example, 5 is the number of map tasks and 10 is the number of samples per map)

Word count example: use MapReduce to count how many times each word appears.

    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ vi a.log    # create a file named a.log
    ruoze
    jepson
    www.ruozedata.com
    dashu
    adai
    fanren
    1
    a
    b
    c
    a b c ruoze jepon
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ vi b.txt
    a b d e f ruoze
    1 1 3 5
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -mkdir /wordcount
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -mkdir /wordcount/input    # input directory
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -put a.log /wordcount/input    # upload the file
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -put b.txt /wordcount/input    # upload the file
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -ls /wordcount/input/
    Found 2 items
    -rw-r--r-- 1 hadoop supergroup 76 2019-02-16 21:59 /wordcount/input/a.log
    -rw-r--r-- 1 hadoop supergroup 24 2019-02-16 21:59 /wordcount/input/b.txt
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ hadoop jar \
    ./share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar \
    wordcount /wordcount/input /wordcount/output1    # the 1 in output1 marks the first run
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -cat /wordcount/output1/part-r-00000    # -cat views the result
    19/02/16 22:05:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    1       3
    3       1
    5       1
    a       3
    adai    1
    b       3
    c       2
    d       1
    dashu   1
    e       1
    f       1
    fanren  1
    jepon   1
    jepson  1
    ruoze   3
    www.ruozedata.com       1
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ hdfs dfs -get /wordcount/output1/part-r-00000 ./    # -get downloads to local
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ cat part-r-00000
    1       3
    3       1
    5       1
    a       3
    adai    1
    b       3
    c       2
    d       1
    dashu   1
    e       1
    f       1
    fanren  1
    jepon   1
    jepson  1
    ruoze   3
    www.ruozedata.com       1
    [hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$
