Ceph Cluster Production Deployment


Preface

Ceph's components and workflows are complex, and the whole is a large system. Before trying Ceph, read as much of the official documentation as you can and understand how its components and units (mon/osd/mds/pg/pool, etc.) cooperate.
Ceph official documentation

I. Configuration Planning:

(Configuration and disk planning table: the original image is not available.)

II. Deployment

1. Enable the NTP service on the ntp-server:

    apt-get install ntp ntpdate ntp-doc
    systemctl enable ntp
    systemctl start ntp
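
To let the Ceph nodes sync from this server, its ntp.conf usually needs a restrict line for their subnet. A minimal sketch, assuming the default upstream servers in /etc/ntp.conf are kept and the nodes sit on 192.168.20.0/24:

    # Allow the Ceph nodes' subnet (assumed here) to query this NTP server
    echo "restrict 192.168.20.0 mask 255.255.255.0 nomodify notrap" >> /etc/ntp.conf
    systemctl restart ntp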

2. Ceph nodes

Perform the following operations on all three nodes.
Disk partitioning follows the planning table at the top. Run the partitioning script below, written according to that plan, on each of the three nodes:

    # cat ~/parted.sh
    #!/bin/bash
    set -e
    if [ ! -x "/usr/sbin/parted" ]; then
        echo "This script requires /usr/sbin/parted to run!" >&2
        exit 1
    fi
    # Data disks: one GPT partition each, formatted as XFS
    DISKS="d e f g h i j k l m n o p"
    for i in ${DISKS}; do
        echo "Creating partitions on /dev/sd${i} ..."
        parted -a optimal --script /dev/sd${i} -- mktable gpt
        parted -a optimal --script /dev/sd${i} -- mkpart primary xfs 0% 100%
        sleep 1
        #echo "Formatting /dev/sd${i}1 ..."
        mkfs.xfs -f /dev/sd${i}1 &
    done
    # SSDs: seven journal partitions each
    SSDS="b c"
    for i in ${SSDS}; do
        parted -s /dev/sd${i} mklabel gpt
        parted -s /dev/sd${i} mkpart primary 0% 10%
        parted -s /dev/sd${i} mkpart primary 11% 20%
        parted -s /dev/sd${i} mkpart primary 21% 30%
        parted -s /dev/sd${i} mkpart primary 31% 40%
        parted -s /dev/sd${i} mkpart primary 41% 50%
        parted -s /dev/sd${i} mkpart primary 51% 60%
        parted -s /dev/sd${i} mkpart primary 61% 70%
    done
    chown -R ceph:ceph /dev/sdb[1-7]
    chown -R ceph:ceph /dev/sdc[1-7]
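
An optional verification sketch before moving on, to confirm each data disk got a single XFS partition and each SSD got its seven journal partitions:

    # Print each data disk's partition table, then list the SSD partitions
    for i in d e f g h i j k l m n o p; do
        parted -s /dev/sd${i} print
    done
    lsblk -o NAME,SIZE,FSTYPE /dev/sdb /dev/sdc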

Add the following entries to /etc/hosts and scp the file to all three nodes (a sketch of the copy step follows the list):

    192.168.20.112 h020112
    192.168.20.113 h020113
    192.168.20.114 h020114
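
A minimal sketch of the distribution step, assuming the file is edited on h020112 and root SSH access to the other two nodes:

    # Copy the updated hosts file to the other nodes
    for h in h020113 h020114; do
        scp /etc/hosts root@${h}:/etc/hosts
    done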

Disable the firewall and SELinux, add a cron entry for periodic time synchronization, switch to the Aliyun yum repo, and install the packages:

    sed -i 's/SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
    setenforce 0
    systemctl stop firewalld
    systemctl disable firewalld
    # Periodically sync time from the NTP server and write it to the hardware clock
    cat >>/etc/crontab<<EOF
    * */1 * * * root ntpdate 192.168.9.28 && hwclock --systohc
    EOF
    cd /etc/yum.repos.d
    mv CentOS-Base.repo CentOS-Base.repo.bak
    wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
    yum -y install ntp ntpdate
    yum -y install ceph
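
Optional checks (a sketch) to confirm the node state before continuing:

    getenforce                      # should report Permissive or Disabled
    systemctl is-active firewalld   # should report inactive
    ntpdate -q 192.168.9.28         # query only: confirm the NTP server answers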

Run the following on node1 (192.168.20.112):

    wget https://download.ceph.com/rpm-kraken/el7/noarch/ceph-deploy-1.5.37-0.noarch.rpm
    yum -y install ceph-deploy-1.5.37-0.noarch.rpm
    # Generate an SSH key and push it to every node for passwordless login
    ssh-keygen
    ssh-copy-id h020112
    ssh-copy-id h020113
    ssh-copy-id h020114
    # Create the cluster and generate the config file
    mkdir ceph-cluster && cd ceph-cluster
    ceph-deploy new h020112 h020113 h020114
    ## Edit the generated ceph.conf and add the following settings:
    # 80 GB journal partitions
    osd_journal_size = 81920
    public_network = 192.168.20.0/24
    # 2 replicas (default is 3); the minimum working size defaults to size - (size / 2)
    osd pool default size = 2
    # The official recommendation is an average of at least ~30 PGs per OSD,
    # i.e. pg_num > osd_num * 30 / replica_count (a quick check of this formula
    # follows this block)
    osd pool default pg num = 1024
    osd pool default pgp num = 1024
    # Push ceph.conf to all nodes
    ceph-deploy --overwrite-conf config push h020112 h020113 h020114
    # If you hit "RuntimeError: bootstrap-mds keyring not found; run 'gatherkeys'",
    # distribute the keys with:
    ceph-deploy gatherkeys h020112 h020113 h020114
    # Initialize the mon nodes
    ceph-deploy mon create-initial
    ceph -s   # check that the mons were added successfully
    # Add OSDs to the cluster
    cd /etc/ceph   # this must be run from this directory
    # Run the following script; double-check the journal-to-data-disk mapping
    [root@h20112 ceph]# cat add_osd.sh
    #!/bin/bash
    # cephosd.txt lists the OSD node hostnames, one per line
    for ip in $(cat ~/ceph-cluster/cephosd.txt)
    do
        echo ----$ip-----------;
        ceph-deploy --overwrite-conf osd prepare $ip:sdd1:/dev/sdb1 $ip:sde1:/dev/sdb2 $ip:sdf1:/dev/sdb3 $ip:sdg1:/dev/sdb4 $ip:sdh1:/dev/sdb5 $ip:sdi1:/dev/sdb6 \
        $ip:sdj1:/dev/sdc1 $ip:sdk1:/dev/sdc2 $ip:sdl1:/dev/sdc3 $ip:sdm1:/dev/sdc4 $ip:sdn1:/dev/sdc5 $ip:sdo1:/dev/sdc6 $ip:sdp1:/dev/sdc7
    done
    for ip in $(cat ~/ceph-cluster/cephosd.txt)
    do
        echo ----$ip-----------;
        ceph-deploy osd activate $ip:sdd1:/dev/sdb1 $ip:sde1:/dev/sdb2 $ip:sdf1:/dev/sdb3 $ip:sdg1:/dev/sdb4 $ip:sdh1:/dev/sdb5 $ip:sdi1:/dev/sdb6 \
        $ip:sdj1:/dev/sdc1 $ip:sdk1:/dev/sdc2 $ip:sdl1:/dev/sdc3 $ip:sdm1:/dev/sdc4 $ip:sdn1:/dev/sdc5 $ip:sdo1:/dev/sdc6 $ip:sdp1:/dev/sdc7
    done
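
A quick check of the PG sizing rule quoted in the config comments above; the inputs (39 OSDs, 2 replicas) match this cluster, and the result is rounded up to the next power of two:

    # 39 OSDs * 30 PGs-per-OSD / 2 replicas = 585, rounded up to 1024
    osd_num=39
    replicas=2
    target=$(( osd_num * 30 / replicas ))
    pg_num=1
    while [ ${pg_num} -lt ${target} ]; do pg_num=$(( pg_num * 2 )); done
    echo "recommended pg_num: ${pg_num}"   # prints 1024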

Check the result:

    [root@h20112 ceph]# ceph osd tree
    ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -1 34.04288 root default
    -2 11.34763 host h020112
    0 0.87289 osd.0 up 1.00000 1.00000
    3 0.87289 osd.3 up 1.00000 1.00000
    4 0.87289 osd.4 up 1.00000 1.00000
    5 0.87289 osd.5 up 1.00000 1.00000
    6 0.87289 osd.6 up 1.00000 1.00000
    7 0.87289 osd.7 up 1.00000 1.00000
    8 0.87289 osd.8 up 1.00000 1.00000
    9 0.87289 osd.9 up 1.00000 1.00000
    10 0.87289 osd.10 up 1.00000 1.00000
    11 0.87289 osd.11 up 1.00000 1.00000
    12 0.87289 osd.12 up 1.00000 1.00000
    13 0.87289 osd.13 up 1.00000 1.00000
    14 0.87289 osd.14 up 1.00000 1.00000
    -3 11.34763 host h020113
    1 0.87289 osd.1 up 1.00000 1.00000
    15 0.87289 osd.15 up 1.00000 1.00000
    16 0.87289 osd.16 up 1.00000 1.00000
    17 0.87289 osd.17 up 1.00000 1.00000
    18 0.87289 osd.18 up 1.00000 1.00000
    19 0.87289 osd.19 up 1.00000 1.00000
    20 0.87289 osd.20 up 1.00000 1.00000
    21 0.87289 osd.21 up 1.00000 1.00000
    22 0.87289 osd.22 up 1.00000 1.00000
    23 0.87289 osd.23 up 1.00000 1.00000
    24 0.87289 osd.24 up 1.00000 1.00000
    25 0.87289 osd.25 up 1.00000 1.00000
    26 0.87289 osd.26 up 1.00000 1.00000
    -4 11.34763 host h020114
    2 0.87289 osd.2 up 1.00000 1.00000
    27 0.87289 osd.27 up 1.00000 1.00000
    28 0.87289 osd.28 up 1.00000 1.00000
    29 0.87289 osd.29 up 1.00000 1.00000
    30 0.87289 osd.30 up 1.00000 1.00000
    31 0.87289 osd.31 up 1.00000 1.00000
    32 0.87289 osd.32 up 1.00000 1.00000
    33 0.87289 osd.33 up 1.00000 1.00000
    34 0.87289 osd.34 up 1.00000 1.00000
    35 0.87289 osd.35 up 1.00000 1.00000
    36 0.87289 osd.36 up 1.00000 1.00000
    37 0.87289 osd.37 up 1.00000 1.00000
    38 0.87289 osd.38 up 1.00000 1.00000
    [root@h20112 ceph]# ceph -s
    cluster 6661d89d-5895-4bcb-9b11-9400638afc85
    health HEALTH_OK
    monmap e1: 3 mons at {h020112=192.168.20.112:6789/0,h020113=192.168.20.113:6789/0,h020114=192.168.20.114:6789/0}
    election epoch 6, quorum 0,1,2 h020112,h020113,h020114
    osdmap e199: 39 osds: 39 up, 39 in
    flags sortbitwise,require_jewel_osds
    pgmap v497: 1024 pgs, 1 pools, 0 bytes data, 0 objects
    4385 MB used, 34854 GB / 34858 GB avail
    1024 active+clean
    # If the pg_num and pgp_num set in the config file did not take effect, set them manually:
    sudo ceph osd pool set rbd pg_num 1024
    sudo ceph osd pool set rbd pgp_num 1024
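
A short sketch to confirm the pool's PG settings after the change:

    ceph osd pool get rbd pg_num    # expect: pg_num: 1024
    ceph osd pool get rbd pgp_num   # expect: pgp_num: 1024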

Custom CRUSH placement rules:

Ceph provides the following hierarchy of bucket types, listed below from the lowest level (osd) to the highest (root). They let you group OSDs by host, chassis, rack, PDU, room, region, and so on to match your physical or logical layout; the total weight of each bucket equals the sum of the weights of all OSDs beneath it (the sketch after the list shows how to inspect the current hierarchy).

    type 0 osd
    type 1 host
    type 2 chassis
    type 3 rack
    type 4 row
    type 5 pdu
    type 6 pod
    type 7 room
    type 8 datacenter
    type 9 region
    type 10 root
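
A sketch for inspecting the hierarchy the cluster is currently using; the weight shown for each host, rack, or root bucket should equal the sum of the OSD weights under it:

    ceph osd tree         # hierarchy with per-bucket weights
    ceph osd crush dump   # raw JSON dump of buckets, items, and rules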

Customizing the CRUSH view:
Only the procedure is described here; it is not applied in this deployment, because the disk counts, disk models, and host placement are identical across nodes, so no CRUSH adjustment is needed for now.

    ceph osd crush add-bucket hnc root       # add a root-level bucket named hnc
    ceph osd crush add-bucket rack0 rack     # add a rack-level bucket named rack0
    ceph osd crush add-bucket rack1 rack     # add a rack-level bucket named rack1
    ceph osd crush add-bucket rack2 rack     # add a rack-level bucket named rack2
    ceph osd crush move rack0 root=hnc       # move rack0 under hnc
    ceph osd crush move rack1 root=hnc       # move rack1 under hnc
    ceph osd crush move rack2 root=hnc       # move rack2 under hnc
    ceph osd crush move h020112 rack=rack0   # move host h020112 under rack0
    ceph osd crush move h020113 rack=rack1   # move host h020113 under rack1
    ceph osd crush move h020114 rack=rack2   # move host h020114 under rack2
    ceph osd getcrushmap -o map_old          # export the current CRUSH map
    crushtool -d map_old -o map_old.txt      # decompile it into an editable text file
    crushtool -c map_new.txt -o map_new      # compile the edited text back into a binary map
    ceph osd setcrushmap -i map_new          # import the new map into the cluster

Edit the config file to keep Ceph from automatically updating the CRUSH map when OSDs start (a sketch for applying this on every node follows the commands):

    echo "osd_crush_update_on_start = false" >> /etc/ceph/ceph.conf
    /etc/init.d/ceph restart
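
A minimal sketch for applying the same change on every node; it assumes root SSH access and systemd-managed daemons (on such hosts `systemctl restart ceph.target` replaces the SysV `/etc/init.d/ceph restart` shown above):

    for h in h020112 h020113 h020114; do
        ssh root@${h} 'echo "osd_crush_update_on_start = false" >> /etc/ceph/ceph.conf'
        ssh root@${h} 'systemctl restart ceph.target'
    done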

Use the following sample CRUSH map (map_new.txt):

    # begin crush map
    tunable choose_local_tries 0
    tunable choose_local_fallback_tries 0
    tunable choose_total_tries 50
    tunable chooseleaf_descend_once 1
    tunable straw_calc_version 1

    # devices
    device 0 osd.0
    device 1 osd.1
    device 2 osd.2
    device 3 osd.3
    device 4 osd.4
    device 5 osd.5
    device 6 osd.6
    device 7 osd.7
    device 8 osd.8
    device 9 osd.9
    device 10 osd.10
    device 11 osd.11

    # types
    type 0 osd
    type 1 host
    type 2 chassis
    type 3 rack
    type 4 row
    type 5 pdu
    type 6 pod
    type 7 room
    type 8 datacenter
    type 9 region
    type 10 root

    # buckets
    root default {
        id -1       # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0      # rjenkins1
    }
    # Weights are usually derived from disk capacity (base) and performance (multiplier),
    # e.g. 1 TB = 1.00, 2 TB = 2.00, with a multiplier of 1 for HDD and 2 for SSD.
    host h020112 {
        id -2       # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0      # rjenkins1
        item osd.0 weight 2.000
        item osd.1 weight 2.000
        item osd.2 weight 2.000
        item osd.3 weight 2.000
        # every OSD on this host must be listed; the rest are omitted here for brevity
    }
    host h020113 {
        id -3       # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0      # rjenkins1
        item osd.4 weight 2.000
        item osd.5 weight 2.000
        item osd.6 weight 2.000
        item osd.7 weight 2.000
    }
    host h020114 {
        id -4       # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0      # rjenkins1
        item osd.8 weight 2.000
        item osd.9 weight 2.000
        item osd.10 weight 2.000
        item osd.11 weight 2.000
    }
    rack rack0 {
        id -6       # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0      # rjenkins1
        item h020112 weight 8.000
    }
    rack rack1 {
        id -7       # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0      # rjenkins1
        item h020113 weight 8.000
    }
    rack rack2 {
        id -8       # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0      # rjenkins1
        item h020114 weight 8.000
    }
    root hnc {
        id -5       # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0      # rjenkins1
        item rack0 weight 8.000
        item rack1 weight 8.000
        item rack2 weight 8.000
    }

    # rules
    rule replicated_ruleset {                  # rule name; can be referenced when creating a pool
        ruleset 0                              # ruleset number, assigned sequentially
        type replicated                        # pool type: replicated (erasure is the other mode)
        min_size 1                             # a pool using this rule may request no fewer replicas than this
        max_size 10                            # ... and no more than this
        step take hnc                          # entry point in the hierarchy for placement
        step chooseleaf firstn 0 type rack     # choose leaf OSDs, one per rack (rack is the failure domain)
        step emit
    }
    # end crush map
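
Before injecting the new map, it can be sanity-checked offline with crushtool's test mode. A sketch; rule 0 and two replicas match this cluster's settings, and the compile step is the same one shown in the next block:

    crushtool -c map_new.txt -o map_new
    # Simulate placements: every input should map to 2 OSDs on distinct racks
    crushtool -i map_new --test --rule 0 --num-rep 2 --show-statistics
    crushtool -i map_new --test --rule 0 --num-rep 2 --show-mappings | head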

Compile the modified CRUSH map and inject it into the cluster:

    crushtool -c map_new.txt -o map_new
    ceph osd setcrushmap -i map_new
    ceph osd tree
    ceph osd crush rm default                      # remove the default root bucket from the CRUSH map
    systemctl stop ceph\*.service ceph\*.target    # stop all Ceph services
    systemctl start ceph.target                    # start the services again
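
As noted in the rule comments, the new ruleset can be referenced when creating a pool. A sketch, using a hypothetical pool name (testpool) and the jewel-era create syntax:

    # ceph osd pool create <name> <pg_num> <pgp_num> replicated <ruleset-name>
    ceph osd pool create testpool 128 128 replicated replicated_ruleset
    ceph osd dump | grep testpool    # confirm the pool uses the expected crush_ruleset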
