Moving a Ceph cluster to Bullseye – the upgrade story
I wanted to try out the newly released Debian Bullseye on my Ceph cluster, so I updated one of my hosts, which, in hindsight, might have been a bit too early.
Resetting a Ceph host after an OS crash or failed update
Preparing the cluster
First you need to set the noout and norebalance flags on the cluster to ensure that no data is moved around during the process. This is not strictly required, but you could lose a lot of time moving data back and forth, so it's good practice.
ceph osd set noout
ceph osd set norebalance
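Before taking the host down, it doesn't hurt to confirm the flags actually took; a quick check from any node with the admin keyring:

```shell
# The flags line of the OSD map should now include noout and norebalance
ceph osd dump | grep flags
```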
Reinstall
Reinstalling was quite easy, but required me to move the computer, open it up, detach the OSD drive, plug in a USB stick with the operating system, and reinstall a fresh system.
Update and upgrade
After a fresh install I usually upgrade all packages to the latest version; if I have installed an older release, I upgrade to the appropriate one so I'm in a good state before adding other software.
apt update
apt upgrade
Adding Ceph software
Adding the Ceph software is as easy as adding the Ceph release key and the Pacific package repository for Buster. Then update and install the ceph and ceph-common packages.
wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
echo deb https://download.ceph.com/debian-pacific/ buster main | sudo tee /etc/apt/sources.list.d/ceph.list
apt update
apt install ceph ceph-common
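With the packages in place, it's worth a quick sanity check that the installed release matches the rest of the cluster; extracting just the version number makes it easy to compare across hosts:

```shell
# Print only the version number, i.e. the third field of
# "ceph version X.Y.Z (...) pacific (stable)"
ceph --version | awk '{print $3}'
```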
Adding Smartmontools
We also want the latest smartmontools from Bullseye, so we pull it in via buster-backports. Note the -a flag on tee, which appends the new line to sources.list instead of overwriting the file.
echo deb http://deb.debian.org/debian buster-backports main | sudo tee -a /etc/apt/sources.list
apt update
apt install smartmontools/buster-backports
Reboot before configuration
This is a good point to reboot the machine to make sure the new kernel and all packages and libraries are loaded correctly.
shutdown -r now
Configuring MON
Next up I configured the cluster membership by opening ceph.conf
and ceph.client.admin.keyring
and adding the same information I have on the other cluster members. There are a couple of values in the configuration file that are specific to each host, so I updated those, but that was a minor effort.
cd /etc/ceph/
vi ceph.conf
vi ceph.client.admin.keyring
Next up we set up the monitor, fetching the keys and the monitor map from the cluster and then creating the local resources needed to run it.
mkdir /var/lib/ceph/mon/ceph-node5
ceph auth get mon. -o /tmp/monkey
ceph mon getmap -o /tmp/monmap
ceph-mon -i node5 --mkfs --monmap /tmp/monmap --keyring /tmp/monkey
chown ceph:ceph -R /var/lib/ceph/mon/
Starting the service is also pretty straightforward; we just need to remember to enable it so it will start on the next reboot.
systemctl status ceph-mon@node5.service
systemctl start ceph-mon@node5.service
systemctl status ceph-mon@node5.service
systemctl enable ceph-mon@node5.service
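Once the monitor is running it should join the quorum within a few seconds; grepping the monitor status for the host name (node5 being this host) is a quick way to confirm:

```shell
# node5 should appear among the quorum members
ceph mon stat | grep node5
```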
Configuring MGR
We also want to install the manager daemon, which provides the dashboard and monitoring modules; it keeps no local state beyond its keyring, so adding the key to the configuration directory is enough.
mkdir /var/lib/ceph/mgr/ceph-node5
ceph auth get-or-create mgr.node5 mon 'allow profile mgr' osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-node5/keyring
chown ceph:ceph -R /var/lib/ceph/mgr/
Again we need to start the service and remember to enable it so it starts when the computer is rebooted.
systemctl status ceph-mgr@node5.service
systemctl start ceph-mgr@node5.service
systemctl status ceph-mgr@node5.service
systemctl enable ceph-mgr@node5.service
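The cluster status should now list the new manager, either as active or as a standby:

```shell
# Look for node5 in the mgr line of the status output
ceph -s | grep mgr
```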
Configuring MDS
Last but not least we have the MDS service, which works entirely in memory and stores its state in the cluster, so it too only requires a key to get up and running, and then we can start it as usual.
mkdir /var/lib/ceph/mds/ceph-node5
ceph auth get-or-create mds.node5 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-node5/keyring
chown ceph:ceph -R /var/lib/ceph/mds
Again we need to start the service and remember to enable it so it starts when the computer is rebooted.
systemctl status ceph-mds@node5.service
systemctl start ceph-mds@node5.service
systemctl status ceph-mds@node5.service
systemctl enable ceph-mds@node5.service
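As with the other daemons, a quick status check confirms the MDS registered with the cluster, typically as a standby if another MDS is already active for the filesystem:

```shell
# Shows the active MDS per rank and the number of standbys
ceph mds stat
```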
Configuring OSD
The last part, installing an OSD, is a little more involved if you do it from scratch, but getting an old host back up and running was fairly easy. Creating the config directory and adding the keyring is standard practice at this point.
mkdir /var/lib/ceph/osd/ceph-3
ceph auth get osd.3 > /var/lib/ceph/osd/ceph-3/keyring
chown -R ceph:ceph /var/lib/ceph/osd/
Next up I listed all the LVM devices on the host and realized that the correct drive was already configured and ready, so I only had to activate it and then enable the service to ensure it starts again after a reboot.
ceph-volume lvm list
ceph-volume lvm activate --all
systemctl status ceph-osd@3.service
systemctl enable ceph-osd@3.service
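After activation, the OSD should report as up and in again; the OSD tree makes that easy to spot (osd.3 being the drive on this host):

```shell
# osd.3 should be listed as up with its full weight
ceph osd tree | grep "osd\.3"
```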
Last step
When all was said and done it worked, and I could clear the status flags so the cluster could return to normal operation.
ceph osd unset noout
ceph osd unset norebalance
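After clearing the flags, the cluster should settle back to HEALTH_OK once any remaining backfill finishes; one last status check to confirm:

```shell
# Overall health is shown near the top of the status output
ceph -s | grep health
```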