{"id":205,"date":"2015-08-24T16:55:58","date_gmt":"2015-08-24T14:55:58","guid":{"rendered":"https:\/\/blog.unetresgrossebite.com\/?p=205"},"modified":"2015-08-30T15:31:49","modified_gmt":"2015-08-30T13:31:49","slug":"scaling-out-with-ceph","status":"publish","type":"post","link":"https:\/\/blog.unetresgrossebite.com\/?p=205","title":{"rendered":"Scaling out with Ceph"},"content":{"rendered":"<p>A few months ago, I installed a Ceph cluster hosting disk images, for my OpenNebula cloud.<br \/>\nThis cluster is based on 5 ProLian N54L, each with a 60G SSD\u00a0for the main filesystems, some with 1 512G SSD OSD, all with 3 disk drives from 1 to 4T. SSD are grouped in a pool, HDD in an other.<\/p>\n<div id=\"attachment_206\" style=\"width: 310px\" class=\"wp-caption alignright\"><a href=\"https:\/\/blog.unetresgrossebite.com\/wp-content\/uploads\/2015\/08\/nebula-store-4.12.png\"><img aria-describedby=\"caption-attachment-206\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-206 size-medium\" src=\"https:\/\/blog.unetresgrossebite.com\/wp-content\/uploads\/2015\/08\/nebula-store-4.12-300x138.png\" alt=\"OpenNebula Datastores View - before\" width=\"300\" height=\"138\" srcset=\"https:\/\/blog.unetresgrossebite.com\/wp-content\/uploads\/2015\/08\/nebula-store-4.12-300x138.png 300w, https:\/\/blog.unetresgrossebite.com\/wp-content\/uploads\/2015\/08\/nebula-store-4.12-1024x472.png 1024w, https:\/\/blog.unetresgrossebite.com\/wp-content\/uploads\/2015\/08\/nebula-store-4.12.png 1164w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-206\" class=\"wp-caption-text\">OpenNebula Datastores View, having 5 Ceph OSD hosts<\/p><\/div>\n<p>Now that most my services are in this cluster, I&#8217;m left with very few free space.<br \/>\nThe good news is there is no significant impact on performances, as I was experiencing with ZFS.<br \/>\nThe bad news, is that I urgently need to add some storage space.<\/p>\n<p>Last Sunday, I ordered my sixth\u00a0N54L on eBay (from my &#8220;official&#8221; refurbish-er, <a href=\"http:\/\/www.bargainhardware.co.uk\/\">BargainHardware<\/a>) and a few disks.<br \/>\nAfter receiving everything, I installed the latest Ubuntu LTS (Trusty) from my PXE, installed puppet, prepared everything, &#8230; In about an hour, I was ready to add my disks.<\/p>\n<p>I use\u00a0a custom crush map, and the\u00a0osd &#8220;<em>crush update on start<\/em>&#8221; set to false, in my ceph.conf.<br \/>\nThis was the first time I tested this, and I was pleased to see I can run <em>ceph-deploy<\/em> to prepare my OSD, without automatically adding it to the default CRUSH root &#8211; especially having two pools.<br \/>\nFrom my <em>ceph-deploy<\/em> host (some Xen PV\u00a0I use hosting ceph-dash,\u00a0munin and nagios probes related to ceph, but with no OSD nor MON actually running), I ran the following:<\/p>\n<blockquote><p># ceph-deploy install erebe<br \/>\n# ceph-deploy disk list erebe<br \/>\n# ceph-deploy disk zap erebe:sda<br \/>\n# ceph-deploy disk zap erebe:sdb<br \/>\n# ceph-deploy disk zap erebe:sdc<br \/>\n# ceph-deploy disk zap erebe:sdd<br \/>\n# ceph-deploy osd prepare erebe:sda<br \/>\n# ceph-deploy osd prepare erebe:sda<br \/>\n# ceph-deploy osd prepare erebe:sdb<br \/>\n# ceph-deploy osd prepare erebe:sdc<br \/>\n# ceph-deploy osd prepare erebe:sdd<\/p><\/blockquote>\n<p>At that point, the 4 new OSD were up and running according to ceph status, though no data was assigned to them.<br \/>\nNext step was to update my crushmap, including these new OSDs 
Next step was to update my CRUSH map, including these new OSDs in the proper root.

```
# ceph osd getcrushmap -o compiled_crush
# crushtool -d compiled_crush -o plain_crush
# vi plain_crush
# crushtool -c plain_crush -o new-crush
# ceph osd setcrushmap -i new-crush
```

For the record, the content of my current CRUSH map is the following:

```
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host nyx-hdd {
    id -2        # do not change unnecessarily
    # weight 10.890
    alg straw
    hash 0       # rjenkins1
    item osd.1 weight 3.630
    item osd.2 weight 3.630
    item osd.3 weight 3.630
}
host eos-hdd {
    id -3        # do not change unnecessarily
    # weight 11.430
    alg straw
    hash 0       # rjenkins1
    item osd.5 weight 3.630
    item osd.6 weight 3.900
    item osd.7 weight 3.900
}
host hemara-hdd {
    id -4        # do not change unnecessarily
    # weight 9.980
    alg straw
    hash 0       # rjenkins1
    item osd.9 weight 3.630
    item osd.10 weight 3.630
    item osd.11 weight 2.720
}
host selene-hdd {
    id -5        # do not change unnecessarily
    # weight 5.430
    alg straw
    hash 0       # rjenkins1
    item osd.12 weight 1.810
    item osd.13 weight 2.720
    item osd.14 weight 0.900
}
host helios-hdd {
    id -6        # do not change unnecessarily
    # weight 3.050
    alg straw
    hash 0       # rjenkins1
    item osd.15 weight 1.600
    item osd.16 weight 0.700
    item osd.17 weight 0.750
}
host erebe-hdd {
    id -7        # do not change unnecessarily
    # weight 7.250
    alg straw
    hash 0       # rjenkins1
    item osd.19 weight 2.720
    item osd.20 weight 1.810
    item osd.21 weight 2.720
}
root hdd {
    id -1        # do not change unnecessarily
    # weight 40.780
    alg straw
    hash 0       # rjenkins1
    item nyx-hdd weight 10.890
    item eos-hdd weight 11.430
    item hemara-hdd weight 9.980
    item selene-hdd weight 5.430
    item helios-hdd weight 3.050
    item erebe-hdd weight 7.250
}
host nyx-ssd {
    id -42       # do not change unnecessarily
    # weight 0.460
    alg straw
    hash 0       # rjenkins1
    item osd.0 weight 0.460
}
host eos-ssd {
    id -43       # do not change unnecessarily
    # weight 0.460
    alg straw
    hash 0       # rjenkins1
    item osd.4 weight 0.460
}
host hemara-ssd {
    id -44       # do not change unnecessarily
    # weight 0.450
    alg straw
    hash 0       # rjenkins1
    item osd.8 weight 0.450
}
host erebe-ssd {
    id -45       # do not change unnecessarily
    # weight 0.450
    alg straw
    hash 0       # rjenkins1
    item osd.18 weight 0.450
}
root ssd {
    id -41       # do not change unnecessarily
    # weight 3.000
    alg straw
    hash 0       # rjenkins1
    item nyx-ssd weight 1.000
    item eos-ssd weight 1.000
    item hemara-ssd weight 1.000
    item erebe-ssd weight 1.000
}

# rules
rule hdd {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take hdd
    step chooseleaf firstn 0 type host
    step emit
}
rule ssd {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step chooseleaf firstn 0 type host
    step emit
}
# end crush map
```
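The two rules above (ruleset 0 for HDDs, ruleset 1 for SSDs) only take effect once each pool points at the right one. The post does not show the pool names, so the ones below are placeholders; the crush_ruleset knob itself is the standard one for this Ceph generation:

```
# point each pool at the matching CRUSH rule (pool names are examples, not from the post)
# ceph osd pool set hdd-pool crush_ruleset 0    # rule "hdd"
# ceph osd pool set ssd-pool crush_ruleset 1    # rule "ssd"

# verify which ruleset a pool currently uses
# ceph osd pool get hdd-pool crush_ruleset
```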
Applying the new crush map started a 20-hour process, moving placement groups around.

[Screenshot: OpenNebula Datastores View, having 6 Ceph OSD hosts]

```
# ceph-diskspace
/dev/sdc1  3.7T  2.0T  1.7T  55%  /var/lib/ceph/osd/ceph-6
/dev/sda1  472G  330G  143G  70%  /var/lib/ceph/osd/ceph-4
/dev/sdb1  3.7T  2.4T  1.4T  64%  /var/lib/ceph/osd/ceph-5
/dev/sdd1  3.7T  2.4T  1.4T  65%  /var/lib/ceph/osd/ceph-7
/dev/sda1  442G  329G  114G  75%  /var/lib/ceph/osd/ceph-18
/dev/sdb1  2.8T  2.1T  668G  77%  /var/lib/ceph/osd/ceph-19
/dev/sdc1  1.9T  1.3T  593G  69%  /var/lib/ceph/osd/ceph-20
/dev/sdd1  2.8T  2.0T  808G  72%  /var/lib/ceph/osd/ceph-21
/dev/sdc1  927G  562G  365G  61%  /var/lib/ceph/osd/ceph-17
/dev/sdb1  927G  564G  363G  61%  /var/lib/ceph/osd/ceph-16
/dev/sda1  1.9T  1.2T  630G  67%  /var/lib/ceph/osd/ceph-15
/dev/sdb1  3.7T  2.8T  935G  75%  /var/lib/ceph/osd/ceph-9
/dev/sdd1  2.8T  1.4T  1.4T  50%  /var/lib/ceph/osd/ceph-11
/dev/sda1  461G  274G  187G  60%  /var/lib/ceph/osd/ceph-8
/dev/sdc1  3.7T  2.2T  1.5T  60%  /var/lib/ceph/osd/ceph-10
/dev/sdc1  3.7T  1.9T  1.8T  52%  /var/lib/ceph/osd/ceph-1
/dev/sde1  3.7T  2.0T  1.7T  54%  /var/lib/ceph/osd/ceph-3
/dev/sdd1  3.7T  2.3T  1.5T  62%  /var/lib/ceph/osd/ceph-2
/dev/sdb1  472G  308G  165G  66%  /var/lib/ceph/osd/ceph-0
/dev/sdb1  1.9T  1.2T  673G  64%  /var/lib/ceph/osd/ceph-12
/dev/sdd1  927G  580G  347G  63%  /var/lib/ceph/osd/ceph-14
/dev/sdc1  2.8T  2.0T  813G  71%  /var/lib/ceph/osd/ceph-13
```
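Note that ceph-diskspace is a local helper of mine, not a stock Ceph command. Something functionally equivalent would be a small wrapper along these lines (a sketch, assuming the OSD hosts are reachable over SSH under the names used in the CRUSH map):

```
#!/bin/sh
# rough equivalent of the custom ceph-diskspace helper:
# report disk usage of every mounted OSD data directory on each OSD host
for host in nyx eos hemara selene helios erebe; do
    ssh "$host" 'df -h /var/lib/ceph/osd/ceph-*' | grep '/var/lib/ceph/osd/'
done
```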
I'm really satisfied with the way the Ceph project keeps improving their product.
Having discussed with several interviewers over the last few weeks, I still find myself explaining that Ceph RBD is not to be confused with CephFS: even if the latter may not be production ready yet, RADOS-backed storage may be just the thing you are looking for to distribute your storage.
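To make that distinction concrete (this example is not from the original post; the pool, image and monitor names are made up): RBD carves block devices out of RADOS, which is what the OpenNebula datastores above consume, while CephFS is a POSIX filesystem layered on the same object store and requires an MDS:

```
# RBD: block devices backed by RADOS objects
# rbd create mypool/vm-disk-0 --size 10240    # 10G image, hypothetical pool/image names
# rbd map mypool/vm-disk-0                    # exposes a /dev/rbd* device on the client

# CephFS: a POSIX filesystem on top of RADOS, mounted directly
# mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
```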