July | 2015 | BOFH meditations

OpenNebula

July 30, 2015

OpenNebula 4.10 dashboard, running on 4-compute 5-store cluster

This could have been the first article of this blog. OpenNebula is a modular cloud-oriented solution, comparable to OpenStack, that drives heterogeneous infrastructure by orchestrating storage, network and hypervisor configuration.

In the last 7 months, I’ve been using OpenNebula with Ceph to virtualize my main services, such as my mail server (200GB storage), my nntp index (200GB mysql DB, 300GB data), my wiki, plex, sabnzbd, … pretty much everything, except my DHCP, DNS, web cache and LDAP services.
Shortly before leaving Smile, I also used OpenNebula and Ceph to store our system logs, involving Elasticsearch, Kibana and rsyslog-om-elasticsearch (that’s right: no Logstash).

This week, a customer of mine asked for a solution that would allow him to host several cPanel VPS, knowing he already had a site dealing with customer accounts and billing. After he turned down my scripts deploying Xen or KVM virtual machines, as well as a Proxmox-based setup, we ended up talking about OpenNebula.

OpenNebula 4.12 dashboard, running on a single SoYouStart host

The service is based on a single SoYouStart dedicated host, with 32GB RAM, 2x2TB disks and a few public IPs.
Sadly, OpenNebula is still not available for Debian Jessie. Trying to install the Wheezy packages, I ran into dependency issues regarding libxmlrpc. In the end, I reinstalled the server with the latest Wheezy.

From there, installing Sunstone and the OpenNebula host utils, then registering localhost among my compute nodes and my LVM among my datastores, took a couple of hours.
Then, I started installing CentOS 7 using virt-install and VNC, building cPanel, installing CSF, adding my scripts configuring the network according to the Nebula context media, … the cloud was operational five hours after Wheezy was installed.
I finished by writing some training material (15 pages, mostly screenshots) explaining the few actions required to create a VM for a new customer, suspend his account, back up his disks, and eventually purge his resources.

OpenNebula VNC view

At first glance, using OpenNebula to drive virtualization services on a single host could seem overkill, to say the least.
Still, with a customer who doesn’t want to know what a shell looks like, and for whom even Proxmox is not an acceptable answer, I feel confident OpenNebula could be way more useful than we give it credit for.

Crawlers

July 30, 2015

Samuel MARTIN MORO

If you host a public site, you’ve dealt with them already.
Using your favorite search engine, you’re indirectly subject to their work as well.
Crawlers are bots that query and sometimes map your site, feeding some search engine database.

When I started looking at the subject, a few years back, you only had to know about /robots.txt, to potentially prevent your site from being indexed, or at least restrict such accesses to relevant contents of your site.
More recently, we’ve seen the introduction of XML files such as sitemaps, letting you efficiently serve search engines a map of your site.
This year, Google revised its “rules” to prioritize responsive and platform-optimized sites as well. As such, they now recommend that you allow crawling of JavaScript and CSS files, warning –threatening– that preventing these accesses could result in your ranking being lowered.
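
To make the robots.txt part concrete, here is a minimal sketch using Python’s standard urllib.robotparser. The rules and paths are made up for illustration (they are not taken from any of my sites): the idea is simply to keep CSS and JavaScript crawlable while closing the rest of a static tree and an admin area.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every path below is an illustrative assumption.
# The Allow lines come first because Python's parser applies the first
# matching rule; other crawlers may resolve conflicts differently.
ROBOTS_TXT = """\
User-agent: *
Allow: /static/css/
Allow: /static/js/
Disallow: /static/
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in memory, without any HTTP request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for path in ("/", "/static/css/site.css", "/static/js/app.js",
             "/static/img/private.png", "/admin/login"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'denied'}")
```

Keep in mind robots.txt is only advisory: well-behaved crawlers honor it, anything else will ignore it.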

At this point, with scripts and style-sheets being indexed, you might say – I’m surprised not to find the remark in the comments – that Google actually indexes your site’s vulnerabilities, creating not only a directory of the known internet, but a complete map with everyone’s hidden pipes, that could some day be used to breach your site – if not already.

Even if Google is the major actor on that matter, you have probably dealt with Yahoo, Yandex.ru, MJ12, Voltron, … whose practices are similar. Over the last years, checking your web server logs, you might have noticed a significant increase in the proportion of bot queries over human visits. This is partly due to an upsurge in search engines, though I suspect it is mostly thanks to botnets.
Identifying these crawlers can be done by checking the User-Agent header sent with HTTP requests. On small traffic sites, crawlers may very well be your only clients.
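
As a rough illustration of that User-Agent check, the sketch below estimates the share of self-declared bots in an access log. It assumes the Apache/nginx “combined” log format, where the User-Agent is the last quoted field; the hint list and the sample lines are made up for the example and are nowhere near exhaustive.

```python
import re
from collections import Counter

# In the "combined" log format, the User-Agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

# Illustrative substrings only: a few crawlers named above plus generic hints.
BOT_HINTS = ("googlebot", "yandex", "mj12bot", "slurp", "bot", "crawler", "spider")


def user_agent(log_line: str) -> str:
    """Extract the User-Agent from a combined-format log line ('' if absent)."""
    match = UA_FIELD.search(log_line)
    return match.group(1) if match else ""


def is_declared_bot(ua: str) -> bool:
    """True when the User-Agent announces itself as some kind of crawler."""
    ua = ua.lower()
    return any(hint in ua for hint in BOT_HINTS)


def bot_share(log_lines) -> float:
    """Fraction of requests whose User-Agent looks like a crawler."""
    counts = Counter(is_declared_bot(user_agent(line)) for line in log_lines)
    total = counts[True] + counts[False]
    return counts[True] / total if total else 0.0


# Made-up sample lines, using documentation-reserved IP addresses.
SAMPLE = [
    '192.0.2.10 - - [30/Jul/2015:10:00:00 +0200] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [30/Jul/2015:10:00:01 +0200] "GET /wiki HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"',
    '192.0.2.12 - - [30/Jul/2015:10:00:02 +0200] "GET /robots.txt HTTP/1.1" 200 64 "-" '
    '"Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)"',
]

print(f"declared bots: {bot_share(SAMPLE):.0%} of requests")
```

This only counts clients that declare themselves honestly, since the header is entirely under the client’s control.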

Assuming your sites are subject to DDoS attacks or scans for software vulnerabilities (top targeted solutions being WordPress, phpBB and phpMyAdmin), you should know attackers will eventually spoof their user-agent, most likely branding themselves as Googlebot.

To guarantee a “googlebot” branded query actually comes from a Google server, you just need to check the pointer (PTR) record associated with this client’s IP. A way to do so in Apache (2.4) could be to use something like this (PoC to complete/experiment).
Still, maybe it is wiser to just drop all Google requests as well. Without encouraging Tor usage, it’s probably time to switch to DuckDuckGo?
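
Outside of Apache, the PTR check can be sketched in a few lines of Python. This is not the Apache PoC mentioned above, just my reading of the same idea: resolve the client IP to a hostname, require a googlebot.com or google.com suffix, then resolve that hostname forward and make sure it maps back to the same IP. The function names are mine, the lookups are IPv4-only, and the demo uses stub tables with documentation addresses rather than real DNS.

```python
import socket

# Domains under which Google publishes its crawler hostnames.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for ip, or '' when there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return ""


def forward_lookup(hostname: str) -> list:
    """Return the IPv4 addresses hostname resolves to (empty on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except (socket.herror, socket.gaierror):
        return []


def is_genuine_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup) -> bool:
    """Forward-confirmed reverse DNS check for a client claiming to be Googlebot."""
    hostname = reverse(ip).rstrip(".").lower()
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    # Whoever controls a reverse zone can publish any PTR they like, so the
    # claimed hostname must also resolve back to the client's address.
    return ip in forward(hostname)


# Demo with stub resolvers (made-up records, no network access needed).
FAKE_PTR = {
    "192.0.2.1": "crawl-192-0-2-1.googlebot.com",
    "192.0.2.2": "crawl-192-0-2-2.googlebot.com",  # forged PTR
    "192.0.2.3": "host.example.net",
}
FAKE_A = {
    "crawl-192-0-2-1.googlebot.com": ["192.0.2.1"],
    "crawl-192-0-2-2.googlebot.com": ["198.51.100.9"],
}

for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.3"):
    verdict = is_genuine_googlebot(ip, FAKE_PTR.get, FAKE_A.get) if False else \
        is_genuine_googlebot(ip, lambda i: FAKE_PTR.get(i, ""), lambda h: FAKE_A.get(h, []))
    print(ip, "genuine" if verdict else "rejected")
```

Called without the stub arguments, is_genuine_googlebot(ip) performs real DNS lookups, which are slow enough that you would want to cache the verdict per IP.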

Otherwise, another good way to deny these connections is described here. I may try to add something like this to my Puppet classes.

Don’t trust the Tahr

July 30, 2015

Samuel MARTIN MORO

Beware that since the latest Ubuntu kernel upgrades (14.04.02), you may lose network connectivity when rebooting your servers!

I had the problem four days ago, rebooting one of my OpenNebula hosts. As it was still unreachable after 5 minutes, I logged in physically, to see all my “p1pX” and “p4pX” interfaces had disappeared.
Checking udev rules, there is now a file pinning the interface mapping. On a server I have not rebooted yet, this file doesn’t exist.

The story could have ended here. But with Ubuntu, updates are a daily struggle: today, one of my Ceph OSD hosts (hosting 4 disks) spontaneously stopped working.
Meaning: the host was still there, and I was able to open a shell using SSH. Checking processes, all Ceph OSD daemons were stopped. Starting them showed no error, yet the processes were still absent. Checking dmesg, I had several lines of SSL-related segfaults.
As expected, rebooting fixed everything, from Ceph to my network interface names.
It’s on days like these that I most enjoy freelancing: I can address my system and network outages in time, way before it’s too late.

While I was starting to accept Ubuntu as safe enough to run production services, renaming interfaces on a production system is unacceptable. I’m curious to know how Canonical dealt with that while providing BootStack and OpenStack-based services.

Note there is still a way to prevent your interfaces from being renamed:

# ln -s /dev/null /etc/udev/rules.d/75-persistent-net-generator.rules
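
To audit a batch of hosts, a small check along these lines could tell whether that rule file has been masked. This is a sketch of mine, not something udev ships: it only tests that the path from the command above is a symlink resolving to /dev/null.

```python
import os

# Path taken from the ln command above.
GENERATOR_RULE = "/etc/udev/rules.d/75-persistent-net-generator.rules"


def is_masked(rule_path: str = GENERATOR_RULE) -> bool:
    """True when rule_path is a symlink whose target resolves to /dev/null."""
    return (os.path.islink(rule_path)
            and os.path.realpath(rule_path) == os.path.realpath(os.devnull))


if __name__ == "__main__":
    print("masked" if is_masked() else "not masked")
```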

OwnCloud & Pydio

July 25, 2015

Samuel MARTIN MORO

You may at least have heard of OwnCloud, if you’re not using it already.

Thanks to a very fine web frontend, and several standalone clients letting you use your shares as a network file system, OwnCloud is user friendly, and can be trusted to host hundreds of accounts, if not thousands.
The solution was installed at Smile by Thibaut (59pilgrim). I didn’t pay much attention back then: I was using Pydio, and was pretty satisfied already. We had around 700 users, not all of them active, yet I could see the whole thing was pretty reliable.

Pydio is a good candidate to compare with OwnCloud. Both offer pretty much the same services. OwnCloud has lots of apps to do everything, Pydio has plugins. Both are PHP-based opensource projects, with fairly active communities.
Small advantage to OwnCloud though, with its native S3 connector. And arguably, a better Linux client and web experience.

Recently, disappointed by Pydio – something about \n in file names preventing files from being uploaded to my Pydio server – I gave OwnCloud a shot.
I haven’t lost hope in Pydio yet, but OwnCloud is definitely easier to deal with: I could even recommend it to a novice Linux enthusiast.

Asterisk

July 21, 2015

Samuel MARTIN MORO

Asterisk is an open-source framework for building communication-based applications.
Historically, Asterisk is an alternative to most proprietary Private Branch eXchange (PBX) systems, dealing with voice communications, conference calling, IVR or voicemail.

Quite modular, Asterisk ships with several audio codecs (g711a, g711u, g722, gsm), handles standard protocols (SIP, IAX), and can be used virtually anywhere from multi-tenant providers to end-user setups.

There are a lot of Asterisk-based distributions, starting with FreePBX, and derivatives such as AsteriskNow, Elastix, or alternatives such as PBXinaflash.
The purpose of these systems is to provide end-users with a clear web interface for managing their setup.
This is usually a good way to manage a single server. However, when dealing with several servers, all with their local dialplans, configuring trunks, routes, user extensions, … and guaranteeing all your users are offered the very same service, you will spend quite a lot of time doing repetitive checks, and sporadically fixing typos and unexpected configurations.

Before leaving Smile, I worked on a Puppet class that could deploy Asterisk and configure everything from Hiera arrays. No frontend, except for some nginx distributing phone configurations. A minimalistic setup, based on Elastix/ASTDB-based generated contexts and embedded applications.
I had neither the time nor the guts to finish it back then. Today, I have a working PoC, involving my Freephonie SIP account, a couple of software and hardware phones, voicemail, DND, CFW, …
And last but not least: the hardware phones’ default configuration locks them to a private context. Users may dial their extension number and authenticate themselves using a PIN to get their phone re-configured with their extension.

Most of the work is publicly available on my GitLab.

Ceph Disk Failure

July 1, 2015

Samuel MARTIN MORO

Last week, one of my five Ceph hosts was unreachable.
Investigating, I noticed the OSD daemons were still running. Only the daemons using the root file system had either crashed (the local Ceph MON daemon) or become unable to process requests (the SSH daemon was still answering, then cleanly closing the connection).

After rebooting the system and looking at logs, I could see a lot of I/O errors. I left the console logged in to root, waiting for the next occurrence.
Having no spare 60GB SSD, I ordered one.

Two days later, the same problem occurred. From the console, I was unable to run anything (mostly segfaults and ENOENT).
Again, I was able to reboot. This time, I dropped a couple of logical volumes, disabled the swap partition, and resized my VG to make sure I had a fair amount of unallocated space on my faulty disk.

The problem persisted, and the average uptime kept getting significantly shorter.
I progressively disabled local OSSEC daemon, puppet, a few crontabs, collectd, munin, … only keeping ceph, nagios and ssh running. The problem kept happening, every 12 to 48 hours.

This morning, the server wasn’t even able to boot.
Checking the BIOS, my root SSD wasn’t detected.
Attaching it to some USB dock, I had to wait a couple of minutes before the disk was actually detected by my laptop (Ubuntu 14.04.02) and my desktop (Debian 7.8).
I caught a break when receiving my new disk at 11 AM.
Running dd from the faulty disk to the new one took around 50 minutes (20MB/s, I can’t believe it!).
Syncing (1x512GB SSD, 2x4TB & 1x3TB HDD) after 8 hours of downtime took around half an hour. Knowing I run a fairly busy mail server, some nntp index, …, this is a new tangible improvement brought by Hammer over Firefly.
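
For the record, the dd figures are consistent with each other. A quick sanity check, assuming the faulty root disk is the 60GB model mentioned earlier and that GB/MB are decimal units:

```python
def copy_minutes(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Minutes needed to copy size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_s / 60


# Assumptions: 60GB root SSD, 20MB/s sustained, decimal units.
DISK_BYTES = 60 * 10**9
RATE_BYTES_PER_S = 20 * 10**6

print(f"{copy_minutes(DISK_BYTES, RATE_BYTES_PER_S):.0f} minutes")
```

That is 3000 seconds, or the 50 minutes observed.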

I’m now preparing to send the faulty disk to my reseller for replacement. At least I will have a spare handy for the next failure.

Moral: cheap is unreasonable. Better be lucky.
