April | 2016 | BOFH meditations

Netdata

April 26, 2016


Netdata Dashboard

Recently, a friend advised me to take a look at Netdata. On paper, it is a lightweight dashboard, written in C, providing real-time performance monitoring of Linux servers. Demos are available here and there.

Having read the repository readme and installation instructions, the next thing I did to familiarize myself with Netdata was to review the currently open issues and pull requests. I can see FreeBSD support is being investigated, packages are on their way, and security concerns (such as Netdata’s default listen address) are being addressed. Netdata may not be completely ready, yet it’s safe enough to run pretty much everywhere, and definitely worth a shot.

The setup process is pretty easy. You’ll pull a few dependencies – the kind you may not want on a production server though. A shell script will then build and install everything. A Debian package skeleton is already there, if you prefer distributing binaries. Note that so far, you will have to register the netdata service yourself, although their repository already includes both systemd and init.d samples.
As of right now, Netdata’s default behavior is to listen on all your interfaces.
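
Since the daemon binds to every interface by default, it can be worth checking which addresses actually answer once it is running. The following is only a minimal sketch: it assumes Netdata's default port (19999), and the helper name `is_reachable` is made up for illustration.

```python
import socket

NETDATA_PORT = 19999  # Netdata's default listening port


def is_reachable(host, port=NETDATA_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable network, ...
        return False


# Example (addresses are placeholders, replace them with your own):
#   is_reachable("127.0.0.1")   # the loopback interface
#   is_reachable("192.0.2.10")  # the server's public address
```

If the public address answers and you would rather keep the dashboard private, a firewall rule or a reverse proxy in front of Netdata is one way to restrict access.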

Netdata Author profile on Github

One last detail that caught my eye is that issue, which sheds some light on the kind of guy we’re dealing with.
I should probably also mention his GitHub profile, with the most amazing stats you’ll ever see, showing the kind of activity there has been around Netdata lately.

One thing Netdata does not intend to provide is long-term history, nor even metrics aggregation. A couple of issues discuss potential collectd or statsd integrations. What Netdata does intend is to run with very little overhead (unlike tools such as collectd, which may increase your disk I/O) while displaying instantaneous values as they are read (unlike tools such as Munin, which won’t pull metrics faster than one data point per minute, and may need additional time to generate its graphs).

Netdata probably won’t replace any of the existing services watching over your system metrics, yet it is undeniably powerful and pretty well written, while offering an exhaustive view of your system.

RiakCS

April 15, 2016


Samuel MARTIN MORO

In the last few months, I’ve been working with an application using Riak as its main database.
Riak (being renamed RiakKV) is a key/value NoSQL database, running on Debian Wheezy, Ubuntu Trusty, CentOS 7, FreeBSD 10, Solaris 10, SmartOS 13.1 and Fedora 17. Probably not the most famous of them, yet pretty powerful and pretty easy to manage and scale.

The main drawbacks I have to mention here are that the open source version is limited in terms of features, and won’t allow you to set up replication across several clusters. Another one is that data sharding is not necessarily guaranteed. You should also note that their documentation suffers from poor SEO, which ends up with Google searches pointing to broken links half of the time. Finally, packaging is lacking for recent distributions: according to their pro support, Jessie and Xenial packages won’t show up before their 2.3 release, which is probably not for this year. Sure, it looks bad, and yet, it does the job.

The recommended setup involves 5 instances, according to Basho. Using default settings, you will always keep 3 copies of each piece of data. You may theoretically lose two instances before running into inconsistencies, which should resolve themselves once your nodes recover. Keep in mind that depending on the way you use Riak, you could end up with either good or catastrophic consistency.
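
The arithmetic behind those numbers can be sketched as follows. This is a simplified worst-case model, not a description of Riak's internals: it assumes the default replication factor (`n_val` of 3) and majority-based read/write quorums, supposes that every lost node held a replica of the same key, and ignores the fallback nodes Riak uses to keep accepting requests. The function names are made up for illustration.

```python
def quorum(n_val):
    """Majority of replicas: the smallest integer strictly greater than n_val / 2."""
    return n_val // 2 + 1


def replicas_left(n_val, lost):
    """Replicas of a key still available when `lost` of them sat on lost nodes."""
    return max(n_val - lost, 0)


def summarize(n_val=3):
    """For each number of lost replicas, return a tuple:
    (lost, replicas left, data still readable, majority still reachable)."""
    rows = []
    for lost in range(n_val + 1):
        left = replicas_left(n_val, lost)
        rows.append((lost, left, left >= 1, left >= quorum(n_val)))
    return rows


for row in summarize():
    print(row)
```

Under these assumptions, one copy of each key survives the loss of two nodes, but a majority of its original replicas is no longer reachable, which is where stale or conflicting values can start to show up until the nodes come back.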

Anyway, that’s not our topic today. Now that we have introduced Riak: I recently discovered that I could build an S3-capable service on top of it.
Building a cluster, you will install RiakKV on all your nodes. Your blobs will be stored there.
RiakCS will also be installed on all your nodes. These are your S3 gateways: all your storage nodes are also gateways. You should consider setting up some HAProxy forwarding HTTP or HTTPS traffic to your RiakCS daemons.
Finally, you will install Stanchion (a manager ensuring uniqueness of entities) on a single instance. Actually, you could install it on several instances, although all your RiakCS services should coordinate with a single Stanchion at a time.
Optionally, you could also install RiakCS Control, a user management interface.
All of these are open source. You could still use a RiakKV license to synchronize your dataset to a remote location, although this may not be necessary.
Basho’s documentation covers the differences between RiakCS and Swift or Atmos, although you could also compare it to Sheepdog, Ceph used with RADOS Gateway, or just AWS S3. On top of architectural specifics, you should also consider that each of these solutions has its own limited implementation of the S3 API.

Having run a Ceph (Infernalis) + radosgw cluster alongside a RiakCS cluster, both as EC2 Auto Scaling groups, I noted that Ceph OSDs tend to crash. Running from an ASG, they get replaced by a new instance and the previous one is destroyed, so I haven’t investigated the specifics much – running Ceph on virtual systems is not recommended anyway. RiakCS, on the other hand, never troubled me.

A short parenthesis regarding S3 implementations: running Ceph, I wasn’t able to create a bucket using s3cmd. I ended up writing a Python script using boto. Running RiakCS, on the other hand, I couldn’t run anything until I updated my .s3cfg to explicitly enable signature_v2: AWS now uses v4 signatures, which are not implemented by RiakCS (and this is not documented anywhere).
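
For reference, creating a bucket with boto against a non-AWS endpoint could look like the sketch below. It is not the script mentioned above, only a minimal illustration assuming boto version 2; the endpoint, credentials and bucket name are placeholders, and the helper names `connect` and `ensure_bucket` are made up.

```python
def connect(access_key, secret_key, host, port=80, secure=False):
    """Open an S3 connection to a non-AWS endpoint, using boto (version 2)."""
    # Imported here so the helpers below can be loaded without boto installed.
    import boto
    import boto.s3.connection

    return boto.connect_s3(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        host=host,
        port=port,
        is_secure=secure,
        # Path-style URLs (http://host/bucket) instead of per-bucket DNS names
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )


def ensure_bucket(conn, name):
    """Return the bucket called `name`, creating it if it does not exist yet."""
    bucket = conn.lookup(name)  # None when the bucket is missing
    if bucket is None:
        bucket = conn.create_bucket(name)
    return bucket


# Example (placeholder values):
#   conn = connect("ACCESS_KEY", "SECRET_KEY", "gateway.example.com")
#   ensure_bucket(conn, "my-bucket")
```

Path-style addressing is used here because a self-hosted gateway rarely has wildcard DNS set up for per-bucket hostnames.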

I would probably still recommend Ceph to most users running on physical hardware. For anyone less likely to succeed in building and running their own Ceph cluster, though, I would definitely recommend RiakCS, where adding, removing or replacing nodes is ridiculously easy, while recoveries are relatively well documented – and rarely required anyway.
