Alexandre IT Admin : 2017

Easy Graphite Install & Configuration

Simple 2017 method (using Docker), connected to Shinken + the linux-ssh pack

We assume you have Shinken up and running, and that you now want to use Graphite.
This method can also help you get Graphite up and running with another backend (just skip the Shinken/Nagios part).

Introduction (where the story is)

I have been using Shinken for quite a few years now, and I'm happy with it and with its interface (advantages of Shinken + Thruk compared to Nagios + Thruk: we can select many services at once and perform Ack / Recheck, ...; very handy, plus failover and load balancing with no hassle, plus many small things).
By default, Shinken has RRD graphs activated. The problem comes when a new metric arrives from a server: the RRD files all go blank, because they need to be migrated to a format that includes the new metric, or the new metric format. The result is that you lose all your previous graphs, with all their history. I think this has happened to everyone using RRD graphs at some point, after a configuration change.
A solution to this may exist, but I haven't found one simple enough to implement in a reasonable amount of time. I chose to use the "RRD next gen", called Graphite.

What people call Graphite is actually a set of 3 things, which I'll describe briefly:
-CARBON, a daemon that listens on port 2003 and feeds the database
-WHISPER, a database composed of *.wsp files, 1 for each metric
-GRAPHITE, a web interface that shows graphs of the data stored in that database. It's Python, Cairo, Django ... driven
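
To make Carbon's role concrete, here is a minimal sketch of feeding it one data point over its plaintext protocol, where each line is "metric.path value timestamp". The host, the port mapping and the metric name test.demo.value are assumptions for illustration only; in the setup described here, Shinken's graphite module does this job for you.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Carbon's plaintext protocol: path, value, timestamp."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Open a TCP connection to Carbon and send a single data point."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Usage (needs a Carbon daemon listening on localhost:2003;
# the metric name is hypothetical):
# send_metric("test.demo.value", 42)
```

After a successful send, Whisper should create a matching test/demo/value.wsp file under its storage directory.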

To install it, I found many websites, documentation pages and tutorials, all a bit different, and I constantly ended up with problems like:

having a correct vHost for apache (+ not breaking installed sites + their modules)
having the correct Cairo version installed via PIP
Python version mismatch, expected '2.7.2+'
mod_wsgi.so: Cannot load mod_wsgi.so into server: libpython2.5.so.1.0: cannot open shared object file: No such file or directory
Target WSGI script cannot be loaded as Python module
graphite/carbon ImportError: No module named fields
Django: IntegrityError: column user_id is not unique
mod_wsgi fails when it is asked to read a file
...

Note: I was using Debian 6 when I first tried, then Debian 7, still with some problems.

So, OK, I'm no Python expert, nor a basic user, but I found the Debian packaging + PIP versioning + dependencies problem just a bit too much for me.

I chose to use a ready-to-use Carbon-Whisper-Graphite Docker container. Simple, and working well in seconds (OK, minutes the first time you use Docker).

Installation (where the tech is)

On Debian 8.8, this works super well:

curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
apt-key fingerprint 0EBFCD88
apt-get install apt-transport-https ca-certificates curl python-software-properties
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
apt-get update && apt-get install docker-ce

Now, you want to download + install the graphite docker container, in just 1 command:

docker run -d \
  --name graphite \
  --restart=always \
  -p 81:80 \
  -p 2003-2004:2003-2004 \
  -p 2023-2024:2023-2024 \
  -p 8125:8125/udp \
  -p 8126:8126 \
  hopsoft/graphite-statsd
Unable to find image 'hopsoft/graphite-statsd:latest' locally
latest: Pulling from hopsoft/graphite-statsd

The last two lines are Docker's output, not part of the command.
The only thing I changed was to map the container's port 80 to local port 81, as local port 80 is already in use.
You can fine-tune this by mapping local files onto some of the configuration files in the container. I did not at this point.

2 minutes, and boom, it's up and running.

To "manage" the container, just perform these actions:

docker start graphite

docker restart graphite
docker stop graphite
docker exec -it graphite bash # gives you a bash session inside the container
docker inspect graphite | grep Source -A 1 # shows the local (host) paths of some of the container's files

Shinken Configuration (text + tech)

You can find this easily on the Shinken Read the Docs website.

You just add the graphite module to /etc/shinken/brokers/broker-master.cfg (it will then use the config file /etc/shinken/modules/graphite.cfg and send data to Carbon).

/etc/shinken/brokers/broker-master.cfg contains:
...
modules webui,graphite,livestatus,Syslog
...
/etc/shinken/modules/graphite.cfg contains:
define module {
    module_name graphite
    module_type graphite_perfdata
    host localhost
    port 2003 ; Or 2004 if using use_pickle 1
}
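
The comment about port 2004 refers to Carbon's pickle receiver. As far as I understand the Graphite documentation, a pickle message is a list of (path, (timestamp, value)) tuples, pickled and prefixed with a 4-byte big-endian length. The sketch below only builds such a message; the metric name is hypothetical and this is not how the Shinken module is necessarily implemented.

```python
import pickle
import struct


def build_pickle_message(datapoints):
    """Serialize [(path, (timestamp, value)), ...] for Carbon's pickle port."""
    payload = pickle.dumps(datapoints, protocol=2)
    header = struct.pack("!L", len(payload))  # 4-byte big-endian length prefix
    return header + payload


# Hypothetical data point: one round-trip-time sample for host "web01".
message = build_pickle_message([("web01.rta", (1501174200, 0.5))])
print(len(message), "bytes ready to be sent to port 2004")
```

The advantage over the plaintext port is that many data points travel in one message.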

That's it.

You also need to enable the graphite-ui module in webui.cfg if you use the standard Shinken interface (I don't, I use Thruk).

So, for graph links, I use an action_url in my /etc/shinken/templates/generic-host.cfg and in my /etc/shinken/templates/generic-service.cfg (this displays a link in Thruk to the Graphite data, but does not embed the graph itself).

grep action /etc/shinken/templates/generic-host.cfg
action_url http://my.server.name:81/render?from=-36hours&until=now&width=800&height=450&target=$HOSTNAME$.rta&lineMode=connected&lineWidth=2&tz=Europe/Paris

grep action /etc/shinken/templates/generic-service.cfg
action_url http://my.server.name:81/render?from=-36hours&until=now&width=800&height=450&target=$HOSTNAME$.$SERVICEDESC$.*&lineMode=connected&lineWidth=2&title=$HOSTNAME$.$SERVICEDESC$&tz=Europe/Paris
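
To show what Shinken ends up producing once the $HOSTNAME$ and $SERVICEDESC$ macros are expanded, here is a small sketch that builds the same render URL by hand. The function name and the example host "web01" are made up for illustration; my.server.name is the same placeholder as in the templates.

```python
from urllib.parse import urlencode

GRAPHITE_BASE = "http://my.server.name:81"  # placeholder host, as in the templates


def service_graph_url(hostname, service_desc, hours=36, tz="Europe/Paris"):
    """Build a Graphite render URL for all metrics of one service."""
    name = "%s.%s" % (hostname, service_desc)
    params = [
        ("from", "-%dhours" % hours),
        ("until", "now"),
        ("width", 800),
        ("height", 450),
        ("target", name + ".*"),
        ("lineMode", "connected"),
        ("lineWidth", 2),
        ("title", name),
        ("tz", tz),
    ]
    # Keep '*' and '/' readable; other special characters get percent-encoded.
    return GRAPHITE_BASE + "/render?" + urlencode(params, safe="*/")


print(service_graph_url("web01", "Disks"))
```

Pasting the printed URL into a browser is a quick way to check that the target pattern matches real metrics before putting it into the templates.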

Graphite Configuration (text + tech)

I'm only configuring the Whisper datafiles part, to fit what we need in our Shinken configuration.
We perform a check every 15 minutes for standard servers and every 5 minutes for production ones.
We use the linux-ssh Shinken pack, plus some other commands to check specific ports or services.
When I first configured the Whisper datafiles, I ended up saturating my disk when the datafiles were created (a Whisper datafile has a fixed size, set at creation), so I tuned the settings to get a reasonable size.

Retention:

How long we keep data, and at what precision.

My config to fit my shinken:

root@xxxxxxxxxxxxx :/opt/graphite/conf# grep -v ^$ storage-schemas.conf | grep -v ^#

[default_cpu]
pattern = .*\.cpu.*
retentions = 5m:14d,30m:84d
# archive 0 has 12 * 24 * 14 = 4032 points
# archive 1 has 2 * 24 * 84 = 4032 points
# total 8064 96KB

[default_stats]
pattern = .*\..*State*s\..*
retentions = 5m:14d,30m:224d
# archive 0 has 12 * 24 * 14 = 4032 points
# archive 1 has 2 * 24 * 224 = 10752 points
# total 14784 176KB

[default_reboot]
pattern = .*\.Reboot\..*
retentions = 15m:14d,90m:224d,360m:2240d
# archive 0 has 4 * 24 * 14 = 1344 points
# archive 1 has 16 * 224 = 3584 points
# archive 2 has 4 * 2240 = 8960 points
# total 13888 164KB

[default]
pattern = .*
retentions = 5m:14d,30m:224d,90m:896d,360m:2240d
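
Since a Whisper file has a fixed size, it is worth computing that size before any metric gets created. The sketch below assumes the usual Whisper layout (16 bytes of metadata, 12 bytes of header per archive, 12 bytes per point); the helper names are mine, not part of the Whisper tools.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

METADATA_BYTES = 16      # file header
ARCHIVE_INFO_BYTES = 12  # one header entry per archive
POINT_BYTES = 12         # 4-byte timestamp + 8-byte value


def to_seconds(duration):
    """Convert a duration such as '5m' or '14d' to seconds."""
    return int(duration[:-1]) * UNIT_SECONDS[duration[-1]]


def archive_points(retentions):
    """Return the number of points in each archive of a retentions string."""
    points = []
    for archive in retentions.split(","):
        precision, keep = archive.strip().split(":")
        points.append(to_seconds(keep) // to_seconds(precision))
    return points


def whisper_file_size(retentions):
    """Return the size in bytes of a Whisper file with these retentions."""
    points = archive_points(retentions)
    return (METADATA_BYTES
            + ARCHIVE_INFO_BYTES * len(points)
            + POINT_BYTES * sum(points))


for name, retentions in [
    ("default_cpu", "5m:14d,30m:84d"),
    ("default_stats", "5m:14d,30m:224d"),
    ("default_reboot", "15m:14d,90m:224d,360m:2240d"),
    ("default", "5m:14d,30m:224d,90m:896d,360m:2240d"),
]:
    print(name, archive_points(retentions), whisper_file_size(retentions), "bytes")
```

Multiply the result by the number of metrics you expect to get a rough idea of the disk space needed. The byte counts come out slightly below the KB figures in the comments above, which appear to be on-disk usage rounded up to filesystem blocks.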

Aggregation:

When data gets old, how do we 'compress / keep' it at a lower resolution, to save space?

xFilesFactor: tells the daemon the minimum fraction (a value from 0 to 1) of non-null points required. If we have less than this, the point in the lower resolution (next archive) will be null too.

aggregationMethod: how several non-null points are combined into one point of the next lower resolution.
average is a good choice for me, but we can choose to keep the maximum or minimum value, or other fun possibilities (see the Graphite doc for that).
You can use a different aggregation method per metric (again with regex pattern matching).
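
To illustrate how these two settings interact, here is a simplified sketch of a roll-up from one archive to the next. It is my own approximation of the behaviour described above, not Whisper's actual code, and the function name is made up.

```python
def roll_up(values, x_files_factor=0.0, method="average"):
    """Aggregate higher-resolution values (None = empty slot) into one point."""
    known = [v for v in values if v is not None]
    if not known:
        return None  # nothing to aggregate
    if len(known) / len(values) < x_files_factor:
        return None  # too few known points: the lower resolution stays null
    if method == "average":
        return sum(known) / len(known)
    if method == "sum":
        return sum(known)
    if method == "max":
        return max(known)
    if method == "min":
        return min(known)
    if method == "last":
        return known[-1]
    raise ValueError("unknown aggregation method: %s" % method)


# Six 5-minute slots rolled up into one 30-minute point,
# fed by a check that only runs every 15 minutes.
slots = [194.879, None, None, 194.879, None, None]
print(roll_up(slots, x_files_factor=0.0))
print(roll_up(slots, x_files_factor=0.5))
```

With a 15-minute check feeding 5-minute slots, only a third of the slots are known, so any xFilesFactor above about 0.33 would leave the 30-minute archive empty. That is why 0.0 suits this setup.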

My config to fit my Shinken (in storage-aggregation.conf):

[default_average]
pattern = .*
xFilesFactor = 0.0
aggregationMethod = average

Note: as Whisper datafiles are created with a fixed size when a metric is first inserted, I deleted ALL metrics once I had the correct configuration above (I could have used whisper-resize.py, but that is too difficult with many different database sizes, and there was not much data to save). In case you need it, you can loop like this:

for WSP in $(find /opt/graphite/storage/whisper/ -name '*.wsp' -type f); do
  whisper-resize.py --xFilesFactor 0.0 --aggregationMethod=average "$WSP" \
    5m:14d 30m:224d 60m:2240d > /dev/null ; done

Troubleshoot:

Using whisper-dump.py:

root@xxxxxxxxxxxxx:/# whisper-dump.py /opt/graphite/storage/whisper/XXXXX/Disks/__data_used_.wsp | head -50

Meta data:
  aggregation method: average
  max retention: 193536000
  xFilesFactor: 0

Archive 0 info:
  offset: 64
  seconds per point: 300
  points: 4032
  retention: 1209600
  size: 48384

Archive 1 info:
  offset: 48448
  seconds per point: 1800
  points: 10752
  retention: 19353600
  size: 129024

Archive 2 info:
  offset: 177472
  seconds per point: 5400
  points: 14336
  retention: 77414400
  size: 172032

Archive 3 info:
  offset: 349504
  seconds per point: 21600
  points: 8960
  retention: 193536000
  size: 107520

Archive 0 data:
0: 1501174200, 194.87899999999999067767930682748556
1: 0, 0
2: 0, 0
3: 1501175100, 194.87899999999999067767930682748556
4: 0, 0
5: 0, 0
6: 1501176000, 194.87899999999999067767930682748556
7: 0, 0
8: 0, 0
9: 1501176900, 194.87899999999999067767930682748556
10: 0, 0
11: 0, 0
12: 1501177800, 194.87899999999999067767930682748556
13: 0, 0
14: 0, 0
15: 1501178700, 194.87899999999999067767930682748556

We can see that only 1/3 of the slots are filled, because this metric is recorded for a non-production server, so every 15 minutes, not every 5.
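
If you prefer a number to eyeballing the dump, a few lines of Python can count the filled slots. This is only a sketch: it assumes data lines look like "index: timestamp, value" as in the output above, and it counts across every archive it is given, so feed it the lines of a single archive.

```python
def fill_ratio(dump_lines):
    """Fraction of slots with a non-zero timestamp in whisper-dump.py data lines."""
    filled = total = 0
    for line in dump_lines:
        index, sep, rest = line.partition(":")
        if not sep or not index.strip().isdigit():
            continue  # skip headers such as 'Archive 0 data:' or 'offset: 64'
        timestamp = rest.split(",")[0].strip()
        total += 1
        if timestamp != "0":
            filled += 1
    return filled / total if total else 0.0


# First six slots of the dump above: a 15-minute check in 5-minute slots.
sample = [
    "Archive 0 data:",
    "0: 1501174200, 194.879",
    "1: 0, 0",
    "2: 0, 0",
    "3: 1501175100, 194.879",
    "4: 0, 0",
    "5: 0, 0",
]
print("filled: %.0f%%" % (100 * fill_ratio(sample)))
```

A ratio far below check interval / archive precision (here 300 / 900) would suggest that checks are being missed rather than just scheduled less often.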

If you need to check what is really stored in your Whisper files, just use these Python scripts (they are available directly in the container, so run them after the docker exec -ti graphite bash command):

whisper-info.py XXXXX/Disks/___used_.wsp

whisper-dump.py XXXXX/Disks/___used_.wsp > tmp.tmp

less tmp.tmp

Links:

http://shinken.readthedocs.io/en/latest/index.html
http://graphite.readthedocs.io/en/latest/index.html
https://github.com/hopsoft/docker-graphite-statsd

Monday, August 14, 2017

Graphite - a 2017 post to simply use with Shinken monitoring (english version)