Easy Graphite Install & Configuration
a simple 2017 method (using Docker) <=> connected to Shinken + the linux-ssh pack
We assume you have Shinken up and running, and that you now want to use Graphite.
This method can actually also get Graphite up and running with another backend (just skip the Shinken/Nagios part).
Introduction (where the story is)
I have been using Shinken for quite a few years now, and I'm happy with it and with its interface (Shinken + Thruk advantages compared to Nagios + Thruk: you can select many services at once and perform Ack / Recheck, ..., which is very handy; you get failover and load balancing with no hassle; plus many small things).

By default, Shinken has RRD graphs activated. The problem comes when a new metric arrives from a server: the RRD files all become blank, because they need to be migrated to a format with the new metric, or with the new metric format... As a result, you lose all your previous graphs with all their history. I think this has happened to everyone using RRD graphs at some point after a configuration change.
Solutions to this may exist, but I haven't found one simple enough to be worth spending a reasonable amount of time on. I chose to use the "RRD next gen" instead, called Graphite.
What people call Graphite is actually a set of 3 components, described shortly:
- CARBON, a daemon that listens on port 2003 and feeds the database
- WHISPER, a database composed of *.wsp files, 1 per metric
- GRAPHITE, a web interface to show graphs stored in the previous database. It is Python, Cairo, Django, ... driven
To install it, I found many websites, documentations and tutorials, all slightly different, and I constantly ended up with problems like:
- getting a correct vHost for Apache (without breaking the installed sites + their modules)
- getting the correct Cairo version installed via PIP
- Python version mismatch, expected '2.7.2+'
- mod_wsgi.so: Cannot load mod_wsgi.so into server: libpython2.5.so.1.0: cannot open shared object file: No such file or directory
- Target WSGI script cannot be loaded as Python module
- graphite/carbon ImportError: No module named fields
- Django: IntegrityError: column user_id is not unique
- mod_wsgi fails when it is asked to read a file
- ...
Note: I was using Debian 6 when I first tried, then Debian 7, still with some problems.
So, ok, I'm no Python expert, but no basic user either, and I found the Debian packaging + PIP versioning + dependency problems just a bit too much for me.
I chose instead a ready-to-use Carbon-Whisper-Graphite Docker container. Simple, and working well in seconds (ok, minutes the first time you use Docker).
Installation (where the tech is)
On Debian 8.8, this works super well:
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
apt-key fingerprint 0EBFCD88
apt-get install apt-transport-https ca-certificates curl python-software-properties
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
apt-get update && apt-get install docker-ce
Now, you want to download + install the graphite docker container, in just 1 command:
docker run -d \
  --name graphite \
  --restart=always \
  -p 81:80 \
  -p 2003-2004:2003-2004 \
  -p 2023-2024:2023-2024 \
  -p 8125:8125/udp \
  -p 8126:8126 \
  hopsoft/graphite-statsd
Unable to find image 'hopsoft/graphite-statsd:latest' locally
latest: Pulling from hopsoft/graphite-statsd
The only thing I did was ask for port 80 of the container to be mapped to local port 81, as local port 80 is already in use.
You can fine-tune this by linking local files to some configuration files inside the container. I did not, at this point.
2 minutes, and boom, it's up and running.
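Before wiring up Shinken, you can already check that Carbon accepts data: its plaintext protocol on port 2003 is just one line per metric, "<metric.path> <value> <unix-timestamp>". A minimal sketch, assuming nc is installed and the container runs on localhost (the metric name test.smoke is made up):

```shell
# Build a metric line in Carbon's plaintext format: path, value, unix timestamp
METRIC="test.smoke 42 $(date +%s)"
# Send it to Carbon (the "|| true" keeps this harmless if nothing listens yet)
echo "$METRIC" | nc -q0 localhost 2003 2>/dev/null || true
```

A minute later, a whisper file test/smoke.wsp should exist under /opt/graphite/storage/whisper/ inside the container.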
To "manage" the container, just perform these actions:
docker start graphite
docker restart graphite
docker stop graphite
docker exec -it graphite bash # gives you a shell session inside the container
docker inspect graphite | grep Source -A 1 # shows the local paths of some of the container's files
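The fine-tuning mentioned above (linking local files to configuration files in the container) is done with -v bind mounts. A sketch, assuming the mount points documented for the hopsoft image and local directories of your choosing (here /srv/graphite/...):

```shell
# /opt/graphite/conf holds storage-schemas.conf, storage-aggregation.conf, ...
# /opt/graphite/storage holds the whisper datafiles, so they survive container rebuilds
docker run -d \
  --name graphite \
  --restart=always \
  -p 81:80 \
  -p 2003-2004:2003-2004 \
  -v /srv/graphite/conf:/opt/graphite/conf \
  -v /srv/graphite/storage:/opt/graphite/storage \
  hopsoft/graphite-statsd
```

With the storage directory mounted locally, you can also delete or resize whisper files without entering the container.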
Shinken Configuration (text + tech)
You can find it easily on the Shinken Read The Docs website.
You just add the graphite module to /etc/shinken/brokers/broker-master.cfg (it will then use the config file /etc/shinken/modules/graphite.cfg, and send data to Carbon).
/etc/shinken/brokers/broker-master.cfg contains:
...
modules webui,graphite,livestatus,Syslog
...
/etc/shinken/modules/graphite.cfg contains:
define module {
    module_name    graphite
    module_type    graphite_perfdata
    host           localhost
    port           2003 ; or 2004 if using use_pickle 1
}
That's it.
You can also enable the graphite-ui module in webui.cfg, in case you use the standard Shinken interface (I don't, I use Thruk).
So for graph links, I use an action_url in my /etc/shinken/templates/generic-host.cfg and in my /etc/shinken/templates/generic-service.cfg (this displays a link in Thruk to the Graphite data, but does not embed the graph itself):
grep action /etc/shinken/templates/generic-host.cfg
action_url http://my.server.name:81/render?from=-36hours&until=now&width=800&height=450&target=$HOSTNAME$.rta&lineMode=connected&lineWidth=2&tz=Europe/Paris
grep action /etc/shinken/templates/generic-service.cfg
action_url http://my.server.name:81/render?from=-36hours&until=now&width=800&height=450&target=$HOSTNAME$.$SERVICEDESC$.*&lineMode=connected&lineWidth=2&title=$HOSTNAME$.$SERVICEDESC$&tz=Europe/Paris
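These action_urls are just calls to Graphite's render API, so you can build and test the same URLs by hand before putting them in the templates. A sketch with a made-up target (format=json returns the raw datapoints instead of a PNG):

```shell
# Same base URL as the action_urls above; the target is hypothetical
GRAPHITE="http://my.server.name:81"
TARGET="myhost.Disks.*"
URL="$GRAPHITE/render?from=-36hours&until=now&target=$TARGET&format=json"
echo "$URL"
# curl -s "$URL"   # run this from a host that can reach Graphite
```

If the JSON output contains your datapoints, the PNG version used by the action_url will work too.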
Graphite Configuration (text + tech)
I'm configuring only the whisper database datafiles part, to fit what we need in our Shinken configuration. We perform a check every 15 minutes for standard servers, and every 5 minutes for production ones.
We use the linux-ssh shinken pack + some other commands to check specific ports or services.
When first trying to configure the whisper datafiles, I ended up saturating my disk space as datafiles were created (a whisper datafile has a fixed size, set at creation time), so I tuned the settings to get a sensible size.
Retention:
How long we keep data, and at which precision. My config, to fit my Shinken:
root@xxxxxxxxxxxxx:/opt/graphite/conf# grep -v ^$ storage-schemas.conf | grep -v ^#
[default_cpu]
pattern = .*\.cpu.*
retentions = 5m:14d,30m:84d
# archive 0 has 12 * 24 * 14 = 4032 points
# archive 1 has 2 * 24 * 84 = 4032 points
# total 8064 points, 96KB
[default_stats]
pattern = .*\..*State*s\..*
retentions = 5m:14d,30m:224d
# archive 0 has 12 * 24 * 14 = 4032 points
# archive 1 has 2 * 24 * 224 = 10752 points
# total 14784 points, 176KB
[default_reboot]
pattern = .*\.Reboot\..*
retentions = 15m:14d,90m:224d,360m:2240d
# archive 0 has 4 * 24 * 14 = 1344 points
# archive 1 has 16 * 224 = 3584 points
# archive 2 has 4 * 2240 = 8960 points
# total 13888 points, ~163KB
[default]
pattern = .*
retentions = 5m:14d,30m:224d,90m:896d,360m:2240d
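The point counts in the comments above translate directly into file sizes: each whisper datapoint takes 12 bytes (a 4-byte timestamp plus an 8-byte double), plus a small per-file header. A quick check for [default_stats]:

```shell
# 5m:14d  -> 12 points/hour * 24 h * 14 d
A0=$((12 * 24 * 14))
# 30m:224d -> 2 points/hour * 24 h * 224 d
A1=$((2 * 24 * 224))
echo "$A0 + $A1 = $((A0 + A1)) points, $(( (A0 + A1) * 12 )) bytes of data"
```

The 4032 * 12 = 48384 and 10752 * 12 = 129024 byte archives match exactly the sizes reported by whisper-dump.py in the Troubleshoot section below.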
Aggregation:
When data gets old, how do we 'compress / keep' it at a lower resolution, to save space?
aggregationMethod: how several non-null points are combined into one point of the next lower resolution.
average is a good choice for me, but you can choose to keep the maximum or the minimum value, or other fun possibilities (see the Graphite doc for that).
You can use a different aggregation method per metric (again with regex pattern matching).
My config to fit my shinken:
[default_average]
pattern = .*
xFilesFactor = 0.0
aggregationMethod = average
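To make the two settings concrete: going from 5m to 30m resolution, six points collapse into one. With aggregationMethod=average the new point is the mean of the non-null values, and xFilesFactor=0.0 means even a single non-null point out of the six is enough to produce it. A toy calculation with made-up values, only three of the six slots filled:

```shell
# three non-null 5-minute values out of six possible slots
echo $(( (10 + 20 + 30) / 3 ))   # the aggregated 30-minute point: 20
```

With the default xFilesFactor of 0.5, those same three points out of six would only just make the cut; our 15-minute checks (1 slot in 3 filled) would produce no aggregate at all, hence 0.0 here.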
Note: as whisper datafiles are created with a fixed size when the metric is first inserted, I deleted ALL metrics after getting the configuration above right (I could have used whisper-resize.py, but that was too tedious with many different database sizes and not much data worth saving). In case you need it, you loop like this:
for WSP in $(find /opt/graphite/storage/whisper/ -name '*.wsp' -type f); do
    whisper-resize.py --xFilesFactor 0.0 --aggregationMethod=average $WSP \
        5m:14d 30m:224d 60m:2240d > /dev/null
done
Troubleshoot:
Using whisper-dump.py:
root@xxxxxxxxxxxxx:/# whisper-dump.py /opt/graphite/storage/whisper/XXXXX/Disks/__data_used_.wsp | head -50
Meta data:
aggregation method: average
max retention: 193536000
xFilesFactor: 0
Archive 0 info:
offset: 64
seconds per point: 300
points: 4032
retention: 1209600
size: 48384
Archive 1 info:
offset: 48448
seconds per point: 1800
points: 10752
retention: 19353600
size: 129024
Archive 2 info:
offset: 177472
seconds per point: 5400
points: 14336
retention: 77414400
size: 172032
Archive 3 info:
offset: 349504
seconds per point: 21600
points: 8960
retention: 193536000
size: 107520
Archive 0 data:
0: 1501174200, 194.87899999999999067767930682748556
1: 0, 0
2: 0, 0
3: 1501175100, 194.87899999999999067767930682748556
4: 0, 0
5: 0, 0
6: 1501176000, 194.87899999999999067767930682748556
7: 0, 0
8: 0, 0
9: 1501176900, 194.87899999999999067767930682748556
10: 0, 0
11: 0, 0
12: 1501177800, 194.87899999999999067767930682748556
13: 0, 0
14: 0, 0
15: 1501178700, 194.87899999999999067767930682748556
We can see that only 1 slot in 3 is filled, because this metric is recorded for a non-production server, i.e. every 15 minutes, not every 5.
If you need to check what is really stored in your whisper files, just use these Python scripts (available directly inside the container, i.e. after the docker exec -ti graphite bash command):
whisper-info.py XXXXX/Disks/___used_.wsp
whisper-dump.py XXXXX/Disks/___used_.wsp > tmp.tmp
less tmp.tmp
Links:
http://shinken.readthedocs.io/en/latest/index.html
http://graphite.readthedocs.io/en/latest/index.html
https://github.com/hopsoft/docker-graphite-statsd