Metrics are numerical representations of statuses (e. g. number of open connections) or throughputs (e. g. writing volume on a hard drive since a specific point in time, calling of a specific functionality). This way they differ from logs and tracing data relating to individual events.
Metrics can be queried of standard applications (e. g. NGINX, DBs or objects in Kubernetes) via so-called exporters or metric endpoints. Customer-specific applications should be instrumented in a way that allows to measure SLAs as well as to obtain further information on the usage (e. g. number and response time of critical calls) for detailed performance observations.
The currently widely used metrics format was introduced by Prometheus and standardized via OpenMetrics. Metric points here are composed as follows:
- Metrics name: describes what is represented, e. g. server_open_connection_count
- Labels: on the basis of labels, differently measured instances can be distinguished, e. g. instance=127.0.0.1:8080.
- Timestamp: at what time was this value valid?
- Value: numeric value
This way, the performance and where appropriate also the number of errors or specific statuses can be compactly represented and graphically visualized in Grafana.
Based on these numeric values, rules can be defined that provide a message if the system has exceeded limit values – e. g. if more than 90 % of the available connections were occupied for more than 10 minutes or if during 5 minutes an average of more than 2 % of the queries resulted in errors. A monitoring tool for metrics like Prometheus stores the metrics, checks these conditions, and can subsequently notify the ones in charge.