Prometheus
From charlesreid1
Prometheus is a time series database and monitoring system. It can collect data from sources like Netdata and act as a data source for dashboarding tools like Grafana, which makes it a solid choice for a time series backend.
I mostly followed this guide from the Netdata wiki on how to set up Netdata with Prometheus: [1]
Also very useful: https://www.digitalocean.com/community/tutorials/how-to-install-prometheus-on-ubuntu-16-04
What Is It For
Prometheus is designed for reliability, to be the system you go to during an outage to allow you to quickly diagnose problems. Each Prometheus server is standalone, not depending on network storage or other remote services. You can rely on it when other parts of your infrastructure are broken, and you do not need to setup extensive infrastructure to use it.
- https://prometheus.io/docs/introduction/overview/
Installing
Quick and Dirty
The following is the quick and dirty installation procedure. This will download the pre-built Linux binary and put it in /opt/prometheus:
$ curl -L 'https://github.com/prometheus/prometheus/releases/download/v1.7.1/prometheus-1.7.1.linux-amd64.tar.gz' -o /tmp/prometheus.tar.gz
$ mkdir /opt/prometheus
$ tar -xf /tmp/prometheus.tar.gz -C /opt/prometheus/ --strip-components 1
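If you would rather script the quick and dirty install in Python, the tar step (where --strip-components 1 drops the archive's single top-level directory) can be reproduced with the standard tarfile module. This is a minimal sketch, not part of the original procedure: extract_stripped is a hypothetical helper name, and it assumes the tarball has already been downloaded to /tmp/prometheus.tar.gz as above.

```python
import os
import tarfile


def extract_stripped(archive_path, dest_dir, strip=1):
    """Extract a tarball into dest_dir, dropping the first `strip` path
    components of every member (like `tar --strip-components`).

    Sketch only: hard links inside the archive are not rewritten.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Use the safer "data" extraction filter when this Python provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path) as tar:  # compression is auto-detected
        for member in tar.getmembers():
            parts = member.name.split("/")[strip:]
            if not parts or parts == [""]:
                continue  # this is the stripped top-level directory itself
            member.name = "/".join(parts)
            tar.extract(member, dest_dir, **extra)
```

Usage: extract_stripped("/tmp/prometheus.tar.gz", "/opt/prometheus"), run as a user allowed to write to /opt.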
Thorough and Secure
For a more thorough and secure installation of Prometheus, we'll want to create dedicated service user accounts and set the ownership of all Prometheus files accordingly.
Link: https://www.digitalocean.com/community/tutorials/how-to-install-prometheus-on-ubuntu-16-04
End Result
[Image: command line where Prometheus is running]
[Image: Prometheus web console]
Securing
Using
Once Prometheus is running, point your browser to localhost:9090 (or, from another machine, to port 9090 of the machine running Prometheus).
Word of warning: the Prometheus web interface is insecure by default, with no authentication and no encryption.
Web Dashboard
Navigate to localhost:9090 to see the web dashboard. You can run queries and plot the results. Example:
[Image: example query and plot in the web dashboard]
Scraping
The Prometheus configuration file can contain several scrape job sections, each specifying services to scrape.
With the quick and dirty installation above, the config file is /opt/prometheus/prometheus.yml.
Here is an example with three machines (one local) running Netdata and being scraped for Netdata metrics (also see Netdata/Prometheus):
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: 'netdata_jupiter'
    params:
      format:
        - prometheus
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /api/v1/allmetrics
    scheme: http
    static_configs:
      - targets:
        - localhost:19999

  - job_name: 'netdata_basilisk'
    params:
      format:
        - prometheus
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /api/v1/allmetrics
    scheme: http
    static_configs:
      - targets:
        - 192.168.25.137:19999

  - job_name: 'netdata_morpheus'
    params:
      format:
        - prometheus
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /api/v1/allmetrics
    scheme: http
    static_configs:
      - targets:
        - 192.168.25.242:19999
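Each job above tells Prometheus to combine scheme, target address, metrics_path, and params into a single HTTP request per scrape. The sketch below reconstructs that URL for the three Netdata targets, which can be handy for checking a target by hand with curl or a browser. scrape_url is a hypothetical helper, and this only covers the URL, not headers or timeouts.

```python
from urllib.parse import urlencode, urlunsplit


def scrape_url(scheme, target, metrics_path, params):
    """Build the URL a scrape job requests from one target.

    params maps each name to a list of values, mirroring the YAML
    `params:` section, e.g. {"format": ["prometheus"]}.
    """
    query = urlencode(params, doseq=True)
    return urlunsplit((scheme, target, metrics_path, query, ""))


# The three targets from the configuration above.
for target in ["localhost:19999", "192.168.25.137:19999", "192.168.25.242:19999"]:
    print(scrape_url("http", target, "/api/v1/allmetrics", {"format": ["prometheus"]}))
```

For the first target this prints http://localhost:19999/api/v1/allmetrics?format=prometheus, the Netdata endpoint that serves metrics in Prometheus format.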
Alerting
Promgen is a utility that provides a web interface for generating Prometheus configuration files, which helps with complicated alerting scenarios.
Link: https://github.com/line/promgen
Pushing for Short-Lived Jobs
Sometimes the scraping model does not work because a job is ephemeral or short-lived and may finish before Prometheus gets a chance to scrape it. In this case, you can have the job push its metrics to a push gateway, which Prometheus then scrapes like any other target.
Link: https://github.com/prometheus/pushgateway
The push gateway can also be used from the command line: https://github.com/prometheus/pushgateway#use-it
Simple Example
Push a single sample into a group identified by the job name "my_job" (assuming the push gateway is running on localhost at its default port, 9091):
$ echo "metric_1 3.14" | curl --data-binary @- http://localhost:9091/metrics/job/my_job
This metric will be of type "untyped" since no type information was provided.
Complicated Example
Push multiple metrics to the push gateway at pushgateway.example.org, into the group identified by the job some_job and the instance some_instance.
(Note that the # TYPE lines declare each metric's type, and the # HELP line adds a description.):
cat <<EOF | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job/instance/some_instance
# TYPE some_metric counter
some_metric{label="val1"} 42
# TYPE another_metric gauge
# HELP another_metric Just an example.
another_metric 2398.283
EOF
Removing Metrics from Push Gateway
To delete all metrics in the group identified by a job and an instance (here we assume the push gateway is on localhost):
curl -X DELETE http://localhost:9091/metrics/job/some_job/instance/some_instance
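To delete all metrics in the group identified by the job alone (this does not touch groups that also carry an instance label, such as the some_instance group above):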
curl -X DELETE http://localhost:9091/metrics/job/some_job
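The grouping key lives entirely in the URL path, so the same deletions can be built from a script. A minimal sketch with Python's standard library, where grouping_path and delete_request are hypothetical helper names and job and label values are assumed to be plain strings without "/" (such values would need extra encoding):

```python
import urllib.request


def grouping_path(job, **labels):
    """Path of a push gateway group: /metrics/job/<job>/<label>/<value>/..."""
    path = f"/metrics/job/{job}"
    for name, value in labels.items():
        path += f"/{name}/{value}"
    return path


def delete_request(base_url, job, **labels):
    """Build (but do not send) a DELETE request for exactly one group."""
    return urllib.request.Request(base_url + grouping_path(job, **labels), method="DELETE")
```

To actually delete, pass the request to urllib.request.urlopen, e.g. urllib.request.urlopen(delete_request("http://localhost:9091", "some_job", instance="some_instance")).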
Using Other Languages
The curl examples above are written for the command line, but you can send similar data requests via any decent programming language.
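For example, curl --data-binary sends an HTTP POST whose URL path carries the grouping key, so the simple "my_job" push translates to Python's standard library roughly as follows. This is a sketch: push_request is a hypothetical helper, and it only builds the request so it can be inspected before sending.

```python
import urllib.request


def push_request(base_url, job, body, instance=None):
    """Build a POST like `curl --data-binary @- <base_url>/metrics/job/<job>`."""
    url = f"{base_url}/metrics/job/{job}"
    if instance is not None:
        url += f"/instance/{instance}"
    # body is exposition-format text; each line should end with "\n"
    # (echo adds that trailing newline in the curl version).
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


req = push_request("http://localhost:9091", "my_job", "metric_1 3.14\n")
```

Calling urllib.request.urlopen(req) sends it once a push gateway is listening on localhost:9091.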
Queries
To run a query, start typing the name of a time series and you will see an autocomplete drop-down menu from Prometheus.
You can type the name of a time series and press enter (and click the Graph button) to see a graph of your time series quantity.
As you collect and group more metrics, you can also filter and match time series by their labels.
Link: Querying Basics: https://prometheus.io/docs/prometheus/latest/querying/basics/
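The same queries typed into the web dashboard can also be run through the Prometheus HTTP API (GET /api/v1/query for an instant query). A minimal sketch, assuming Prometheus at localhost:9090; the helper names are hypothetical, and parse_vector only handles the "vector" result type.

```python
import json
import urllib.request
from urllib.parse import urlencode


def query_url(base_url, promql):
    """URL for an instant query against the HTTP API."""
    params = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


def parse_vector(payload):
    """Turn an instant-vector response into (labels, float value) pairs."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    # Each result carries its label set and a [timestamp, "value"] pair.
    return [(r["metric"], float(r["value"][1])) for r in doc["data"]["result"]]


def instant_query(base_url, promql):
    with urllib.request.urlopen(query_url(base_url, promql)) as resp:
        return parse_vector(resp.read())
```

For example, instant_query("http://localhost:9090", "up") returns one (labels, value) pair per scraped target.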
Dimensions
As an example of dimensions: we are collecting data with Netdata and sending it to Prometheus. Netdata collects several CPU utilization values under the metric name netdata_cpu_cpu_percentage_average. Prometheus stores these as separate time series that share this metric name but carry different values of a "dimension" label, such as "user", "system", and "idle". We can filter on the dimension label to pick out the series we want.
To graph all dimensions of this time series, type this in the query box:
netdata_cpu_cpu_percentage_average
To graph only the system CPU utilization, specify the "system" dimension:
netdata_cpu_cpu_percentage_average{dimension="system"}
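To make the label-matching idea concrete, here is a toy in-memory model of what the two queries above select. It is a sketch only: the series list is hypothetical sample data, and select mimics equality matchers (including the PromQL rule that a missing label counts as an empty string), not the real query engine.

```python
# Hypothetical series: one metric name, distinguished by the "dimension" label.
series = [
    {"__name__": "netdata_cpu_cpu_percentage_average", "dimension": "user", "job": "netdata_jupiter"},
    {"__name__": "netdata_cpu_cpu_percentage_average", "dimension": "system", "job": "netdata_jupiter"},
    {"__name__": "netdata_cpu_cpu_percentage_average", "dimension": "idle", "job": "netdata_jupiter"},
]


def select(series, name, **equals):
    """Mimic name{label="value",...}: keep series whose labels all match."""
    return [
        s for s in series
        if s.get("__name__") == name
        and all(s.get(k, "") == v for k, v in equals.items())
    ]


all_dims = select(series, "netdata_cpu_cpu_percentage_average")
system_only = select(series, "netdata_cpu_cpu_percentage_average", dimension="system")
```

all_dims keeps all three series (one line per dimension on the graph), while system_only keeps just the "system" series.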
Jobs
If we end up monitoring several services on a single machine, or if we monitor multiple machines across a network, the Prometheus configuration file will have a separate section for each job. For example, I might have three Netdata instances running on three computers: jupiter, basilisk, and morpheus.
When we query data in Prometheus, we can specify which job's data to use:
netdata_cpu_cpu_percentage_average{dimension="system",job="netdata_jupiter"}
Alternatively, we can match multiple jobs by using a regular expression and the =~ operator:
netdata_cpu_cpu_percentage_average{dimension="system",job=~"netdata_.*"}
[Image: output of the query above]
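Regular-expression matchers in PromQL are fully anchored, so job=~"netdata_.*" behaves like ^netdata_.*$ and only matches job names that start with netdata_. A minimal sketch of that behavior using Python's re.fullmatch (Prometheus itself uses Go's RE2 syntax, which agrees with Python's re for simple patterns like this); the job name node_netdata_x is hypothetical, added to show the anchoring.

```python
import re


def regex_match(value, pattern):
    """PromQL =~ semantics: the pattern must match the whole label value."""
    return re.fullmatch(pattern, value) is not None


jobs = ["netdata_jupiter", "netdata_basilisk", "netdata_morpheus", "node_netdata_x"]
matching = [j for j in jobs if regex_match(j, "netdata_.*")]
```

matching contains the three Netdata jobs from the configuration, while node_netdata_x is excluded even though an unanchored re.search would find netdata_ inside it.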
Best Practices
The Prometheus documentation has links to several best practice recommendations.
Building Dashboards
Here is one about building dashboards: https://prometheus.io/docs/practices/consoles/
- Have no more than 5 graphs on a console.
- Have no more than 5 plots (lines) on each graph. You can get away with more if it is a stacked/area graph.
- When using the provided console template examples, avoid more than 20-30 entries in the right-hand-side table.
- It is difficult for a set of consoles to serve more than one master. What you want to know when oncall (what is broken?) tends to be very different from what you want when developing features (how many people hit corner case X?). In such cases, two separate sets of consoles can be useful.
Grafana
http://docs.grafana.org/installation/debian/
https://blog.hda.me/2017/01/09/using-netdata-with-influxdb-backend.html
Flags
Prometheus is a time series database tool. It uses a scraping model, in which Prometheus queries services for statistics rather than waiting to receive data. It can also serve as a data source for dashboards like Grafana.
Using Netdata with Prometheus: Prometheus/Netdata
Using Grafana with Prometheus: Prometheus/Grafana
Security Concerns: Prometheus/Security