Next, we’ll take a look at the alerts screen.

open "http://$PROM_ADDR/alerts"

The screen is empty. Do not despair. We’ll come back to it quite a few times, and the number of alerts will grow as we progress. For now, just remember that’s where you can find your alerts.

Finally, we’ll open the graph screen.

open "http://$PROM_ADDR/graph"

That is where you’ll spend your time debugging the issues you discover through alerts.

Retrieve node information using kube_node_info #

As our first task, we’ll try to retrieve information about our nodes. We’ll use kube_node_info, so let’s take a look at its description (HELP) and its type (TYPE).

kubectl -n metrics run -it test \
    --image=appropriate/curl \
    --restart=Never \
    --rm \
    -- prometheus-kube-state-metrics:8080/metrics \
    | grep "kube_node_info"

The output, limited to the HELP and TYPE entries, is as follows.

# HELP kube_node_info Information about a cluster node.
# TYPE kube_node_info gauge
...
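The grep above matches every line that mentions kube_node_info, including one sample line per node. If only the metadata is of interest, a stricter pattern can narrow the output. What follows is a minimal sketch that runs against a hypothetical sample stored in a variable (sample_metrics) rather than a live cluster. The node name in the sample is invented for illustration.

```shell
# Hypothetical sample of what the kube-state-metrics endpoint might return.
# The node name (node-1) is made up; it is not taken from a real cluster.
sample_metrics='# HELP kube_node_info Information about a cluster node.
# TYPE kube_node_info gauge
kube_node_info{node="node-1",os_image="Ubuntu 16.04.5 LTS"} 1'

# Keep only the HELP and TYPE lines that belong to kube_node_info.
meta=$(printf '%s\n' "$sample_metrics" | grep -E '^# (HELP|TYPE) kube_node_info ')
printf '%s\n' "$meta"
```

Against a real cluster, the same grep -E pattern could replace grep "kube_node_info" at the end of the kubectl pipeline.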

🔍 You are likely to see variations between your results and mine. That’s normal since our clusters probably have different amounts of resources, my bandwidth might be different, and so on. In some cases, my alerts will fire, and yours won’t, or the other way around. I’ll do my best to explain my experience and provide screenshots that accompany them. You’ll have to compare that with what you see on your screen.

Now, let’s try using that metric in Prometheus.

Please type the following query in the expression field.

kube_node_info

Click the Execute button to retrieve the values of the kube_node_info metric.
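If you prefer the terminal over the UI, the same expression can probably be sent to the Prometheus HTTP API through its /api/v1/query endpoint. The sketch below only builds the URL. The prom_query_url helper and the fallback address are assumptions made for illustration, and the expression is passed as-is, which works for a plain metric name like kube_node_info but would need URL encoding for anything with special characters.

```shell
# Hypothetical helper: build an instant-query URL from an address and an expression.
# The expression is not URL-encoded, so this only suits simple metric names.
prom_query_url() {
    printf 'http://%s/api/v1/query?query=%s\n' "$1" "$2"
}

# PROM_ADDR is the variable used earlier to open the UI.
# The fallback address is a placeholder, not a real endpoint.
url=$(prom_query_url "${PROM_ADDR:-prometheus.example.com}" "kube_node_info")
echo "$url"

# Uncomment to send the query to a live Prometheus:
# curl -s "$url"
```

The response is JSON, so the labels and values you see in the UI table should appear there as well.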

🔍 Unlike in previous chapters, the Gist for this one (03-monitor.sh) contains not only the commands but also the Prometheus expressions. They are all commented out (with #). If you’re planning to copy and paste the expressions from the Gist, please exclude the comment markers. Each expression has a # Prometheus expression comment on top to help you identify it. As an example, the one you just executed is written in the Gist as follows.

# Prometheus expression
# kube_node_info

If you check the HELP and TYPE entries of kube_node_info, you’ll see that it provides information about a cluster node and that it is a gauge. “A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.” That makes sense for information about nodes since their number can increase or decrease over time.

Prometheus Gauge metric #

📌 A Prometheus gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

If you focus on the output, you’ll notice that there are as many entries as there are worker nodes in the cluster. The value (1) tells us nothing in this context. Labels, on the other hand, can provide some useful information. For example, in my case, the operating system (os_image) is Ubuntu 16.04.5 LTS. That example shows that we can use metrics not only to calculate values (e.g., available memory) but also to get a glimpse into the specifics of our system.
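To make the point about labels concrete, here is a minimal sketch that pulls the os_image label out of a single sample line. The line itself is hypothetical: only the os_image value comes from the text above, while the node name and the reduced label set are invented for illustration.

```shell
# Hypothetical sample line; a real one carries more labels than shown here.
sample='kube_node_info{node="node-1",os_image="Ubuntu 16.04.5 LTS"} 1'

# Extract the value of the os_image label from the label set.
os_image=$(printf '%s\n' "$sample" | sed -n 's/.*os_image="\([^"]*\)".*/\1/p')
echo "$os_image"
```

Inside Prometheus, a comparable view could likely be obtained with an expression such as count by (os_image) (kube_node_info), which groups the nodes by their operating system image.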
