How to Analyze Your Prometheus Data in SQL Tips & Resources

See step-by-step demos, an example roll-your-own monitoring setup using open source software, and 3 queries you can use immediately.

We recently hosted “How to Analyze Your Prometheus Data in SQL” - a 45 min technical session focused on the value of storing Prometheus metrics for the long term and how (and why) to monitor your infrastructure with Prometheus, Grafana, and Timescale.

But, we know not everyone could make it live, so we’ve published the recording and slides for anyone and everyone to access at any time. If you’re anything like me, you’re eager for some remote learning opportunities (now more than ever), and this session shows you how to roll-your-own analytics solution. Result: more flexibility, lower costs 🙌.

What you’ll learn:

Whether you’re new to monitoring, Prometheus, and Grafana or well-versed in all that Prometheus and Grafana have to offer, you’ll see (a) what a long-term data-store is and why you should care and (b) how to create an open source, flexible monitoring system, using your own or sample data.

This session came from my own experiences and what I hear again and again from community members: “I know I should, and I want to, keep my metrics around for longer – but how do I do it without wasting disk space or slowing down my database performance?”

I use my own project to demo various best practices, but the things I show you apply to any scenario or project.

The session’s broken into 5 parts:

Why use a long-term store for monitoring metrics?

Storing long-term metrics data (or, more simply, keeping them around longer v. deleting them to make space for more recent logs, traces, and other reporting) gives you four advantages over solely examining real-time or recent data:

Manage costs: With insight into past and present metrics, you can analyze the cost of doing business in various scenarios. How much does it cost to run your site on Black Friday? How much does it cost to add new sensors to your IoT infrastructure? Just how much did that unplanned downtime or system maintenance window cost you?
Plan capacity: You can understand how much infrastructure you’ll need to support various business scenarios, given your growth rate and historical usage patterns.
Identify root causes: Having long-term views of metrics enables you to look for correlations when outages, degradation, or other periodic mishaps occur in your infrastructure.
Look at patterns and trends over longer time periods: Many companies want to see long-term resource consumption trends. LAIKA - one of our amazing Timescale customers - is a great example of why this is useful: they keep their IT Monitoring metrics around for several years, which is the timeframe of producing one of their stop-motion feature films. Then, they use this data to understand how resource allocation changes over the course of production, so they can better equip and resource themselves for their next project.

Wait, doesn’t Prometheus do that for me?

Prometheus does a lot of things well: it’s an open-source systems monitoring and alerting toolkit that many developers use to easily (and cheaply) monitor infrastructure and applications. It’s awesome because it solves monitoring in a simple and straightforward way.

However, it’s not designed to be scalable or with long-term durability in mind. That means that Prometheus data can only stick around for so long - by default, a 15 day sliding window - and is difficult to manage operationally, as there’s no replication or high-availability.

That’s a problem because keeping metrics data for the long haul - say months or years - is valuable, for all the reasons listed above :).

Roll-your-own monitoring: system overview

I use a scenario where I want to monitor a production database, but all-in-one monitoring tools are too expensive or inflexible to meet my requirements (true story!).

My monitoring system architecture

I’m a developer and love to build things, so, of course, I decided to roll-my-own monitoring system using open source software - like many of the developers I speak to on a daily basis. My setup:

Prometheus for metrics collection
Grafana for visualization
TimescaleDB for storage and analysis.

I breakdown each component in detail during the session.

System Pro Tips

You’ll also get a few best practices along the way, including TimescaleDB features to enable to make it easier to store and analyze Prometheus metrics (this has the added benefit of making your Grafana dashboards faster too).

Pro tips: set up retention policies for raw Prometheus metrics and create downsampled rollups

Let’s code: 3 Queries you can use today

I promised some coding, so let’s get to it. You’ll spend a solid 15-20 mins using 3 queries to analyze Prometheus metrics and visualize them in Grafana.

You’ll learn how to :

Create aggregates for historical analysis in order to keep your Grafana dashboards healthy and running fast
See 10 minute rollups of metrics data
Query metrics by name and id
JOIN aggregate data with relational data to create the visualizations you need
Use patterns, like querying views to save from JOIN-ing on hypertables on the fly

Snippet of me demo'ing how (and why) to use `time_bucket`

Resources + Q & A

Want to re-create the monitoring system used in the demo? Or, perhaps you want to try querying your own Prometheus metrics with Grafana and TimescaleDB? We have you covered!

In the session, we link to several resources, like tutorials and sample dashboards to get you well on your way, including:

Recreate my demo monitoring system, using Prometheus, Timescale and Grafana running in Docker containers on EC2
Follow along with the session or dive straight into visualizing your Prometheus metrics with Grafana
Get started with Timescale Cloud (our hosted time-series database, which comes with free cloud credit to help get you up and running).
Join our Slack to ask questions and get help from our engineers and community members.

Community questions

We received questions throughout the session (thank you to everyone who submitted one!), with a selection below.

Do TimescaleDB Hypertables support all PostgreSQL functions and operators, such as SUM and LIKE?

YES, everything is supported! Since TimescaleDB is a PostgreSQL extension, you can use all your favorite PostgreSQL functions that you know and 💛.

...and TimescaleDB includes built-in SQL functions optimized for time-series analysis.

Is compression available via the open source version?

Compression - one of our features that allows you to “compress” data and reduce the amount of space your data takes up - is available on our Community version, not open source. But, the community version is free to use forever!

To see the features available in each version (Timescale Cloud, Community, and open source) see this comparison (the page also includes various FAQs, links to documentation, and more).

Is the pg_prometheus extension supported by AWS RDS?

The bad news: the pg_prometheus extension is only available on actual PostgreSQL databases – and, while RDS is PostgreSQL-compatible, it doesn’t count :(.

BUT, there’s good news (!) If you’re looking for a hosted and managed database to keep your Prometheus metrics, you can use Timescale Cloud as an RDS alternative.

We’re also working on an updated PostgreSQL adapter that doesn’t require pg_prometheus extension. Interested? Keep an eye on our GitHub page and sign up for our newsletter to get notified when it’s available.

And, even more good news: one of our community members - shoutout to Sean Sube - created a modified version of the prometheus-postgresql-adapter that may work on RDS (it doesn’t require the pg_prometheus extension on the database where you’re sending your Prometheus metrics) - check it out on GitHub.

Is there any benefit of using Grafana’s `$__timeGroupAlias` instead of TimescaleDB’s `time_bucket` function?

Fun fact, the $__timeGroupAlias macro will use time_bucket under the hood if you enable Timescaledb support in Grafana for your PostgreSQL data sources, as all Grafana macros are translated to SQL,

The difference between time_bucket and the $__timeGroupAlias is that the macro will alias the result column name so Grafana will pick it up, which you have to do yourself if you use time_bucket.

How does Prometheus scrape the TimescaleDB application database instance that you’re monitoring?

I’ve set up an endpoint that exposes Prometheus metrics, which Prometheus then scrapes. From there, the PostgreSQL adapter takes those metrics from Prometheus and inserts them into TimescaleDB.

This tutorial (also included in the above “Resources + Q & A” section) shows you how to set up a Prometheus endpoint for a Timescale Cloud database, which is the example that I used.

You mention an endpoint. Is this a Prometheus component?

Yes, endpoints are part of how Prometheus functions (and, for reference, here’s more detail on how endpoints function as part of Prometheus.

In my example, there’s an HTTP endpoint - containing my Prometheus metrics - that’s exposed on my Timescale Cloud-hosted database. Prometheus scrapes that endpoint for metrics.

Parting Words

As always, thank you to those who made it live – and to those who couldn’t, I – and the rest of Team Timescale – are here to help at any time. Reach out via our public Slack channel, and we’ll happily jump in.

For easy reference, here are the recording and slides for you to check out, re-watch, and share with friends and teammates.

Want to learn more about this topic? I'm also hosting another session on Wed, April 22nd: “Guide to Grafana 101: How to Build (awesome) Visualizations for Time-Series Data.”

📊RSVP to start tackling filters, gauges, histograms, world maps, and more.

To learn about future sessions and get updates about new content, releases, and other technical content, subscribe to our Biweekly Newsletter.

Hope to see you on the next one!