Promscale: An analytical platform and long-term store for Prometheus, with the combined power of SQL and PromQL

In this post we introduce Promscale, a new open-source long-term store for Prometheus data designed for analytics.

Promscale is a horizontally scalable and operationally mature platform for Prometheus data that offers the combined power of PromQL and SQL, enabling developers to ask any question, create any dashboard, and achieve greater visibility into their systems. Promscale is built on top of TimescaleDB, the leading relational database for time-series.

Promscale is the result of a year of dedicated development effort by one of Timescale's engineering teams. It incorporates feedback from users and the general Prometheus community, and builds on 3.5 years of feedback from users of our previous Prometheus read-write adapter (for more, please see this related design doc). As a result, despite being a young project, Promscale already sports an active user community, including organizations like Electronic Arts, Dow Chemical, and many others. This latest release marks the graduation of Promscale out of beta.

(The name “Promscale” itself was picked by our users and the Prometheus community via this GitHub poll, although some of us were secretly rooting for “Promy McPromFace” 😂.)

To get started right away, visit our GitHub repo to install Promscale via Helm Charts, Docker, and others. And, if you like what we're building, please give us a ⭐️ on GitHub 🤗.

If you have a Kubernetes cluster with Helm installed, we suggest using tobs to install a full metric collection and visualization solution including Prometheus, Grafana, Promscale, and a preview version of PromLens in under 5 minutes (demo video).

Note: Although Mat, Josh, and Harkishen are listed as the authors of this post, full credit goes to the entire Promscale team: Ante Krešić, Blagoj Atanasovski, David Kohn, Harkishen Singh, Josh Lockerman, and Mat Arye.

But why did we build Promscale? Please read on for more.

We are witnessing a shift in the role of software, and in the ways organizations manage and monitor their software

Today, every industry is moving its computing to the cloud. The complexity and scale of these modern, cloud-based applications require sophisticated systems to monitor application health and manage infrastructure. Unlike in the past, when systems were all built using proprietary software, this new wave of modern infrastructure is being built using free, open components, like Kubernetes and Prometheus. The top two reasons for this shift are flexibility and cost. Unlike proprietary SaaS solutions, open tools put users' needs first, let them customize their stack, and cost pennies on the dollar. In this world, developers, not sales contracts, RFPs, or enterprise sales teams, decide which tools are used.

Prometheus has emerged as the de facto monitoring solution for modern software systems

Prometheus is an open-source systems monitoring and alerting toolkit that can be used to easily and cost-effectively monitor infrastructure and applications. Over the past few years, Prometheus has emerged as the monitoring solution for modern software systems. The key to Prometheus’ success is its pull-based architecture in combination with service discovery, which can seamlessly monitor modern, dynamic systems in which (micro)services start up and shut down frequently.

Problem: Prometheus is not designed for analytics

As organizations use Prometheus to collect data from more and more of their infrastructure, the benefits from mining this data also increase. Analytics becomes critical for auditing, reporting, capacity planning, prediction, root-cause analysis, and more. Prometheus's architectural philosophy is one of simplicity and extensibility. Accordingly, it does not itself provide durable, highly-available long-term storage or advanced analytics, but relies on other projects to implement this functionality.

Existing ways to durably store Prometheus data are useful for long-term storage, but they only support the Prometheus data model and query model (limited to the PromQL query language). While these work extremely well for the simple, fast analyses found in dashboarding, alerting, and monitoring, they fall short for more sophisticated analysis, and they cannot enrich the dataset with the other sources needed for insight-generating, cross-cutting analysis.

Solution: Promscale scales and augments Prometheus for long-term storage and analytics

Enter Promscale. We built Promscale to conquer a challenge that we, and other developers, know all too well: how do we easily find answers to complex questions in our monitoring data?

Built on top of TimescaleDB and PostgreSQL, Promscale supports both PromQL and SQL, offers horizontal scalability to over 10 million metrics per second and petabytes of storage, supports native compression, handles high cardinality, provides rock-solid reliability, and more. It also offers other native time-series capabilities, such as data retention policies, continuous aggregate views, downsampling, data gap-filling, and interpolation. It is already natively supported by Grafana via the Prometheus and PostgreSQL/TimescaleDB data sources.
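To make downsampling and gap-filling concrete, here is a minimal Python sketch of the idea. It is not Promscale's or TimescaleDB's implementation: TimescaleDB provides these capabilities in SQL through functions such as time_bucket(), time_bucket_gapfill(), and interpolate(), while the sketch below only mimics their effect on a list of (timestamp, value) pairs. The function names are hypothetical.

```python
from statistics import mean


def downsample(samples, bucket_width):
    """Average (timestamp, value) samples into fixed-width buckets keyed by bucket start."""
    buckets = {}
    for ts, value in samples:
        start = ts - ts % bucket_width  # floor the timestamp to its bucket boundary
        buckets.setdefault(start, []).append(value)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


def gapfill_interpolate(buckets, start, end, bucket_width):
    """Emit every bucket start in [start, end). Empty buckets are filled by linear
    interpolation between the nearest non-empty neighbours, or None at the edges."""
    known = sorted(buckets.items())
    out = []
    for b in range(start, end, bucket_width):
        if b in buckets:
            out.append((b, buckets[b]))
            continue
        before = [(t, v) for t, v in known if t < b]
        after = [(t, v) for t, v in known if t > b]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            out.append((b, v0 + (v1 - v0) * (b - t0) / (t1 - t0)))
        else:
            out.append((b, None))  # no neighbour on one side: leave the gap empty
    return out
```

Averaging is just one choice of per-bucket aggregate; the interpolation step only fills buckets that have data on both sides.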

Promscale architecture and how it fits into the observability stack

Prometheus writes data to the Promscale connector using the remote_write API, storing the data in TimescaleDB. The Promscale connector understands PromQL queries natively and fetches data from TimescaleDB to execute them, while SQL queries go to TimescaleDB directly.

Promscale is open-source, licensed under Apache 2. TimescaleDB is licensed under the completely free, source-available Timescale License.

Promscale stores data in a dynamically auto-generated schema that is highly optimized for Prometheus metrics and is the result of thorough benchmarking and community discussion (as can be seen in this design doc). In particular, this schema decouples individual metrics, allowing for the collection of metrics with vastly different cardinalities and retention periods. At the same time, Promscale exposes simple, user-friendly views so that developers do not have to understand this optimized schema.

Thanks to its relational foundation, Promscale also supports a variety of data types (numerics, text, arrays, JSON, booleans), JOINs, and ACID semantics, in addition to simple metric data. Because Promscale is built on top of PostgreSQL, it is operationally mature and includes capabilities such as high availability, streaming backups, upgrades over time, roles and permissions, and security.

Promscale also benefits from the TimescaleDB user community: tens of millions of downloads, over half a million active databases, and a 5,000+ member Slack channel.

User testimonials

Although a relatively new project, Promscale is already in use by developers across the globe:

"We have game metrics available in different data sources like Graphite, Datadog, and Cloudwatch. We are storing all of these metrics in Prometheus, with Promscale for long-term storage. Promscale lets us collate metrics from these different sources and generate a single report in a unified view so that we can have better visibility into what is happening inside our games."
— Saket K., Software Engineer, Electronic Arts
"Our goal is to have all of our sites from around the world monitored using Prometheus and view the resulting data in a user-friendly way. We chose Promscale to store our data because it scales, offers flexibility – for example, dividing read and write activities among different nodes – and has the operational maturity and rock-solid reliability of PostgreSQL, including streaming backups and high-availability."
Adam B., Service Specialist, Dow Chemical

Install Promscale today via Helm Charts, Docker, and others. More information on GitHub. (And, if you like what we are building, please give us a ⭐️ on GitHub 🤗.)

If you have a Kubernetes cluster with Helm installed, we suggest using tobs to install a full metric collection and visualization solution including Prometheus, Grafana, Promscale, and a preview version of PromLens within 5 minutes (video).

How to get involved with the Promscale community:

To learn more about the origin, status, and roadmap for this project, please read on.

Prometheus has emerged as the monitoring solution for modern software systems

Over the past few years, Prometheus, an open-source systems monitoring and alerting toolkit that can be used to easily and cost-effectively monitor infrastructure, has emerged as the monitoring solution for modern software systems.

Source: Prometheus docs

The key to Prometheus’ success is that it is built for modern, dynamic systems in which services start up and shut down frequently. The simple way that Prometheus collects data works extremely well with the ephemeral, churning nature of modern software architectures, and microservices in particular, because the services themselves don’t need to know anything about the monitoring system. Any service that wants to be monitored simply exposes its metrics over an HTTP endpoint. Prometheus scrapes these endpoints periodically and records the values it sees into a local time-series database.
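The pull model can be sketched in a few lines of Python. This is a simplified illustration, not Prometheus code: it parses the text exposition format a service serves on its metrics endpoint (ignoring escaping, timestamps, and histogram details) and appends each sample to an in-memory map keyed by label set. The function names and the job label value are assumptions for illustration, and the HTTP fetch is replaced by a plain string.

```python
import re

# Simplified pattern for one sample line: metric_name{label="value",...} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return (metric_name, labels, value) for each sample line in the body."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            continue
        name, label_str, value = m.groups()
        labels = dict(LABEL_RE.findall(label_str or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape(tsdb, target_labels, body, scrape_time):
    """Record every sample from one scrape into a series -> [(time, value)] map.
    Target labels (such as job) are attached by the scraper, not by the service."""
    for name, labels, value in parse_exposition(body):
        series = frozenset({"__name__": name, **target_labels, **labels}.items())
        tsdb.setdefault(series, []).append((scrape_time, value))
```

The service only renders text; everything about timing, labeling, and storage lives on the scraper side, which is what keeps the two decoupled.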

Prometheus’ decoupled architecture makes the system as a whole much more resilient. Services don’t need the monitoring stack to be up to get work done, and the monitoring software only needs to know about individual services while it’s actually scraping them. This makes it easy for the monitoring system to adjust seamlessly as services fail and new ones are brought up.

This architecture also responds gracefully to overload. While push-based architectures often drown in traffic under high load, Prometheus simply slows down its scrape loop. Thus, while your metric resolution may suffer, your monitoring system will remain up and functional.

Keeping with the theme of resilience and simplicity, Prometheus doesn’t try to store data for the long term, but instead exposes an interface that lets a dedicated database do so. Prometheus continually pushes data to this remote-write interface, ensuring that metric data is durably stored. That is where external long-term storage systems come in.

Analytical options for Prometheus data are lacking

As developers use Prometheus to collect data from more and more of their infrastructure, the benefits from mining this data also increase. Analytics becomes critical for things like auditing, reporting, capacity planning, prediction, root-cause analysis, and more.

Prometheus itself was developed with a clear sense of what it is, and is not, designed to do. Prometheus is designed to be a monitoring and alerting system, but it is not a durable, highly-available long-term store of data, nor a store for other datasets, nor a sophisticated analytics engine. Although Prometheus itself does not provide these capabilities, they are critical for longer-duration and more intensive uses of metric data, including auditing, reporting, capacity planning, predictive analytics, root-cause analysis, and many others. As such, Prometheus provides hooks to forward its data to an external data store better suited for these tasks.

Existing options for storing Prometheus data externally, while useful, all focus on long-term storage and, in some cases, limited forms of aggregation. Such systems can only store floats and perform PromQL queries, which makes them too limited, in both the data they store and their query model, to perform sophisticated analytics.

In addition, as great as the Prometheus architecture is for recording data in highly dynamic environments, its method of collecting data at unaligned intervals creates challenges for analysis, since the timestamps of multiple “simultaneous” scrapes on different endpoints can differ significantly.

Prometheus devised a query language, PromQL, that addresses these difficulties by regularizing data at query time: aligning the data at user-specified intervals and discarding excess data points. While this method works extremely well for the simple, fast analyses found in dashboarding, alerting, and monitoring, it can fall short for more sophisticated analysis.
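A rough Python sketch of this query-time regularization: for each evaluation timestamp on a fixed grid, take the most recent sample at or before it, provided the sample falls within a lookback window (Prometheus uses 5 minutes by default). Samples between evaluation points that are not the newest are ignored. This is a simplification of PromQL's evaluation, and align() is a hypothetical name.

```python
from bisect import bisect_right


def align(samples, start, end, step, lookback=300):
    """samples: (timestamp, value) pairs sorted by timestamp. For each evaluation
    time t in start, start + step, ..., end, return the newest sample at or before t
    if it is at most `lookback` seconds old, otherwise None."""
    times = [t for t, _ in samples]
    out = []
    t = start
    while t <= end:
        i = bisect_right(times, t) - 1  # index of the newest sample with time <= t
        if i >= 0 and t - times[i] <= lookback:
            out.append((t, samples[i][1]))
        else:
            out.append((t, None))  # stale or missing: no value at this step
        t += step
    return out
```

Two series scraped at different offsets come out on the same grid of timestamps, which is what makes them comparable in PromQL, at the cost of dropping the samples that fall between steps.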

For example, PromQL can’t aggregate across both series and time, making it quite difficult (if not impossible) to get accurate statistics over time for a particular label key. Such statistics are necessary for tasks like determining when a memory leak was introduced, by looking at 90th percentile memory usage grouped by app version across a long time span. This kind of drill-down and reaggregation is important for many kinds of analytics, because even when the data contains the information needed for the problem at hand, it often wasn’t gathered with that kind of analysis in mind. Other PromQL features, such as joins, filters, and statistics, are similarly restricted, limiting its usefulness for discovering trends and developing insights.
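The difference between pooled and two-step aggregation shows up in a small Python sketch. pooled_quantile_by() treats every sample from every series with the same label value as one population (aggregating across series and time at once); avg_of_per_series_quantiles_by() first takes a per-series quantile over time and then averages across series, roughly what a PromQL expression like avg by (version) (quantile_over_time(0.9, ...)) computes. The helper names, data layout, and sample values are hypothetical.

```python
from collections import defaultdict


def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list (0 <= q <= 1)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def pooled_quantile_by(series, key, q):
    """Quantile over all samples of all series sharing a label value."""
    groups = defaultdict(list)
    for labels, samples in series:
        groups[labels[key]].extend(v for _, v in samples)
    return {k: quantile(vs, q) for k, vs in groups.items()}


def avg_of_per_series_quantiles_by(series, key, q):
    """Two-step alternative: quantile over time per series, then average across series."""
    groups = defaultdict(list)
    for labels, samples in series:
        groups[labels[key]].append(quantile([v for _, v in samples], q))
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}


# One steady pod and one spiky pod running the same app version.
series = [
    ({"pod": "a", "version": "v2"}, [(0, 10.0), (1, 10.0), (2, 10.0), (3, 10.0)]),
    ({"pod": "b", "version": "v2"}, [(0, 50.0), (1, 90.0)]),
]
print(pooled_quantile_by(series, "version", 0.9))              # {'v2': 70.0}
print(avg_of_per_series_quantiles_by(series, "version", 0.9))  # {'v2': 48.0}
```

The two answers disagree because averaging quantiles is not the same as taking the quantile of the combined data; only the pooled version is the 90th percentile of the version's memory usage.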

Others have also written about these issues: The CNCF SIG-Observability working group has put together a list of use-cases in the observability space that need better tools for metrics analytics. Dan Luu, a popular tech blogger, also wrote a widely shared blog post about getting more value out of your metric data.

This is where Promscale comes in.

Why we built Promscale

We say the market lacks a system for deep analytics of Prometheus data because we’ve felt that need while monitoring our own infrastructure. We built Promscale to conquer a challenge that we, and other developers, know all too well: how do we easily find answers to complex questions in our monitoring data?

We are big fans of Prometheus as software developers and operators – in particular, we became involved in the Prometheus ecosystem 3.5 years ago when we initially published our previous Prometheus adapter, one of the first read-write adapters.

But after multiple years of use and study, we realized we needed capabilities beyond what Prometheus and its associated tools currently offer.

In our stack, this includes things like:

  • Auxiliary data about the system being monitored, to augment metrics with additional information that helps us understand what they mean, such as node hardware properties, user/owner information, geographic location, or which workload is running.
  • Joins combining metrics with this additional auxiliary data and metadata to create a complete view of the system.
  • Efficient long-term storage for historical analysis, such as reporting of past incidents, capacity planning, auditing, and more.
  • Flexible data management to handle the large volume of data that monitoring generates, with support for capabilities such as tiering, multi-tenancy, automated data retention, and downsampling.
  • Isolation between the various metrics. Since different metrics can be sent by completely different systems, we want both the performance and data management of different metrics to be independent (e.g., so that downsampling one metric won’t affect others).
  • Logs and traces alongside metrics, to provide a better all-around view of the system. If all three modalities are in the same database, then JOINs across this data can lead to interesting insights. (To be clear, Promscale does not support logs and traces today, but this is an area of future work.)
  • SQL as a versatile query language for those general analytics that PromQL isn’t suited for, as well as the lingua franca spoken by a variety of data analysis and machine learning tools.

What our infrastructure team really wanted was an analytical platform on top of Prometheus to achieve more-insightful and cost-effective observability into our own infrastructure.

That is what we built with Promscale.

How Promscale works

Architecture
This architecture uses the standard remote_write / remote_read Prometheus API, cleanly slotting into that space in the Prometheus stack.

Prometheus writes data to the Promscale connector using the remote_write API, storing the data in TimescaleDB. Promscale understands PromQL queries natively and fetches data from TimescaleDB to execute them, while SQL queries go to TimescaleDB directly.

Promscale architecture and how it fits into the observability stack

Promscale can be deployed in any environment running Prometheus, alongside any Prometheus instance. We provide Helm charts for easier deployments to Kubernetes environments.

SQL interface
The data stored in Promscale can be queried both in PromQL and SQL. Though the data layout we use is internally quite sophisticated (more details in this design doc), you don’t need to understand any of it to analyze metrics through our easy-to-use SQL views.

Each metric is exposed through a view named after the metric, so a measurement called cpu_usage is queried like:

SELECT 
	time, 
	value, 
	jsonb(labels) as labels 
FROM "cpu_usage";
time                    value   labels
2020-01-01 02:03:04     0.90    {"namespace": "prod", "pod": "xyz"}
2020-01-01 02:03:05     0.98    {"namespace": "dev",  "pod": "abc"}
2020-01-01 02:03:06     0.70    {"namespace": "prod", "pod": "xyz"}

The most important fields are time, value, and labels.

labels represents the full set of labels associated with the measurement and is represented as an array of identifiers. In the query above we view the labels in their JSON representation using the jsonb() function.
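To illustrate storing labels as an array of identifiers, here is a hypothetical Python sketch with an in-memory label dictionary. The ids follow the example rows in this post (labels {1,2} for prod/xyz, {4,5} for dev/abc), but the dictionary itself is an assumption; Promscale's actual catalog tables are not modeled. labels_to_json() plays the role that jsonb(labels) plays in the query above.

```python
import json

# Hypothetical label dictionary (id -> (key, value)); ids follow this post's examples.
LABELS = {1: ("namespace", "prod"), 2: ("pod", "xyz"), 4: ("namespace", "dev"), 5: ("pod", "abc")}


def labels_to_json(label_ids):
    """Resolve an array of label ids into a JSON object, similar in effect to jsonb(labels)."""
    return json.dumps({LABELS[i][0]: LABELS[i][1] for i in label_ids}, sort_keys=True)


def label_ids_for(label_set):
    """Inverse direction: map {'key': 'value'} to the sorted array of label ids."""
    reverse = {pair: label_id for label_id, pair in LABELS.items()}
    return sorted(reverse[(k, v)] for k, v in label_set.items())
```

Each row then carries a few small integers instead of repeated strings, and each distinct key/value pair is stored once.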

Each row also has a series_id uniquely identifying the measurement’s label set, which enables efficient aggregation by series. You can easily retrieve the labels array from a series_id using the labels(series_id) function. For example, this query shows how many data points we have in each series:

SELECT
	jsonb(labels(series_id)) as labels,
	count(*)
FROM "cpu_usage" 
GROUP BY series_id;
labels                                  count
{"namespace": "prod", "pod": "xyz"}     1
{"namespace": "dev",  "pod": "abc"}     7
{"namespace": "prod", "pod": "xyz"}     3
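A series_id can be thought of as an interned label set. The hypothetical Python sketch below assigns each distinct label set a stable integer (so label order does not matter), counts samples per id the way GROUP BY series_id does, and offers a reverse lookup in the spirit of labels(series_id). It illustrates the concept only; it is not how Promscale allocates ids.

```python
from collections import Counter


class SeriesCatalog:
    """Hypothetical interning table: each distinct label set gets a stable integer id."""

    def __init__(self):
        self._ids = {}

    def series_id(self, labels):
        key = tuple(sorted(labels.items()))  # label order must not change the identity
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]

    def labels(self, series_id):
        """Reverse lookup from id back to the label set."""
        for key, sid in self._ids.items():
            if sid == series_id:
                return dict(key)
        raise KeyError(series_id)


def count_per_series(catalog, rows):
    """rows: (time, value, labels). Counts samples per series id, like GROUP BY series_id."""
    return Counter(catalog.series_id(labels) for _, _, labels in rows)
```

Grouping on a single integer is much cheaper than comparing whole label sets row by row, which is the efficiency the series_id column buys.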

Each label key (in our example, namespace and pod) is expanded into its own column storing foreign-key identifiers for its values, which allows us to JOIN, aggregate, and filter by label keys and values. You get back the text represented by a label id using the val(id) function. This opens up nifty possibilities such as aggregating across all series with a particular label key. For example, to determine the median CPU usage reported over the past year, grouped by namespace, you could run:

SELECT 
	val(namespace_id) as namespace, 
	percentile_cont(0.5) within group (order by value) AS median
FROM "cpu_usage" 
WHERE time > '2019-01-01'
GROUP BY namespace_id;
namespace       median
prod            0.8
dev             0.7
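The same query in miniature: filter on time, group on the integer namespace_id, compute the median, and translate ids to text only at the end, the way val() is applied in the SELECT list. This Python sketch uses a hypothetical VALUES dictionary in place of val() and statistics.median(), which averages the two middle values for an even count, just as percentile_cont(0.5) does.

```python
from collections import defaultdict
from statistics import median

# Hypothetical stand-in for val(): label-value id -> text, using this post's example ids.
VALUES = {1: "prod", 4: "dev"}


def median_by_namespace(rows, since):
    """rows: (time, value, namespace_id). Mirrors WHERE time > since and
    GROUP BY namespace_id, resolving ids to text only for the final output."""
    groups = defaultdict(list)
    for t, value, ns_id in rows:
        if t > since:
            groups[ns_id].append(value)
    # statistics.median averages the two middle values, like percentile_cont(0.5)
    return {VALUES[ns_id]: median(vals) for ns_id, vals in groups.items()}
```

The aggregation never touches label strings; only the handful of result rows need a lookup.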

The complete view looks something like this:

SELECT * FROM "cpu_usage";
time                 value  labels  series_id  namespace_id  pod_id
2020-01-01 02:03:04  0.90   {1,2}   1          1             2
2020-01-01 02:03:05  0.98   {4,5}   2          4             5
2020-01-01 02:03:06  0.70   {1,2}   1          1             2

To simplify filtering by labels, we created operators corresponding to the selectors in PromQL. Those operators are used in a WHERE clause of the form labels ? (<label_key> <operator> <pattern>). The four operators are:

  • == matches label values that are equal to the pattern
  • !== matches label values that are not equal to the pattern
  • ==~ matches label values that match the pattern regex
  • !=~ matches label values that do not match the pattern regex

These four matchers correspond to the four selectors in PromQL, though they are spelled slightly differently to avoid clashing with other PostgreSQL operators. They can be combined with each other, and with arbitrary WHERE clauses, using any boolean logic.
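For reference, here is a small Python sketch of the PromQL matcher semantics these operators are modeled on, using the operator spellings listed above. Two PromQL details matter: regular-expression matchers are fully anchored, and a series that lacks a label is treated as having that label set to the empty string. Python's re module stands in for the RE2 syntax Prometheus uses, so unusual patterns may behave differently; match() and select_any() are hypothetical helpers.

```python
import re


def match(labels, key, op, pattern):
    """Evaluate one PromQL-style matcher; a missing label counts as the empty string."""
    value = labels.get(key, "")
    if op == "==":
        return value == pattern
    if op == "!==":
        return value != pattern
    if op == "==~":
        return re.fullmatch(pattern, value) is not None  # fully anchored, as in PromQL
    if op == "!=~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown operator {op!r}")


def select_any(rows, *matchers):
    """Keep (labels, value) rows satisfying ANY (key, op, pattern) matcher, i.e. an OR."""
    return [row for row in rows if any(match(row[0], *m) for m in matchers)]
```

Because the match is anchored, 'ab*' accepts 'a', 'ab', or 'abb' but not 'abc'; a prefix test needs a pattern such as 'ab.*'.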

For example, if you want only those metrics from the production namespace, or those whose pod name starts with the letters "ab", you simply OR the corresponding label matchers together:

SELECT avg(value) 
FROM "cpu_usage" 
WHERE labels ? ('namespace' == 'production') 
       OR labels ? ('pod' ==~ '^ab.*');

Combined, these features open up all kinds of possibilities for analytics. For example, you could easily get the 99th percentile of memory usage per container in the default namespace with:

SELECT 
  val(used.container_id) container, 
  percentile_cont(0.99) within group(order by used.value) percent_used_p99  
FROM container_memory_working_set_bytes used
WHERE labels ? ('namespace' == 'default')  
GROUP BY container 
ORDER BY percent_used_p99 ASC 
LIMIT 100;
container             		       percent_used_p99
promscale-drop-chunk                            1433600
prometheus-server-configmap-reload              6631424
kube-state-metrics                             11501568
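percentile_cont() is PostgreSQL's continuous percentile: it sorts the group's values and linearly interpolates at the 0-based position fraction * (n - 1). The Python sketch below reproduces that definition and wraps it in the same filter, group, order, and limit steps as the query above. Rows are simplified to hypothetical (container, namespace, used_bytes) tuples rather than Promscale's label ids.

```python
import math


def percentile_cont(values, fraction):
    """PostgreSQL-style continuous percentile: sort, then linearly interpolate
    between neighbours at 0-based position fraction * (n - 1)."""
    if not values:
        return None  # over zero rows, percentile_cont yields NULL
    xs = sorted(values)
    pos = fraction * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def p99_by_container(rows, namespace="default", top_n=100):
    """rows: (container, namespace, used_bytes). Filter on namespace, group by
    container, take the 99th percentile, order ascending, keep the first top_n."""
    groups = {}
    for container, ns, used in rows:
        if ns == namespace:
            groups.setdefault(container, []).append(used)
    result = [(c, percentile_cont(v, 0.99)) for c, v in groups.items()]
    return sorted(result, key=lambda r: r[1])[:top_n]
```

Interpolation means the result need not be an observed value, which is the difference from percentile_disc().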

Or, to take a more complex example from Dan Luu’s post, you can discover Kubernetes containers that are over-provisioned by finding those containers whose 99th percentile memory utilization is low:

WITH memory_allowed as (
  SELECT 
    labels(series_id) as labels, 
    value, 
    min(time) start_time, 
    max(time) as end_time 
  FROM container_spec_memory_limit_bytes total
  WHERE value != 0 and value != 'NaN'
  GROUP BY series_id, value
)
SELECT 
  val(memory_used.container_id) container, 
  percentile_cont(0.99) 
    within group(order by memory_used.value/memory_allowed.value) 
    AS percent_used_p99, 
  max(memory_allowed.value) max_memory_allowed
FROM container_memory_working_set_bytes AS memory_used 
INNER JOIN memory_allowed
      ON (memory_used.time >= memory_allowed.start_time AND 
          memory_used.time <= memory_allowed.end_time AND
          eq(memory_used.labels,memory_allowed.labels)) 
WHERE memory_used.value != 'NaN'   
GROUP BY container 
ORDER BY percent_used_p99 ASC 
LIMIT 100;
container                          percent_used_p99         max_memory_allowed
cluster-overprovisioner-system     6.961822509765625e-05    4294967296
sealed-secrets-controller          0.00790748596191406      1073741824
dumpster                           0.0135690307617187       268435456
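The over-provisioning query is essentially an interval join. The hypothetical Python sketch below follows its steps: derive, for each (label set, limit value) pair, the span of time during which that limit was observed (skipping 0 and NaN), join usage samples that fall inside a span for the same label set, and take the 99th percentile of used/limit per container. Label sets are modeled as tuples of (key, value) pairs, and plain equality stands in for the eq() comparison in the query; both are assumptions of the sketch.

```python
import math
from collections import defaultdict


def percentile_cont(values, fraction):
    """Linear-interpolation percentile at 0-based position fraction * (n - 1)."""
    xs = sorted(values)
    pos = fraction * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def limit_periods(limit_samples):
    """Like the CTE: for each (label set, limit value), keep the first and last
    time the limit was seen. Zero and NaN limits are skipped."""
    spans = {}
    for labels, t, limit in limit_samples:
        if limit == 0 or math.isnan(limit):
            continue
        start, end = spans.get((labels, limit), (t, t))
        spans[(labels, limit)] = (min(start, t), max(end, t))
    return [(labels, limit, s, e) for (labels, limit), (s, e) in spans.items()]


def overprovisioned(usage_samples, limit_samples, top_n=100):
    """Join usage samples to limit periods on equal labels and overlapping time,
    then report (container, p99 of used/limit, max limit), lowest p99 first."""
    periods = limit_periods(limit_samples)
    ratios, max_limit = defaultdict(list), defaultdict(float)
    for labels, t, used in usage_samples:
        if math.isnan(used):
            continue
        for p_labels, p_limit, start, end in periods:
            if p_labels == labels and start <= t <= end:  # the INNER JOIN condition
                container = dict(labels)["container"]
                ratios[container].append(used / p_limit)
                max_limit[container] = max(max_limit[container], p_limit)
    rows = [(c, percentile_cont(r, 0.99), max_limit[c]) for c, r in ratios.items()]
    return sorted(rows, key=lambda row: row[1])[:top_n]
```

One consequence of grouping by (series, limit value), in the SQL as well as in the sketch: if a container's limit changes and later changes back, both periods collapse into a single span from the first to the last observation. The nested loop is fine for a sketch; a database would rely on indexes instead.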

Demo!

In this 15-minute demo video, Avthar shows you how Promscale handles SQL and PromQL queries, via the terminal and Grafana.

Getting Started

Install Promscale today via Helm Charts, Docker, and others. More information on GitHub. (And if you like what we are building, please give us a ⭐️ on GitHub 🤗.)

If you have a Kubernetes cluster with Helm installed, we suggest using tobs to install a full metric collection and visualization solution including Prometheus, Grafana, Promscale, and a preview version of PromLens in under 5 minutes.

Promscale can be deployed in any environment running Prometheus, alongside any Prometheus instance. If you already have Prometheus installed and/or aren’t using Kubernetes, see our README for various installation options.

How to get involved in the Promscale community: