Your Questions Answered (YQA): April Edition
This month brought us a wide range of community questions, from optimizing queries for Grafana to effective indexing, partitioning data, and more. See our product & engineering team's responses – and ask your own at an upcoming session.
In “Your Questions Answered,” we share - and answer - some of the most interesting, generally applicable, or frequently asked questions from the latest Office Hours sessions, virtual events, and our Slack community.
Have a question you’d like answered?
- Sign up for an upcoming Office Hours session (first Tuesday of each month @ 9am PT / 12pm ET / 4pm GMT)
- Join our Slack channel (our engineers are active in all channels)
- Start a conversation on Twitter (@timescaleDB)
Let's get to it!
How do I optimize my queries for Grafana?
We recommend optimizing your queries using a SQL development tool (like pgAdmin or DBeaver) before you port anything to Grafana, to ensure that they perform as expected. Once you’re confident that you’re getting the results you expect and need, you can build dashboards accordingly.
Additionally, TimescaleDB includes various capabilities to optimize your queries, including continuous aggregates, downsampling, and advanced columnar compression. We'd also recommend checking out our step-by-step Continuous Aggregates tutorial as you get started.
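As a minimal sketch, here’s what a continuous aggregate might look like for a hypothetical `conditions` hypertable (with assumed columns `time`, `location`, and `temperature`), rolled up to hourly averages. The exact syntax varies by TimescaleDB version, so check the tutorial for the specifics of your release:

```sql
-- Hourly rollup of a hypothetical "conditions" hypertable
-- (assumed columns: time TIMESTAMPTZ, location TEXT, temperature DOUBLE PRECISION)
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       location,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket, location;
```

Grafana panels can then query `conditions_hourly` instead of the raw hypertable, which keeps dashboard queries fast even as the underlying data grows.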
And, if you’re interested in learning more about creating Grafana dashboards for your time-series datasets, join us on April 22nd for “Guide to Grafana 101: Getting started with (awesome) visualizations.” We’ll walk through the UI, then demo when, why, and how to use various types of charts, gauges, histograms, and more.
What’s the most effective indexing to use?
Our experience has shown that for time-series data, the most useful index type varies widely depending on your data.
However, we typically encourage developers to start with our default indexes: TimescaleDB automatically creates a time index on your data when a hypertable is created. We also outline various best practices and considerations in our docs (including how to index columns with discrete vs. continuous values).
A few things to consider and remember:
- TimescaleDB supports the full range of PostgreSQL index types (see the PostgreSQL docs).
- Creating, altering, or dropping an index on a hypertable automatically propagates the change to all of its constituent chunks (every TimescaleDB hypertable is composed of many interlinked "chunk" tables).
To index data, use the `CREATE INDEX` command (e.g., `CREATE INDEX ON conditions (location, time DESC);`). You can do this before or after converting the table to a hypertable.
By default, TimescaleDB automatically creates a time index on your data when a hypertable is created. And, if the `create_hypertable` command specifies an optional "space partition" in addition to time (say, the `location` column), TimescaleDB automatically creates the following index: `CREATE INDEX ON conditions (location, time DESC);`
Again, while we encourage new users to use our defaults, the most useful index type varies depending on your data.
You can override the defaults when you execute the `create_hypertable` command. Prior to doing so, you’ll want to consider things like query patterns, planning, and schemas to understand how different indexes may apply.
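For example, here’s a sketch of overriding the defaults, using the same hypothetical `conditions` table (the `create_default_indexes` option tells TimescaleDB not to build its default time index):

```sql
-- Skip the default time index so you can define your own
SELECT create_hypertable('conditions', 'time',
                         create_default_indexes => FALSE);

-- Then create the index that matches your query patterns
CREATE INDEX ON conditions (location, time DESC);
```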
For indexing columns with discrete values, we suggest an index like this (using our hypertable `conditions` for the example): `CREATE INDEX ON conditions (location, time DESC);`
For all other types of columns, like columns with continuous values, the index should be in the form: `CREATE INDEX ON conditions (time DESC, temperature);`
Check out our docs for a deeper dive into TimescaleDB best practices for indexing, or reach out on Slack and we’ll happily assist.
How should we approach query optimization for a real-time analytical system? We currently have a large Postgres deployment (200 GB of data), and our system is heavily reliant on Postgres functions. I/O seems to be the one problem that limits the performance of the system.
When thinking about this in the context of TimescaleDB, you'll want to consider your chunk size and the amount of memory available to Postgres.
Typically, you want to be able to fit a single chunk in memory. If you’re looking for real-time data presentation, make sure you have the right chunk size to ensure that the majority of the data you want to query resides in memory (thus avoiding the I/O bottleneck).
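As a rough sketch of how you’d control chunk size (the `conditions` hypertable and the intervals here are purely illustrative; the right interval depends on your ingest rate and available memory):

```sql
-- Set the chunk interval when creating the hypertable...
SELECT create_hypertable('conditions', 'time',
                         chunk_time_interval => INTERVAL '1 day');

-- ...or adjust it later (affects newly created chunks only)
SELECT set_chunk_time_interval('conditions', INTERVAL '12 hours');
```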
For specific guidance, we’d want to understand a bit more about the configuration of your disk subsystem - clearly a topic too deep for this forum, but exactly what our Slack community is for! Post the details of your scenario, and we’ll work with you to find the right solution.
How should I think about partitioning data into multiple dimensions? When is it useful, and how do I know which number of partitions to use (e.g., in the `number_partitions` argument)?
Partitioning data into multiple dimensions is most useful when you’re running TimescaleDB in our multi-node setup*, because data is partitioned across nodes.
Partitioning data into multiple dimensions may also be useful in some cases on a single node, for instance when you're splitting a hypertable across multiple tablespaces, or when you're trying to achieve significantly more data locality (not a common scenario, but it can happen).
It’s important to note that partitioning data into multiple dimensions is not a replacement for proper indexing. For most developers, indexing on this dimension with a `(dimension, time)` index is the biggest thing you can do to improve performance.
In terms of choosing the number of partitions, it’s usually related to the number of disks/tablespaces or servers that you want the parallelism across.
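As a minimal sketch (the `conditions` table, `location` column, and partition count are all illustrative):

```sql
-- Partition by time, plus a "space" dimension on location
-- (4 partitions is illustrative; match it to the disks,
-- tablespaces, or servers you want parallelism across)
SELECT create_hypertable('conditions', 'time',
                         partitioning_column => 'location',
                         number_partitions   => 4);
```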
*We’ve released distributed (multi-node) TimescaleDB as a private beta to our community members and customers, and we plan to make an initial version of it more widely available later this year. It will support high write rates, query parallelism, and predicate pushdown for lower latency, among other capabilities.
- Contact us to learn how you can join the beta program.
- View our documentation to learn more about how it works.
Do you have some info on how to do pattern matching on time-series data?
TimescaleDB works with a variety of tools and object-relational mappers (ORMs) that enable you to perform pattern matching.
Because Timescale is built on Postgres, you can leverage R (using RStudio to connect to your Timescale database) and run more advanced “fuzzy matching” functions.
Additionally, you can run simple `for` loops in Python (or any programming language) to identify pattern matches in your time-series datasets.
If you're looking to match strings within the data, Postgres offers an array of matching functions (see docs).
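For instance, here’s a quick sketch using Postgres’ case-insensitive regex operator against a hypothetical `logs` hypertable with a `message` column:

```sql
-- Case-insensitive regex match on a hypothetical "logs" hypertable
SELECT time, message
FROM logs
WHERE message ~* 'connection (timeout|refused)'
ORDER BY time DESC
LIMIT 100;
```

Postgres also offers `LIKE` and `SIMILAR TO` for simpler patterns; the docs cover all three.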
We’d also suggest reaching out via Slack, where we and passionate TimescaleDB community members and data experts are active in all channels.
Wrapping Up
That concludes this month’s installment of Your Questions Answered, but we’re always here to help - whether you’re just getting started with Timescale or a seasoned community member with a very specific use case.
Once again, you can find us at an upcoming Office Hours session, on Slack, or on Twitter.
We’re listening!