An intro to time-series databases

Last updated: Nov 18, 2025

Time-series data is everywhere in modern systems - from IoT sensors and financial markets to application monitoring and user analytics. As organizations collect more temporal data, they need efficient ways to store, process, and analyze it.

This article explores time-series databases, their use cases, and how different database solutions handle time-based data. Whether you're dealing with millions of sensor readings, tracking user behavior, or monitoring system performance, understanding your options for time-series data storage is crucial for building effective data systems.


To get a taste of time-series analysis in action, here’s a sample query that looks at average yearly precipitation across the UK, France, and the US using weather station data from NOAA. It's a simple example, but it shows how powerful time-based queries can be for uncovering trends over time.

SELECT year,
       avg(`precipitation`) AS `avg_precipitation`,
       dictGet(`country`.`country_iso_codes`, 'name', code) AS country
FROM `noaa`.`noaa_v2`
WHERE date > '1990-01-01' AND code IN ('UK', 'FR', 'US')
GROUP BY toStartOfYear(`date`) AS `year`,
         substring(station_id, 1, 2) AS code
HAVING avg_precipitation > 0
ORDER BY country, year ASC
LIMIT 100000;

You can see more queries like this in the "Is ClickHouse a time-series database?" section.

What is time-series data? #

Let’s start by defining time-series data: datasets where observations are captured along a timeline, with each data point carrying a timestamp. These data points are collected at regular intervals—like every second, minute, or day—or at irregular intervals when events occur unpredictably.

For example, while a weather sensor might collect readings at fixed intervals (e.g., every 10 seconds), an error log in a server system records data only when an error event happens, which can be at any time. Both are forms of time-series data because they represent data that changes over time, whether those changes are periodic or sporadic.

Collection methods #

Let’s look at the methods of data collection in a little more detail:

  • Fixed interval sampling: This method captures data at consistent time points, providing a predictable and continuous stream of information. Examples include weather sensors, heart rate monitors, and energy meters. Since the points are evenly distributed, this data is beneficial for seeing trends over specific periods.
  • Event-driven data: Time-series data can also be captured irregularly, triggered by specific occurrences. For instance, a server logs data each time a particular error occurs, which could happen multiple times in one hour or not for several hours. Other examples include clickstream data from websites (where clicks happen unpredictably) or social media posts, which depend entirely on user activity.

When analyzing time-series data, we often slice or group it by different time periods to understand how it changes over time. This ability to analyze change across time is what defines time-series data—any data that changes over time belongs to this category.

Example: Weather sensor data #

Below is an example of a message that a weather sensor might generate:

{
  "device_id": "sensor-12345",
  "timestamp": "2024-12-04T10:15:00Z",
  "location": {
    "latitude": 40.7128,
    "longitude": -74.0060,
    "altitude": 15.5
  },
  "metrics": {
    "temperature": 23.5,
    "humidity": 60,
    "pressure": 1013.25,
    "battery_level": 85,
    "signal_strength": -70
  },
  "status": {
    "operational": true,
    "last_maintenance": "2024-11-20T08:00:00Z"
  }
}

In this example, several metrics, such as temperature, humidity, and pressure, are captured. These metrics change continuously, often at high frequency. Additionally, we have metadata like the device_id and location, which provide context for where and what is being measured but change much less frequently than the metrics, and perhaps not at all. There is also status information, which includes details like the operational flag and the last_maintenance date.

Finally, each data point is associated with a timestamp, representing when the observation was taken. This timestamp is crucial for understanding how the metrics evolve, allowing us to detect patterns or trends.

What are good use cases for time-series data? #

Time-series databases can support a wide range of applications, each benefiting from the ability to store and analyze data as it changes over time. Let's explore some common use cases and the types of questions they help organizations answer.

Product analytics #

Product analytics generates rich time-series data through user interactions, system events, and transactions. Every click, page view, feature interaction, and purchase is timestamped, creating a detailed record of user behavior over time. This temporal data enables teams to answer crucial questions about their product: How do users navigate it? What paths lead to successful conversion? Which behaviors indicate potential churn? When do users typically discover and adopt new features?

The power of time-series analysis in product analytics lies in understanding not just what users do but when and in what sequence they do it. Teams can track user journeys through onboarding, measure time-to-conversion, analyze retention patterns, and identify features that drive engagement. By correlating these behaviors with other metrics like performance data, organizations can build a complete picture of their product's effectiveness and make data-driven decisions about product development.

➡️ Read more about building product analytics with ClickHouse

Financial Markets and Trading #

Financial markets, from traditional stock exchanges to cryptocurrency trading platforms, generate massive volumes of time-series data. Every price change, trade, and order book update must be captured and analyzed in real-time. This data is crucial for generating trading signals, performing technical analysis, and identifying market opportunities.

Time-series analysis is particularly important for creating standard trading tools like candlestick charts showing price movements over specific time intervals. These charts require rapid price data aggregation (open, high, low, close) over various time windows, from minutes to months. Traders also need to analyze market liquidity, calculate technical indicators, and simultaneously detect patterns across multiple assets or trading venues. Processing this data quickly is crucial - even small delays can mean missed trading opportunities or increased risk.
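
To make this concrete, here is a rough sketch of a candlestick (OHLC) aggregation written in ClickHouse-style SQL; the trades table, the symbol value, and the column names are illustrative assumptions rather than a real platform's schema.

SELECT
    toStartOfInterval(trade_time, INTERVAL 5 MINUTE) AS bucket,
    argMin(price, trade_time) AS open,   -- price at the earliest trade in the bucket
    max(price)                AS high,
    min(price)                AS low,
    argMax(price, trade_time) AS close   -- price at the latest trade in the bucket
FROM trades
WHERE symbol = 'BTC-USD'
GROUP BY bucket
ORDER BY bucket;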

For example, cryptocurrency trading platforms must aggregate data from multiple blockchain networks and decentralized exchanges, processing millions of price updates daily while maintaining sub-second query response times. This enables traders to spot arbitrage opportunities, track market trends, and make real-time trading decisions.

➡️ Read more about how Coinhall uses ClickHouse to power its blockchain data platform

System and Application Observability #

Modern applications generate vast amounts of operational data that must be monitored in real time. From server metrics to user behavior, organizations need to track everything from system health to user experience. This observability data typically includes system metrics (CPU, memory, network), application telemetry (response times, error rates), and user interaction data.

Time-series analysis enables teams to visualize this data through real-time dashboards, track performance trends, and quickly identify issues. For example, teams can monitor application performance across different regions, track user engagement metrics, and analyze experiment results from A/B tests. The ability to correlate various metrics - from infrastructure health to user behavior - helps organizations understand how system performance impacts user experience and business outcomes.

By storing this data in a time-series database, teams can not only monitor the current system state but also analyze historical patterns, establish baselines, and detect anomalies that might indicate potential problems. This comprehensive view of system behavior is essential for maintaining reliable services and optimizing user experience.

➡️ Learn how Skool uses ClickHouse to visualize real-time observability and monitor user behavior.

Summary of use case characteristics #

| Use Case | Write Volume | Query Pattern | Retention Needs | Cardinality |
| --- | --- | --- | --- | --- |
| Product Analytics | High | Recent data + historical trends | 90 days hot, years warm | Very high (user IDs, events) |
| Financial Trading | Very high | Real-time + historical analysis | Years at full resolution | High (symbols, exchanges) |
| Observability | Extreme | Real-time dashboards + troubleshooting | 30 days hot, months warm | Extreme (services, hosts, metrics) |

What is a time-series database? #

Now that we've defined time-series data and seen some examples, what does it take to store this data? A time-series database (TSDB) is a database designed to efficiently store, manage, and analyze time-series data.

Such a database will need to have the following characteristics:

  1. Data volume - Time-series data grows rapidly due to the high frequency of measurements, such as sensor readings every second. Traditional databases can struggle to maintain performance when millions of data points are generated in short periods. Instead, we need a database that is optimized for appending new data.
  2. Write and query performance - Time-series data involves frequent writes (inserting new data points continuously) and complex queries for analysis (e.g., aggregations over time). Our database needs efficient time-based indexing or the ability to sort data by timestamp during ingestion.
  3. Efficient storage and compression - Since time-series data often contains many repeated or very similar values, storing it efficiently is crucial. Column-based storage is an advantage here since values in the same column are stored next to each other. We’ll also want to use codecs that store deltas between values rather than the raw values each time. Delta encoding is one such codec that’s often used when storing timestamps (see the sketch after this list).
  4. Time-based aggregations - Time-series analysis typically involves time-based queries, such as calculating daily averages or summing metrics over weeks. We need to be able to run these types of queries over large volumes of data while also being able to filter by time period.
  5. Ability to handle high cardinality - Time-series data often involves tracking metrics across many dimensions - thousands of servers, millions of IoT devices, or billions of user sessions, each with their own tags and labels. When you combine multiple dimensions (server + region + application + environment), the number of unique combinations (cardinality) explodes. A database needs efficient indexing and storage strategies to handle these high-cardinality dimensions without performance degradation. Some time-series databases impose strict cardinality limits that can become bottlenecks as data diversity grows, while others use specialized data structures to maintain performance even with millions of unique dimension combinations.
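
To illustrate points 2 and 3, here is a minimal, hypothetical table definition in ClickHouse-style SQL that sorts data by timestamp at ingestion and applies per-column codecs; the table and column names are made up for this sketch.

-- Hypothetical sensor table: sorted by (device_id, timestamp) for fast time-range scans,
-- with Delta encoding on the timestamp and a floating-point codec on the metrics.
CREATE TABLE sensor_readings
(
    device_id   LowCardinality(String),
    timestamp   DateTime CODEC(Delta, ZSTD),
    temperature Float32  CODEC(Gorilla, ZSTD),
    humidity    Float32  CODEC(Gorilla, ZSTD)
)
ENGINE = MergeTree
ORDER BY (device_id, timestamp);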

Many of these characteristics are the same as those required for real-time analytics databases.

Time-series databases vs transactional databases #

While traditional transactional databases like PostgreSQL or MySQL can store time-stamped data, they're optimized for different workloads than time-series analysis. Understanding these differences helps clarify when specialized time-series capabilities matter.

Transactional (OLTP) vs analytical workloads: Transactional databases are designed for online transaction processing (OLTP)—handling frequent inserts, updates, and deletes with strong consistency guarantees. Think user accounts, order processing, or inventory management. Time-series databases (whether purpose-built or analytical databases like ClickHouse) are optimized for online analytical processing (OLAP)—handling high-volume append-only writes and fast aggregations over large datasets.

Write patterns: OLTP databases balance read and write operations, supporting updates and deletes across rows with ACID guarantees. Time-series workloads are overwhelmingly append-only: new data points arrive continuously, but historical data rarely changes. This append-only pattern allows time-series databases to optimize storage and indexing strategies that would be inefficient in OLTP systems that need to handle updates.

Query patterns: OLTP databases excel at retrieving or updating individual records or small sets of related records - "find this user's order" or "update this account balance." Time-series analysis typically involves scanning millions or billions of rows to compute aggregations: "what's the average response time over the last week?" or "show me the 95th percentile latency by region." These analytical queries require different optimization strategies.

Storage and compression: OLTP databases use row-oriented storage, keeping all fields for a record together for fast retrieval and updates. Time-series databases typically use columnar storage, storing all values for a single metric together. This enables dramatic compression—sequential time-series values often change gradually, allowing delta encoding and other specialized compression techniques to achieve 10-100x compression ratios.

Scalability for time-series data: As time-series data accumulates, OLTP databases can struggle with table sizes reaching billions of rows. Query performance degrades, and traditional indexing strategies become less effective. Databases optimized for analytical workloads handle these large datasets efficiently through partitioning, distributed query execution, and specialized data structures.

When to use each: Use OLTP databases (PostgreSQL, MySQL) for transactional workloads requiring updates, deletes, strong consistency, and complex relational integrity. Use time-series optimized databases for high-volume temporal data requiring fast aggregations and analytical queries. Many organizations run both: OLTP databases for core application data and analytical databases for metrics, logs, and time-series analysis.

Extensions like TimescaleDB bridge this gap by adding time-series optimizations to PostgreSQL, while analytical databases like ClickHouse handle both time-series and other analytical workloads in a single system.

When should you use a time-series database? #

The decision to move from a transactional database to a time-series optimized solution typically comes when your workload characteristics change. Let's have a look at some key indicators:

Data volume is overwhelming #

When your time-series tables contain billions of rows and continue growing rapidly, traditional OLTP databases struggle. Table sizes exceeding hundreds of gigabytes often result in degraded query performance, slower writes, and increasingly complex maintenance operations, such as vacuuming or index rebuilding.

Query patterns favor analytical operations #

If most of your queries scan large portions of your dataset rather than looking up individual rows, you've outgrown OLTP optimization. Typical signs include:

  • Queries that aggregate millions of rows (daily/weekly summaries, averages, percentiles)
  • Queries that only touch a few columns from wide tables
  • Time-range scans that process data from specific periods
  • Minimal use of UPDATE or DELETE operations - your data is primarily append-only

Performance requirements exceed OLTP capabilities #

When users expect sub-second responses for queries scanning millions of rows, or when concurrent analytical queries impact your transactional workload, it's time to separate concerns. Real-time dashboards requiring fresh data with low latency are particularly challenging for transactional databases.

High cardinality becomes problematic #

As you track more unique dimensions (device IDs, user IDs, tags, labels), the number of unique combinations explodes. Traditional database indexes become inefficient, and query performance degrades despite proper indexing strategies.

Storage costs escalate #

Time-series data in row-oriented OLTP databases consumes significantly more storage than in columnar systems optimized for compression. When storage costs become a concern or when implementing complex archival strategies to manage growth, specialized time-series storage offers better economics.

ACID guarantees aren't essential #

If eventual consistency is acceptable for your time-series data - meaning you can tolerate brief delays between writes and reads, and don't require transactional guarantees across multiple tables - you can benefit from the performance advantages of time-series optimized systems.

When to stay with your transactional database #

Stick with PostgreSQL or MySQL when:

  • Your time-series data volume remains manageable (millions, not billions of rows)
  • You frequently update or delete historical data points
  • You need strong transactional consistency with other application data
  • Your queries primarily look up individual records or small ranges
  • Time-series analysis is a small part of a larger transactional application

As mentioned in the previous section, many organizations run both systems: transactional databases for core application data that require ACID guarantees, and time-series-optimized databases for metrics, logs, and analytical workloads. This separation enables each system to excel in its respective area of expertise.

Time-series databases can be categorized into three main types: purpose-built time-series databases, extensions of other databases, and real-time analytics/column-based databases. Here are some popular examples:

| Database | Type | Best For | Query Language | Key Strength |
| --- | --- | --- | --- | --- |
| InfluxDB | Purpose-built TSDB | IoT, DevOps metrics, real-time monitoring | InfluxQL, Flux, SQL | Downsampling and data retention policies |
| QuestDB | Purpose-built TSDB | High-throughput ingestion, fast SQL queries | SQL | Fast writes with low-latency queries |
| Prometheus | Purpose-built TSDB | System and service monitoring, alerting | PromQL | Pull-based metrics collection and alerting |
| TimescaleDB | PostgreSQL extension | Hybrid relational + time-series workloads | SQL (PostgreSQL) | Familiar PostgreSQL ecosystem and tooling |
| Apache Pinot | Real-time analytics | User-facing dashboards, clickstream analysis | SQL | Sub-second query response times |
| ClickHouse | Real-time analytics | Observability, large-scale analytics | SQL | Extreme performance and analytical flexibility |

Purpose-built time-series databases #

Specialized databases engineered from the ground up to efficiently handle time-stamped data, offering optimized temporal data storage and query mechanisms, including as-of joins, specialized math functions, downsampling, grouped gap filling, and more. Some examples are described below:

  • InfluxDB - Specifically designed for time-series data, InfluxDB handles high write loads and provides features like downsampling and data retention policies, making it great for IoT, DevOps metrics, and real-time monitoring.
  • QuestDB - A high-performance open-source time-series database that excels at fast SQL queries and high-throughput ingestion.
  • Prometheus - An open-source monitoring system primarily designed for system and service monitoring. Prometheus excels at scraping metrics from various endpoints, storing them efficiently, and enabling alerting based on those metrics. It’s well-suited for use cases like server health monitoring and application performance metrics.

Extensions of relational databases #

Traditional relational databases can be enhanced with time-series capabilities, combining the familiarity and flexibility of SQL with specialized temporal features.

TimescaleDB is an extension of PostgreSQL. TimescaleDB adds time-based features like automatic partitioning and time-based indexing. It's perfect for scenarios where you want to blend relational data with time-series data, such as in business intelligence or IoT.

Real-time analytics / column-based databases #

Systems optimized for rapid large-scale data analysis, using columnar storage to enable fast aggregations and real-time processing of time-series information. Some examples are described below:

  • Apache Pinot - Designed for real-time, low-latency analytics, Pinot is well-suited for applications like user-facing dashboards or clickstream analysis, offering sub-second query response times.
  • ClickHouse - That’d be us! ClickHouse was initially designed to keep records of all clicks by people from all over the Internet but is now used for various time-centric datasets, with a particular focus on observability.

Querying time-series data #

Time-series databases offer specialized query capabilities designed to handle temporal data efficiently. While many modern time-series databases use SQL with extensions, some have developed their own query languages optimized for time-series operations.

Query language approaches #

Most time-series databases extend standard SQL with specialized functions for temporal analysis. These extensions typically include the following (a short sketch appears after the list):

  • Time-based window functions for analyzing data over specific time intervals
  • Gap filling to handle missing data points
  • Interpolation functions to estimate values between known data points
  • Time bucket operations for grouping data into regular time intervals
  • Specialized mathematical functions for time-series analysis
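
As one example of how these extensions look in practice, ClickHouse expresses time bucketing with functions like toStartOfHour and gap filling with ORDER BY ... WITH FILL; the sensor_readings table and its columns below are assumptions made for illustration.

SELECT toStartOfHour(timestamp) AS hour,   -- time bucket operation
       avg(temperature) AS avg_temp
FROM sensor_readings
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY hour
ORDER BY hour ASC WITH FILL STEP INTERVAL 1 HOUR;   -- fill in missing hourly buckets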

Prometheus uses a domain-specific query language called PromQL, specifically designed for time-series analysis and monitoring use cases. PromQL has built-in support for rate calculations and aggregations over time, native handling of labels and label matching, and vector and range vector selectors. It's particularly well-suited for monitoring scenarios where you must analyze metrics over time windows and create alerting rules.

InfluxDB initially used a query language called InfluxQL, which was SQL-like but explicitly designed for time-series operations. With InfluxDB 2.0, they introduced Flux, a more powerful functional query language, before later adding native SQL support to make the platform more accessible to users familiar with traditional database querying.

Query patterns and performance requirements #

Different types of time-series queries have varying performance requirements based on their use case:

| Query Type | Frequency | Latency Requirement | Example |
| --- | --- | --- | --- |
| Real-time dashboards | Continuous (every 1-5s) | <1 second | Current system health, live metrics |
| Historical analysis | Ad-hoc | 1-5 seconds | "Last month's trends", quarterly reports |
| Alerting queries | Scheduled (every 10-60s) | <1 second | Threshold violations, anomaly detection |
| Downsampling/aggregation | Background (nightly/hourly) | Minutes acceptable | Pre-computing hourly/daily summaries |
| Forensic analysis | Rare | 5-30 seconds | Root cause analysis, incident investigation |

Understanding these requirements helps in selecting the right database and optimizing query patterns for your specific use case.

Data retention and lifecycle management #

As time-series data accumulates, organizations face a fundamental challenge: data grows indefinitely while query patterns typically focus on recent information. A monitoring dashboard might primarily display the last 24 hours of metrics, yet the system continues ingesting data every second. Without a management strategy, storage costs escalate while query performance degrades as tables grow to billions or trillions of rows.

Organizations address this through three main approaches: data expiration, storage tiering, and data rollup. Each solves the same core problem - managing the lifecycle of time-series data - but with different trade-offs between cost, accessibility, and data granularity.

Comparing retention strategies:

| Strategy | Storage Cost | Query Speed | Data Loss | Complexity | Best For |
| --- | --- | --- | --- | --- | --- |
| Data expiration | Lowest (data deleted) | N/A (data gone) | Complete | Low | Short-term logs, temporary metrics |
| Storage tiering | Medium (cheaper storage) | Slower for old data | None | Medium | Compliance, auditing, long-term trends |
| Data rollup | Low (aggregates only) | Fast (smaller data) | Precision loss | Medium-High | Historical analysis, trend monitoring |
| Combination approach | Optimized | Varies by tier | Partial (for rollups) | High | Most production systems |

Data expiration #

Data expiration automatically deletes data after a specified time period has elapsed. This is the most straightforward approach: define a retention policy (e.g., "keep data for 90 days"), and the database automatically removes older data. The advantage is straightforward implementation and genuine storage savings since the data disappears entirely. However, it's also the most destructive approach—once data is deleted, it's gone. This approach works well when historical data has no long-term value, such as short-lived application logs or metrics that are only relevant for immediate troubleshooting.
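
In ClickHouse, for instance, such a retention policy can be declared with a TTL clause; the table below is a minimal sketch with made-up names and a made-up 90-day window.

-- Rows are removed in the background once they are more than 90 days old.
CREATE TABLE app_logs
(
    timestamp DateTime,
    message   String
)
ENGINE = MergeTree
ORDER BY timestamp
TTL timestamp + INTERVAL 90 DAY DELETE;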

Storage tiering #

Storage tiering moves data through different storage layers based on age rather than deleting it. Recent "hot" data is stored on fast SSDs for quick queries, while older "warm" data migrates to less expensive storage, such as object stores (S3, GCS), and the oldest "cold" data may be moved to archival systems. This approach preserves all data while optimizing costs - you're not paying premium storage prices for data that's rarely accessed. The trade-off is complexity: queries spanning multiple tiers may be slower, and you still incur storage costs (albeit at lower rates). Storage tiering shines when you need occasional access to historical data for compliance, auditing, or long-term trend analysis.
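
As a sketch of what tiering can look like in ClickHouse, a TTL move expression shifts older data to a cheaper volume; this assumes a storage policy named 'tiered' with hot and cold volumes has already been configured, and the table name is illustrative.

CREATE TABLE metrics
(
    timestamp DateTime,
    name      LowCardinality(String),
    value     Float64
)
ENGINE = MergeTree
ORDER BY (name, timestamp)
TTL timestamp + INTERVAL 30 DAY TO VOLUME 'cold'   -- move month-old data to cheaper storage
SETTINGS storage_policy = 'tiered';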

Data rollup #

Data rollup (or downsampling) replaces high-resolution raw data with pre-computed aggregations. For example, keep per-second metrics for 7 days, then replace them with per-minute averages for 90 days, then per-hour averages indefinitely. This dramatically reduces storage requirements—a year of per-second data becomes manageable when aggregated to hourly granularity. The significant trade-off is loss of detail: once you've aggregated second-level data to minutes, you can't recover that original precision. Rollup works best when you can anticipate your analysis needs—if you know you'll only ever need hourly trends for historical data, rolling up makes perfect sense.
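
One way to implement a rollup, sketched here with hypothetical table names, is a materialized view that pre-aggregates raw metrics into hourly sums and counts, from which averages can be computed at query time.

-- Hourly rollup target: SummingMergeTree sums the numeric columns when parts merge.
CREATE TABLE metrics_hourly
(
    hour         DateTime,
    name         LowCardinality(String),
    sum_value    Float64,
    sample_count UInt64
)
ENGINE = SummingMergeTree
ORDER BY (name, hour);

-- Populated automatically as raw rows are inserted into the metrics table.
CREATE MATERIALIZED VIEW metrics_hourly_mv TO metrics_hourly AS
SELECT toStartOfHour(timestamp) AS hour,
       name,
       sum(value) AS sum_value,
       count()    AS sample_count
FROM metrics
GROUP BY hour, name;

-- Hourly average at query time: sum(sum_value) / sum(sample_count), grouped by name and hour.

Once the rollup exists, the raw table can carry a much shorter retention policy while the hourly table is kept for long-term analysis.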

Implementation approaches #

Different time-series databases implement these strategies at various levels. Some apply retention policies at the database level, providing simplicity but less granularity - all data follows the same rules. Others allow table-level or even column-level policies, giving you precise control at the cost of increased configuration complexity. Some databases automate rollup processes, while others require you to define and maintain aggregation pipelines explicitly.

The best strategy often combines multiple approaches: expire truly transient data, tier important data to less expensive storage, and roll up metrics where aggregations are sufficient. The key is understanding your query patterns and data value over time - recent data may require millisecond precision on fast storage, while year-old data may serve your needs perfectly as hourly aggregates in object storage.

Is ClickHouse a time-series database? #

While ClickHouse isn't specifically designed as a time-series database, it excels at handling time-series workloads as part of its broader analytical capabilities.

As a columnar OLAP database, ClickHouse provides the performance and features needed for efficient time-series analysis without the limitations of a specialized solution.

ClickHouse's strengths in handling time-series data come from several key capabilities.

Real-time querying of large datasets #

ClickHouse enables the analysis of historical and current data at a large scale through its innovative dual-layer architecture. The system processes billions of rows per second on standard hardware through:

  • Isolated concurrent operations: Data is organized into "table parts" that allow inserts and selects to operate independently without blocking each other
  • Vectorized query execution: Processes data in batches rather than row-by-row, utilizing CPU caches efficiently and applying SIMD instructions
  • Parallel processing: Automatically distributes query execution across multiple CPU cores and can scale horizontally across nodes in a cluster
  • Merge-time computation: Shifts computational work from query time to background merge processes, making queries significantly faster
  • Specialized algorithms and data structures: As noted by CMU Professor Andy Pavlo, ClickHouse has "20 versions of a hash table" and other specialized components optimized for different query patterns

You can read more in the "Why is ClickHouse fast?" developer guide.

These architectural advantages enable organizations to maintain years of historical time-series data while providing sub-second query responses for real-time dashboards and deep historical analysis.

The following query analyzes New York City taxi data that contains over 3 billion records. For January 1, 2014, it groups rides by hour and cab type to show the number of rides, average trip distance, and average fare for each hourly period and taxi category.

SELECT
    toStartOfHour(pickup_datetime) AS hour,
    cab_type,
    count(*) AS rides,
    round(avg(trip_distance), 2) AS avg_distance,
    round(avg(total_amount), 2) AS avg_fare
FROM nyc_taxi.trips
WHERE pickup_date = '2014-01-01'
GROUP BY 1, 2
ORDER BY 1, 2

Comprehensive date/time type support #

ClickHouse provides robust support for time-series data through specialized date and time data types that balance storage efficiency with precision requirements:

  • Versatile date types:
    • Date: Compact 2-byte storage covering [1970-01-01, 2149-06-06], sufficient for most use cases
    • Date32: Extended 4-byte storage covering a wider range [1900-01-01, 2299-12-31]
  • Flexible timestamp types:
    • DateTime: 4-byte storage with second precision, range of [1970-01-01 00:00:00, 2106-02-07 06:28:15]
    • DateTime64: 8-byte storage with configurable sub-second precision (up to nanoseconds), range of [1900-01-01 00:00:00, 2299-12-31 23:59:59.99999999]
  • Time zone awareness:
    • Built-in support for time zones in both DateTime('TimeZone') and DateTime64('TimeZone')
    • Automatic time zone conversion during queries
    • Support for different time zones within the same table
  • Type conversion functions:
    • Seamless conversion between temporal types with functions like toDate, toDateTime, and toDateTime64
    • Precision control when converting between different temporal resolutions

These comprehensive date/time capabilities provide the foundation for sophisticated time-series analysis, enabling precise temporal storage and manipulation across massive datasets while optimizing storage efficiency and query performance.
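
For illustration, the column definitions below combine several of these types in one hypothetical table; the names and precisions are assumptions, not a recommended schema.

CREATE TABLE sensor_events
(
    event_date Date,                            -- 2-byte date, day precision
    event_time DateTime64(3, 'UTC'),            -- millisecond precision with an explicit UTC time zone
    local_time DateTime('America/New_York'),    -- second precision with a different time zone
    value      Float64
)
ENGINE = MergeTree
ORDER BY (event_date, event_time);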

The following query aggregates total daily hits from the Wiki dataset, using the toDate function to convert DateTime values to Date:

SELECT
    sum(hits) AS h,
    toDate(time) AS d
FROM wiki.wikistat_small
GROUP BY d
ORDER BY d
LIMIT 5;

Rich set of temporal functions #

ClickHouse offers a comprehensive suite of temporal functions that can be used for time-series analysis, including date truncation functions such as toStartOfHour and toStartOfYear, date arithmetic, and time-based rounding and conversion functions.

These temporal functions allow analysts to perform sophisticated time-series analyses with concise, readable SQL queries. With ClickHouse's query performance, these functions enable complex time-based aggregations, pattern detection, and anomaly identification across massive datasets with minimal latency.

Query examples #

Let’s have a look at a couple of examples.

The following query computes the yearly average precipitation in the UK, France, and the US from 1990 onwards.

SELECT year,
       avg(`precipitation`) AS `avg_precipitation`,
       dictGet(`country`.`country_iso_codes`, 'name', code) AS country
FROM `noaa`.`noaa_v2`
WHERE date > '1990-01-01' AND code IN ('UK', 'FR', 'US')
GROUP BY toStartOfYear(`date`) AS `year`,
         substring(station_id, 1, 2) AS code
HAVING avg_precipitation > 0
ORDER BY country, year ASC
LIMIT 100000;

The following query uses a window function to calculate the cumulative stars of the deepseek-ai/DeepSeek-R1 repository:

SELECT toDate(created_at) AS day,
       count() AS dailyCount,
       sum(dailyCount) OVER (ORDER BY day ASC) AS cumStars
FROM github.events
WHERE event_type = 'WatchEvent' AND repo_name = 'deepseek-ai/DeepSeek-R1'
GROUP BY ALL
ORDER BY day;

Long-term data management #

Additionally, ClickHouse offers features that are particularly valuable for long-term time-series data management, such as TTL expressions for expiring or tiering data by age and materialized views for rolling up older data.

These capabilities mean you can implement sophisticated time-series storage strategies, such as keeping recent data at full granularity while automatically rolling up older data to save space. This approach provides both detailed recent data for operational needs and efficient storage of historical data for long-term analysis.

Prometheus compatibility #

ClickHouse also expands its time-series capabilities with experimental features like the Time Series table engine, which can be a backing store for Prometheus data. This allows organizations to leverage ClickHouse's analytical capabilities while maintaining compatibility with popular time-series monitoring tools.

Rather than being limited to time-series-specific functionality, ClickHouse allows you to handle time-series workloads alongside other analytical queries, providing a more versatile solution for organizations with diverse data analysis needs.

➡️ Read more in Can I use ClickHouse as a Time-Series Database? and Working with Time Series Data in ClickHouse.
