Projections
Projections store data in a format that optimizes query execution, this feature is useful for:
- Running queries on a column that is not a part of the primary key
- Pre-aggregating columns, it will reduce both computation and IO
You can define one or more projections for a table, and during the query analysis the projection with the least data to scan will be selected by ClickHouse without modifying the query provided by the user.
Projections will create internally a new hidden table, this means that more IO and space on disk will be required. Example, If the projection has defined a different primary key, all the data from the original table will be duplicated.
You can see more technical details about how projections work internally on this page.
Example filtering without using primary keys
Creating the table:
Using ALTER TABLE
, we could add the Projection to an existing table:
Inserting the data:
The Projection will allow us to filter by user_name
fast even if in the original Table user_name
was not defined as a PRIMARY_KEY
.
At query time ClickHouse determined that less data will be processed if the projection is used, as the data is ordered by user_name
.
To verify that a query is using the projection, we could review the system.query_log
table. On the projections
field we have the name of the projection used or empty if none has been used:
Example pre-aggregation query
Creating the table with the Projection:
Inserting the data:
We will execute a first query using GROUP BY
using the field user_agent
, this query will not use the projection defined as the pre-aggregation does not match.
To use the projection we could execute queries that select part of, or all of the pre-aggregation and GROUP BY
fields.
As mentioned before, we could review the system.query_log
table. On the projections
field we have the name of the projection used or empty if none has been used:
Manipulating Projections
The following operations with projections are available:
ADD PROJECTION
ALTER TABLE [db.]name [ON CLUSTER cluster] ADD PROJECTION [IF NOT EXISTS] name ( SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY] )
- Adds projection description to tables metadata.
DROP PROJECTION
ALTER TABLE [db.]name [ON CLUSTER cluster] DROP PROJECTION [IF EXISTS] name
- Removes projection description from tables metadata and deletes projection files from disk. Implemented as a mutation.
MATERIALIZE PROJECTION
ALTER TABLE [db.]table [ON CLUSTER cluster] MATERIALIZE PROJECTION [IF EXISTS] name [IN PARTITION partition_name]
- The query rebuilds the projection name
in the partition partition_name
. Implemented as a mutation.
CLEAR PROJECTION
ALTER TABLE [db.]table [ON CLUSTER cluster] CLEAR PROJECTION [IF EXISTS] name [IN PARTITION partition_name]
- Deletes projection files from disk without removing description. Implemented as a mutation.
The commands ADD
, DROP
and CLEAR
are lightweight in a sense that they only change metadata or remove files.
Also, they are replicated, syncing projections metadata via ClickHouse Keeper or ZooKeeper.
Projection manipulation is supported only for tables with *MergeTree
engine (including replicated variants).