<Header icon={<img src="/docs/logos/clickhouse.svg" className="w-9 h-9 inline" />}>ClickHouse</Header>

ClickHouse is an open-source column-oriented database management system specialized for online 
analytical processing of queries (OLAP). There's two ways to use ClickHouse:
 * [Clickhouse Cloud](https://clickhouse.com/cloud)
 * [Self-hosted Clickhouse](https://clickhouse.com/docs/en/install)


## Features

| Feature                                       | Supported                      |
|-----------------------------------------------|--------------------------------|
| [Batch Mode](#batch-mode)                     | ✅                              |
| [Stream Mode](#stream-mode)                   | ❌ <sup>[*](#stream-mode)</sup> |
| [Deduplication](#deduplication)               | ✅                              |
| [Queries Optimization](#queries-optimization) | ✅                              |
| [Cluster Support](#cluster-support)           | ✅                              |
| [Distributed Tables](#distributed-tables)     | ✅                              |

## Configuration

<DestinationConfiguration type="clickhouse" />

## Advanced: Implementation Details

This section describes how Jitsu implements various modes and features for ClickHouse

### Batch Mode

In Batch Mode Jitsu use temporary table with `Memory` engine to store batch events before they are inserted into the destination table.

<details>
<summary>Details</summary>

```sql
-- Jitsu collects events batches in tmp file on file system.
INSERT INTO tmp_table (...) VALUES (...); -- bulk load data from tmp file into tmp_table using prepared statement in transaction
INSERT INTO target_table(...) SELECT ... FROM tmp_table
```
</details>

### Stream Mode

:::danger[Do not use in production]
Stream mode in ClickHouse is [limited by MergeTree engine capabilities](https://clickhouse.com/docs/knowledgebase/exception-too-many-parts).
ClickHouse is not designed to handle a large number of individual inserts.

Stream Mode is not available by default and can be enabled only in advanced mode. Use it only for testing purposes.
:::

### Deduplication

Jitsu relies on underlying table engine. By default `ReplacingMergeTree` will be used.

**Primary key** column configured on connection level will be used both as `PRIMARY KEY` and `ORDER BY`.

ReplacingMergeTree engine performs deduplication by primary key in background during some time after insertion So it's still possible to get rows with duplicated primary key columns using ordinary SELECT.

To make sure that no duplicates are present in query results use `FINAL` modifier, e.g:

```sql
SELECT * FROM target_table FINAL
```

:::tip
`ReplacingMergeTree` is not only way to deduplicate data in ClickHouse. There [other approaches](https://kb.altinity.com/altinity-kb-schema-design/row-level-deduplication/) too. To implement them, create destination table before Jitsu starts inserting the data. In this case Jitsu will respect table engine and primary key columns you specified.
:::

### Queries Optimization

**Timestamp** connection setting is used to optimize SELECT queries.
Jitsu creates tables [partitioned](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/custom-partitioning-key) by specified timestamp column and monthly partitioning, e.g. `PARTITION BY toYYYYMM(_timestamp)`

### Cluster Support

Jitsu supports both ClickHouse clusters and single node instances. To use cluster, specify cluster name in `Cluster` connection setting.
When working in cluster mode, Jitsu creates all tables in [Replicated](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication) mode
and all DDL operations are performed `ON CLUSTER`.

### Distributed Tables

When working in cluster mode, Jitsu creates destination tables as [Distributed](https://clickhouse.com/docs/en/engines/table-engines/special/distributed/) tables.
On each cluster node Jitsu creates local table with the `local_` prefix that contains actual data.