Snowflake

Snowflake is an independent cloud data warehouse with compute-based pricing.

Features

FeatureSupported
Batch Mode
Stream Mode
Deduplication
Queries Optimization

Configuration

Advanced: Implementation Details

Batch Mode

User Stages is used as a intermediate storage for batched events.

Algorithm
-- Write to tmp file
-- Load tmp file to user stage
BEGIN -- start transaction
COPY from stage to tmp_table
INSERT into target_table select from tmp_table
COMMIT -- commit transaction
 

Stream Mode

INSERT INTO target_table (...) VALUES (..)

Deduplication

For batch mode the following algorithm is used:

Algorithm
-- Write to tmp file
-- Deduplicate rows in tmp file
-- Load tmp file to user stage
BEGIN -- start transaction
COPY from stage to tmp_table
MERGE into target_table using (select from tmp_table) ...
COMMIT -- commit transaction
 

For stream mode:

  • SELECT by primary key
  • Either INSERT or UPDATE depending on result

Queries Optimization

Timestamp connection setting is used to optimize SELECT queries.

Jitsu sets clustering key to the month part of specified timestamp column values, e.g. CLUSTER BY (DATE_TRUNC('MONTH', _timestamp))