Snowflake
Snowflake is an independent cloud data warehouse with compute-based pricing.
Features
| Feature | Supported |
|---|---|
| Batch Mode | ✅ |
| Stream Mode | ✅ |
| Deduplication | ✅ |
| Queries Optimization | ✅ |
Configuration
Advanced: Implementation Details
Batch Mode
User Stages is used as a intermediate storage for batched events.
Algorithm
-- Write to tmp file
-- Load tmp file to user stage
BEGIN -- start transaction
COPY from stage to tmp_table
INSERT into target_table select from tmp_table
COMMIT -- commit transaction
Stream Mode
INSERT INTO target_table (...) VALUES (..)
Deduplication
For batch mode the following algorithm is used:
Algorithm
-- Write to tmp file
-- Deduplicate rows in tmp file
-- Load tmp file to user stage
BEGIN -- start transaction
COPY from stage to tmp_table
MERGE into target_table using (select from tmp_table) ...
COMMIT -- commit transaction
For stream mode:
- SELECT by primary key
- Either INSERT or UPDATE depending on result
Queries Optimization
Timestamp connection setting is used to optimize SELECT queries.
Jitsu sets clustering key to the month part of specified timestamp column values, e.g. CLUSTER BY (DATE_TRUNC('MONTH', _timestamp))