BigQuery
BigQuery is a cloud-based SQL data warehouse service developed by Google.
Features
| Feature | Supported |
|---|---|
| Batch Mode | ✅ |
| Stream Mode | ❌ |
| Deduplication | ✅ |
| Queries Optimization | ✅ |
Configuration
Advanced: Implementation Details
This section describes how Jitsu implements various modes and features for BigQuery.
Batch Mode
- Jitsu collects batches of events in a tmp file on the file system.
- Using the Loader API, Jitsu loads data from the tmp file into a BigQuery tmp_table.
- Using the Copier API, Jitsu copies data from tmp_table to target_table.
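The first step above can be sketched in a few lines of Python. This is only an illustration, not Jitsu's actual code: the function name `write_batch_to_tmp_file` and the newline-delimited JSON file format are assumptions. The load and copy steps are left out because they need a live BigQuery project.

```python
import json
import tempfile
from typing import Iterable, Mapping


def write_batch_to_tmp_file(events: Iterable[Mapping[str, object]]) -> str:
    """Write a batch of events to a tmp file and return the file path.

    Assumption: one JSON object per line (newline-delimited JSON).
    The file format Jitsu really uses is not stated in this document.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so that a later load step can read it.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".ndjson", delete=False, encoding="utf-8"
    ) as tmp:
        for event in events:
            tmp.write(json.dumps(event, sort_keys=True) + "\n")
        return tmp.name
```

The returned path is what a loader would hand to BigQuery when filling tmp_table. The caller is responsible for deleting the file once the load has finished.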
Stream Mode
Not supported
It is possible to implement stream mode for BigQuery, but data deduplication cannot be supported in that mode, so stream mode is currently disabled in Jitsu.
Deduplication
Data deduplication in BigQuery is based on the MERGE statement. The merge condition uses the primary key column configured at the connection level.
Details
- Write events to a tmp file.
- Deduplicate rows in the tmp file.
- Use the Loader API to load the tmp file into tmp_table.
- Run MERGE INTO target_table T USING tmp_table TMP ON T.pk_field = TMP.pk_field WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...
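As a rough sketch of the deduplication and MERGE steps, the Python below removes duplicate rows by primary key and then builds a complete MERGE statement. The helper names `deduplicate_rows` and `build_merge_sql` are hypothetical, and the "last row wins" rule is an assumption, since this document does not say which duplicate Jitsu keeps. The UPDATE and INSERT clauses show one plausible way to fill in the elided parts, not necessarily what Jitsu generates.

```python
from typing import Dict, Iterable, List, Mapping, Sequence


def deduplicate_rows(
    rows: Iterable[Mapping[str, object]], pk_field: str
) -> List[Mapping[str, object]]:
    """Keep one row per primary key value.

    Assumption: the last row seen for a key wins. Output order follows
    the first appearance of each key.
    """
    latest: Dict[object, Mapping[str, object]] = {}
    for row in rows:
        latest[row[pk_field]] = row
    return list(latest.values())


def build_merge_sql(
    target_table: str, tmp_table: str, pk_field: str, columns: Sequence[str]
) -> str:
    """Build a MERGE statement that upserts tmp_table into target_table.

    Identifiers are inserted as-is; this sketch does not quote or
    validate them, so only pass trusted names.
    """
    non_key = [c for c in columns if c != pk_field]
    if not non_key:
        raise ValueError("at least one non-key column is required for UPDATE")
    set_clause = ", ".join(f"{c} = TMP.{c}" for c in non_key)
    column_list = ", ".join(columns)
    value_list = ", ".join(f"TMP.{c}" for c in columns)
    return (
        f"MERGE INTO {target_table} T USING {tmp_table} TMP "
        f"ON T.{pk_field} = TMP.{pk_field} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({column_list}) VALUES ({value_list})"
    )
```

Deduplicating before the load matters because BigQuery rejects a MERGE in which more than one source row matches the same target row.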
Queries Optimization
The Timestamp connection setting is used to optimize SELECT queries.
Jitsu creates a time-unit column-partitioned table that uses the specified timestamp column with daily partitioning.
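To make the partitioning concrete, the sketch below builds a CREATE TABLE statement with daily partitioning on a timestamp column. The helper name `build_partitioned_table_ddl` and the table and column names in the usage note are hypothetical, and the DDL assumes the column has the TIMESTAMP type. Jitsu may create the table through the BigQuery API rather than through DDL.

```python
from typing import Mapping


def build_partitioned_table_ddl(
    table: str, columns: Mapping[str, str], timestamp_column: str
) -> str:
    """Build DDL for a table partitioned by day on a TIMESTAMP column.

    `columns` maps column names to BigQuery types, in the desired order.
    Identifiers are inserted as-is, without quoting or validation.
    """
    if timestamp_column not in columns:
        raise ValueError(f"unknown timestamp column: {timestamp_column}")
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE {table} ({column_defs}) "
        f"PARTITION BY TIMESTAMP_TRUNC({timestamp_column}, DAY)"
    )
```

For example, `build_partitioned_table_ddl("mydataset.events", {"id": "STRING", "ts": "TIMESTAMP"}, "ts")` yields a table partitioned on `ts`. A SELECT that filters on that column lets BigQuery scan only the matching daily partitions instead of the whole table.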