BigQuery

BigQuery is a cloud-based SQL data warehouse service developed by Google.

Features

Feature               Supported
Batch Mode            Yes
Stream Mode           No
Deduplication         Yes
Queries Optimization  Yes

Configuration

Advanced: Implementation Details

This section describes how Jitsu implements various modes and features for BigQuery.

Batch Mode

  • Jitsu collects batches of events in a temporary file on the local file system.
  • Using the Loader API, Jitsu loads the data from the temporary file into a BigQuery temporary table (tmp_table).
  • Using the Copier API, Jitsu copies the data from tmp_table into target_table.
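The three batch steps can be sketched as follows. This is a minimal illustration, not Jitsu's code: it assumes the temporary file is newline-delimited JSON, and it only builds job configurations shaped like the BigQuery REST API's JobConfigurationLoad and JobConfigurationTableCopy objects without submitting them. The helper names, the table names, and the choice of WRITE_TRUNCATE for tmp_table are hypothetical.

```python
import json
import os
import tempfile


def write_batch_to_tmp_file(events):
    """Step 1 (assumed format): write a batch of events to a temporary NDJSON file."""
    fd, path = tempfile.mkstemp(suffix=".ndjson")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
    return path


def table_ref(project, dataset, table):
    """TableReference as used in BigQuery REST job configurations."""
    return {"projectId": project, "datasetId": dataset, "tableId": table}


def load_job_config(project, dataset, tmp_table):
    """Step 2: load job that uploads the temporary file into tmp_table."""
    return {
        "load": {
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "destinationTable": table_ref(project, dataset, tmp_table),
            # Hypothetical choice: replace whatever a previous batch left in tmp_table.
            "writeDisposition": "WRITE_TRUNCATE",
        }
    }


def copy_job_config(project, dataset, tmp_table, target_table):
    """Step 3: copy job that appends the rows of tmp_table to target_table."""
    return {
        "copy": {
            "sourceTable": table_ref(project, dataset, tmp_table),
            "destinationTable": table_ref(project, dataset, target_table),
            "writeDisposition": "WRITE_APPEND",
        }
    }
```

For example, `write_batch_to_tmp_file([{"id": 1}, {"id": 2}])` returns the path of a two-line file, and `copy_job_config("my-project", "analytics", "events_tmp", "events")` describes appending the hypothetical `events_tmp` table to `events`. Staging through a temporary table means target_table only receives rows that loaded successfully.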

Stream Mode

Not supported

Stream mode could be implemented for BigQuery, but deduplication cannot be supported in that mode, so stream mode is currently disabled in Jitsu.

Deduplication

Data deduplication in BigQuery is based on the MERGE statement. The merge condition uses the primary key column configured at the connection level.
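A MERGE keyed on the primary key can be generated roughly as below. This is a sketch under assumptions, not Jitsu's implementation: the function names are hypothetical, every non-key column is overwritten on a match, and identifiers containing backticks are rejected rather than escaped. Table names may be passed as `dataset.table`, which BigQuery accepts inside a single pair of backticks.

```python
def quote_ident(name):
    """Wrap a BigQuery identifier or table path in backticks."""
    if "`" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def build_merge_sql(target, tmp, pk, columns):
    """Build a MERGE that upserts rows of tmp into target, matching on pk."""
    if pk not in columns:
        raise ValueError("primary key column must be one of the columns")
    q = {c: quote_ident(c) for c in columns}
    clauses = [
        f"MERGE INTO {quote_ident(target)} T",
        f"USING {quote_ident(tmp)} TMP",
        f"ON T.{q[pk]} = TMP.{q[pk]}",
    ]
    update_cols = [c for c in columns if c != pk]
    if update_cols:  # with only a key column there is nothing to update
        set_clause = ", ".join(f"{q[c]} = TMP.{q[c]}" for c in update_cols)
        clauses.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    col_list = ", ".join(q[c] for c in columns)
    values = ", ".join(f"TMP.{q[c]}" for c in columns)
    clauses.append(f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({values})")
    return "\n".join(clauses)
```

For instance, `build_merge_sql("analytics.events", "analytics.events_tmp", "id", ["id", "name"])` yields a statement whose ON clause is ``T.`id` = TMP.`id` ``, so an incoming row either updates the existing row with the same key or is inserted as a new one.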

Details
  • Write events to a temporary file.
  • Deduplicate rows in the temporary file.
  • Use the Loader API to load the temporary file into tmp_table.
  • Run: MERGE INTO target_table T USING tmp_table TMP ON T.pk_field = TMP.pk_field WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...
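Deduplicating the temporary file before loading matters because BigQuery rejects a MERGE in which more than one source row matches the same target row. A minimal sketch of that step follows, assuming an NDJSON temporary file and a "last occurrence wins" policy; both the policy and the function names are hypothetical, not taken from Jitsu.

```python
import json


def dedupe_rows(rows, pk):
    """Keep one row per primary-key value; a later row replaces an earlier one."""
    latest = {}
    for row in rows:
        # Reassigning an existing key keeps its original position in the dict,
        # so output order follows the first occurrence of each key.
        latest[row[pk]] = row
    return list(latest.values())


def dedupe_ndjson_file(path, pk):
    """Rewrite an NDJSON file in place without duplicate keys; return rows dropped."""
    with open(path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    unique = dedupe_rows(rows, pk)
    with open(path, "w", encoding="utf-8") as f:
        for row in unique:
            f.write(json.dumps(row) + "\n")
    return len(rows) - len(unique)
```

Given rows `{"id": 1, "v": "a"}`, `{"id": 2, "v": "b"}`, `{"id": 1, "v": "c"}`, `dedupe_rows(rows, "id")` keeps the `"c"` version of key 1 followed by key 2. Holding the whole batch in memory is acceptable for a sketch; a very large batch would need an external sort or a keyed store instead.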

Queries Optimization

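The Timestamp connection setting is used to optimize SELECT queries.

Jitsu creates a time-unit column-partitioned table that is partitioned daily on the specified timestamp column. Queries that filter on this column scan only the matching partitions instead of the whole table.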
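The DDL for such a table and a query that benefits from partition pruning might look like the output of the sketch below. It is illustrative only: the function names, the `event_time` column, and the use of `TIMESTAMP_TRUNC(..., DAY)` (one of BigQuery's valid daily partitioning expressions for a TIMESTAMP column) are assumptions, not Jitsu's exact statements. The query uses BigQuery named parameters `@start` and `@end`.

```python
def quote_ident(name):
    """Wrap a BigQuery identifier or table path in backticks."""
    if "`" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def build_partitioned_table_ddl(table, columns, ts_column):
    """columns: list of (name, BigQuery type) pairs; ts_column must be a TIMESTAMP."""
    if dict(columns).get(ts_column) != "TIMESTAMP":
        raise ValueError("partitioning column must be declared as TIMESTAMP")
    col_defs = ",\n  ".join(f"{quote_ident(n)} {t}" for n, t in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} (\n  {col_defs}\n)\n"
        # Daily time-unit column partitioning on the timestamp column.
        f"PARTITION BY TIMESTAMP_TRUNC({quote_ident(ts_column)}, DAY)"
    )


def build_range_query(table, ts_column):
    """SELECT restricted to a time range, letting BigQuery prune partitions."""
    ts = quote_ident(ts_column)
    return (
        f"SELECT * FROM {quote_ident(table)}\n"
        f"WHERE {ts} >= @start AND {ts} < @end"
    )
```

With a hypothetical `analytics.events` table, a query for a single day reads only that day's partition, which lowers both bytes scanned and query cost compared with an unpartitioned table.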