Google Cloud Storage

Google Cloud Storage is a cloud file storage service by Google.

Features

Feature | Supported
Batch Mode
Deduplication ℹ️
Folder Macros

Configuration

Advanced: Implementation Details

This section describes how Jitsu implements various modes and features for Google Cloud Storage.

Batch Mode

Each batch run produces at least one file in the GCS bucket, named in the following format:

<folder>/<table_name>_<batch_start_time>_<file_number>.<file_format>

When the number of available events exceeds the maximum batch size (default 10_000), Jitsu splits the batch into multiple files. file_number is the number of the file within the batch (starting from 1).
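The naming scheme above can be sketched as follows. This is a minimal illustration, not Jitsu's implementation: the helper name batchFileNames, the 'ndjson' file format, and the way batch_start_time is rendered (a plain string here) are assumptions, and so is the choice to return one name for an empty batch.

```javascript
// Hypothetical helper. Only the layout
// <folder>/<table_name>_<batch_start_time>_<file_number>.<file_format>
// and the default max batch size of 10_000 come from the text above.
function batchFileNames({ folder, tableName, batchStartTime, fileFormat, eventCount, maxBatchSize = 10_000 }) {
  // One file per started chunk of maxBatchSize events.
  // Assumption: a run with zero events still yields a single file name.
  const fileCount = Math.max(1, Math.ceil(eventCount / maxBatchSize));
  const names = [];
  // file_number starts from 1.
  for (let fileNumber = 1; fileNumber <= fileCount; fileNumber++) {
    names.push(`${folder}/${tableName}_${batchStartTime}_${fileNumber}.${fileFormat}`);
  }
  return names;
}

// 25_000 events with the default limit are split into three files.
const exampleNames = batchFileNames({
  folder: 'events',
  tableName: 'pageviews',
  batchStartTime: '1700000000', // placeholder; the real format is not specified here
  fileFormat: 'ndjson',         // placeholder file format
  eventCount: 25_000,
});
console.log(exampleNames);
```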

Deduplication

Info

Deduplication happens only within a single batch. Jitsu doesn't guarantee deduplication across batches.
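To make the scope of this guarantee concrete, here is a rough sketch of per-batch deduplication. The messageId key and the "last occurrence wins" rule are assumptions made for illustration; the text above does not say which key or rule Jitsu uses.

```javascript
// Hypothetical helper: deduplicates the events of ONE batch by a key field.
// Assumption: events carry a unique `messageId`; a later duplicate replaces
// the earlier one.
function dedupeBatch(events, key = 'messageId') {
  const byKey = new Map();
  for (const event of events) {
    byKey.set(event[key], event);
  }
  return [...byKey.values()];
}

// Each batch is deduplicated on its own, with no state shared between calls.
const firstBatch = dedupeBatch([
  { messageId: 'a', n: 1 },
  { messageId: 'a', n: 2 },
  { messageId: 'b', n: 3 },
]);
const secondBatch = dedupeBatch([{ messageId: 'a', n: 4 }]);

// firstBatch holds one event per messageId, but 'a' also appears in
// secondBatch, so the same id can still land in files of two batches.
console.log(firstBatch.length, secondBatch.length);
```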

Organizing data into folders

You can use macros in the Folder configuration parameter to organize data into folders. Macros are replaced with the corresponding values during the batch run.

Supported macros:

Macro | Description
[DATE] | Date of the batch run in YYYY-MM-DD format
[TIMESTAMP] | Batch run time as a Unix timestamp

You can use multiple macros in a single folder path. For example, events/[DATE]/[TIMESTAMP] creates a folder named after the batch date that contains a subfolder named after the batch timestamp.
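The substitution can be pictured with a small sketch. The helper expandFolderMacros is hypothetical, and it assumes that [DATE] is computed in UTC and that [TIMESTAMP] is expressed in whole seconds; neither detail is stated above.

```javascript
// Hypothetical helper, not part of Jitsu: replaces the supported macros in a
// folder path with values derived from the time of the batch run.
function expandFolderMacros(folder, batchRunTime) {
  // Assumption: the date is taken in UTC.
  const date = batchRunTime.toISOString().slice(0, 10); // YYYY-MM-DD
  // Assumption: the Unix timestamp is in seconds, not milliseconds.
  const timestamp = Math.floor(batchRunTime.getTime() / 1000);
  return folder
    .replaceAll('[DATE]', date)
    .replaceAll('[TIMESTAMP]', String(timestamp));
}

const batchRunTime = new Date('2023-01-02T00:05:00Z');
console.log(expandFolderMacros('events/[DATE]/[TIMESTAMP]', batchRunTime));
// prints "events/2023-01-02/1672617900"
```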

Note

Macro values are based on the batch start time and do not depend on the timestamps of the events in the batch, so events from different days may be placed into the same date folder.

See Accurate organization of data into folders for an example of how to organize data into folders based on event timestamps.

Accurate organization of data into folders

You can use Functions to organize data into folders based on event timestamps.

A function can change the destination table for a particular event.

The table name is used as a prefix for batch file names in GCS. Slashes (/) in a file name act as directory separators, so the corresponding directory structure is created automatically in the GCS bucket. A function can therefore organize data into folders based on event timestamps or other event criteria.

Example:

export default async function(event, { log, fetch, props: config }) {
  // Change destination table to <date>/events. E.g: 2023-01-01/events.
  // After batch run GCS will contain folder 2023-01-01 with batch files inside.
  const date = event.timestamp.split('T')[0];
  event.JITSU_TABLE_NAME = `${date}/events`;
  return event;
}
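The example relies on event.timestamp being an ISO 8601 string. A more defensive variant is sketched below. The helper name tableNameFor, the 'unknown-date' fallback folder, and the handling of Date values are all assumptions for illustration; check what your events actually contain before relying on it.

```javascript
// Hypothetical helper: derives "<date>/events" from an event, falling back to
// a fixed folder when the timestamp is missing or not in the expected shape.
function tableNameFor(event, fallbackFolder = 'unknown-date') {
  const ts = event.timestamp;
  let date = fallbackFolder;
  if (ts instanceof Date && !Number.isNaN(ts.getTime())) {
    // Date object: take the UTC calendar date.
    date = ts.toISOString().slice(0, 10);
  } else if (typeof ts === 'string' && /^\d{4}-\d{2}-\d{2}T/.test(ts)) {
    // ISO 8601 string: the first ten characters are YYYY-MM-DD.
    date = ts.slice(0, 10);
  }
  return `${date}/events`;
}

// Inside a function like the one above, the table name would be set with:
//   event.JITSU_TABLE_NAME = tableNameFor(event);
console.log(tableNameFor({ timestamp: '2023-01-01T10:00:00Z' }));
console.log(tableNameFor({}));
```

Routing malformed events to a separate folder keeps them out of the dated folders and prevents the function from throwing on a missing field.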