Google Cloud Storage
Google Cloud Storage (GCS) is a cloud file storage service by Google.
Features
| Feature | Supported |
|---|---|
| Batch Mode | ✅ |
| Deduplication | ℹ️️ |
| Folder Macros | ✅ |
Configuration
Advanced: Implementation Details
This section describes how Jitsu implements various modes and features for Google Cloud Storage.
Batch Mode
Each batch run produces at least one file in the GCS bucket, named in the following format:
<folder>/<table_name>_<batch_start_time>_<file_number>.<file_format>
When the number of available events exceeds the maximum batch size (default 10_000), Jitsu splits the batch into multiple files. file_number is the number of the file within the batch (starting from 1).
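The naming scheme above can be sketched as follows. This is only an illustration, not Jitsu's implementation: `batchFileNames` is a hypothetical helper, the timestamp and file-format values are passed in as opaque strings because their exact formats are not specified here, and the sketch assumes the batch contains at least one event.

```javascript
// Hypothetical helper: lists the object names a single batch run would produce.
// batchStartTime and fileFormat are treated as opaque strings (formats assumed).
function batchFileNames({ folder, tableName, batchStartTime, fileFormat, eventCount, maxBatchSize = 10000 }) {
  // One file per maxBatchSize events, rounded up.
  const fileCount = Math.ceil(eventCount / maxBatchSize);
  const names = [];
  // File numbers start from 1.
  for (let fileNumber = 1; fileNumber <= fileCount; fileNumber++) {
    names.push(`${folder}/${tableName}_${batchStartTime}_${fileNumber}.${fileFormat}`);
  }
  return names;
}

// 25,000 events with the default batch size are split into three files.
console.log(batchFileNames({
  folder: "events",
  tableName: "pageviews",
  batchStartTime: "1672531200",
  fileFormat: "ndjson",
  eventCount: 25000,
}));
// [
//   'events/pageviews_1672531200_1.ndjson',
//   'events/pageviews_1672531200_2.ndjson',
//   'events/pageviews_1672531200_3.ndjson'
// ]
```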
Deduplication
Deduplication happens only within a single batch. Jitsu doesn't guarantee deduplication across batches.
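Because duplicates can survive across batches, a consumer that reads several batch files may want to deduplicate on its side. The sketch below shows one possible approach; `dedupeAcrossBatches` is a hypothetical helper, and `messageId` is an assumed name for the field that identifies an event.

```javascript
// Hypothetical consumer-side helper: merges events read from several batch
// files and keeps one event per id. `messageId` is an assumed id field.
function dedupeAcrossBatches(batches, idField = "messageId") {
  const byId = new Map();
  for (const batch of batches) {
    for (const event of batch) {
      // A later copy overwrites an earlier one, so the last occurrence wins.
      byId.set(event[idField], event);
    }
  }
  return [...byId.values()];
}

const firstBatch = [{ messageId: "a", page: "/home" }, { messageId: "b", page: "/pricing" }];
const secondBatch = [{ messageId: "b", page: "/pricing" }, { messageId: "c", page: "/docs" }];
console.log(dedupeAcrossBatches([firstBatch, secondBatch]).length); // 3
```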
Organizing data into folders
You can use macros in the Folder configuration parameter to organize data into folders. Macros are replaced with the corresponding values during the batch run.
Supported macros:
| Macro | Description |
|---|---|
| [DATE] | Date of the batch run in YYYY-MM-DD format |
| [TIMESTAMP] | Batch run time in Unix timestamp format |
You can use multiple macros in a single folder path.
For example, events/[DATE]/[TIMESTAMP] places the files of each batch run in a folder named after the run's date, inside a subfolder named after the run's timestamp.
Macro values are based on the batch start time and don't depend on the timestamps of the events in the batch, so events from different days may end up in the same date folder.
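The substitution can be pictured with the sketch below. It is not Jitsu's actual code: `expandFolderMacros` is a hypothetical helper, and it assumes that [DATE] is computed in UTC and that [TIMESTAMP] is expressed in seconds, neither of which is stated above.

```javascript
// Hypothetical sketch of macro expansion; not Jitsu's actual implementation.
function expandFolderMacros(folderTemplate, batchStart) {
  // YYYY-MM-DD taken from the ISO string, i.e. in UTC (assumed time zone).
  const date = batchStart.toISOString().slice(0, 10);
  // Unix time in whole seconds (assumed unit).
  const timestamp = Math.floor(batchStart.getTime() / 1000);
  return folderTemplate
    .replaceAll("[DATE]", date)
    .replaceAll("[TIMESTAMP]", String(timestamp));
}

// A batch that starts shortly after midnight on January 2nd.
const batchStart = new Date("2023-01-02T00:05:00Z");
console.log(expandFolderMacros("events/[DATE]/[TIMESTAMP]", batchStart));
// events/2023-01-02/1672617900
```

An event with timestamp 2023-01-01T23:59:00Z that is picked up by this batch would still land under events/2023-01-02/, because only the batch start time is used.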
See Accurate organization of data into folders for an example of how to organize data into folders based on event timestamps.
Accurate organization of data into folders
You can use Functions to organize data into folders based on event timestamps.
A function can change the destination table for a particular event.
The table name is used as a prefix for batch file names in GCS.
Slashes (/) in a file name act as directory separators and automatically create the corresponding directory structure in the GCS bucket.
This makes it possible to organize data into folders based on event timestamps or other event criteria.
Example:
export default async function(event, { log, fetch, props: config }) {
  // Change destination table to <date>/events. E.g: 2023-01-01/events.
  // After batch run GCS will contain folder 2023-01-01 with batch files inside.
  const date = event.timestamp.split('T')[0];
  event.JITSU_TABLE_NAME = `${date}/events`;
  return event;
}
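The same idea extends to other event criteria. The sketch below is one possible variant, written as plain functions so it can be run outside Jitsu: `tableNameFor` and `route` are hypothetical names, the use of `event.type` as the second path segment is an assumption, and the `unknown-date` fallback is an illustrative choice for events without a usable timestamp.

```javascript
// Hypothetical helper: builds a table name of the form <date>/<type>.
function tableNameFor(event) {
  // Fall back to a fixed folder when the timestamp is missing or not a string.
  const date = typeof event.timestamp === "string"
    ? event.timestamp.split("T")[0]
    : "unknown-date";
  // `type` is an assumed event field; default to "events" when it is absent.
  const type = event.type || "events";
  return `${date}/${type}`;
}

// Sets the destination table the same way the function above does.
function route(event) {
  event.JITSU_TABLE_NAME = tableNameFor(event);
  return event;
}

console.log(route({ timestamp: "2023-01-01T10:15:00.000Z", type: "page" }).JITSU_TABLE_NAME);
// 2023-01-01/page
console.log(route({ type: "track" }).JITSU_TABLE_NAME);
// unknown-date/track
```

Inside a Jitsu function, the body of `route` would go into the exported default function shown above.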