<Header icon={<img src="/docs/logos/s3.svg" className="w-9 h-9 inline" />}>S3</Header>

S3 is a cloud file storage service by Amazon.

## Features

| Feature                                        | Supported |
|------------------------------------------------|-----------|
| [Batch Mode](#batch-mode)                      | ✅         |
| [Deduplication](#deduplication)                | ℹ️        |
| [Folder Macros](#organizing-data-into-folders) | ✅         |

## Configuration

Jitsu supports both Access Key based authentication and IAM Role based authentication for S3 buckets.

### General parameters

| Parameter name            | Description                                                                                                                                       |
|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
| **Authentication Method** | `accessKey` - Access Key based authentication, `iam` - IAM Role based authentication                                                              |
| **S3 Region**             | AWS region of the S3 bucket, e.g. `us-east-1`                                                                                                     |
| **S3 Bucket Name**        | Name of the S3 bucket                                                                                                                             |
| **Folder**                | Folder in the block storage bucket where files will be stored                                                                                     |
| **Format**                | Format of the files stored in the block storage: `ndjson` - Newline Delimited JSON, `ndjson_flat` - Newline Delimited JSON flattened, `csv` - CSV |
| **Compression**           | Compression algorithm used for the files stored in the block storage: `gzip` - GZIP, `none` - no compression.                                     |

Configuration settings depend on the selected authentication method.

### Access Key based authentication

| Parameter name           | Description                                                                                                                                       |
|--------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
| **S3 Access Key Id**     | AWS access key ID                                                                                                                                 |
| **S3 Secret Access Key** | AWS secret access key                                                                                                                             |
| **Endpoint**             | Custom endpoint of an S3-compatible server (optional)                                                                                             |

### IAM Role based authentication

| Parameter name          | Description                                             |
|-------------------------|---------------------------------------------------------| 
| **Role ARN**            | IAM role ARN                                            |

To set up IAM Role based authentication for S3, follow the [Advanced: IAM Role for Jitsu](#advanced-iam-role-for-jitsu) section.

## Advanced: IAM Role for Jitsu

To allow Jitsu to connect to S3 using an IAM role, perform the following steps in the AWS Console:

* Create a new IAM Policy
* Create a new IAM Role

### Create a new IAM Policy

* Sign in to your AWS Management Console and open the [IAM console](https://console.aws.amazon.com/iam/).
* Go to Policies > Create policy.
* Choose the JSON option, then paste the JSON below.
* Assign a unique and descriptive name to the policy, provide a clear description, and then select Create Policy.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::${S3BucketName}",
        "arn:aws:s3:::${S3BucketName}/*"
      ]
    }
  ]
}
```

:::tip

Make sure to replace the `${S3BucketName}` macro with the **S3 Bucket Name** value from the [Configuration](#configuration) section.

:::
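If you manage several buckets or prefer to generate the policy from a script, the policy above can also be produced programmatically. The following is a minimal sketch; `buildS3Policy` is a hypothetical helper (not part of Jitsu or the AWS SDK) that only substitutes the bucket name into the same policy document.

```javascript
// Hypothetical helper: builds the IAM policy document shown above for a given bucket.
function buildS3Policy(bucketName) {
  // Reject empty names and names containing a slash, which would produce a wrong ARN.
  if (!bucketName || bucketName.includes('/')) {
    throw new Error(`Invalid bucket name: ${bucketName}`);
  }
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['s3:PutObject', 's3:GetObject', 's3:ListBucket', 's3:DeleteObject'],
        Resource: [
          // The bucket itself (used by s3:ListBucket)
          `arn:aws:s3:::${bucketName}`,
          // All objects in the bucket (used by Put/Get/DeleteObject)
          `arn:aws:s3:::${bucketName}/*`,
        ],
      },
    ],
  };
}

// Print the policy, ready to paste into the JSON editor of the IAM console.
console.log(JSON.stringify(buildS3Policy('my-jitsu-bucket'), null, 2));
```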

### Create a new IAM Role

* Sign in to your AWS Management Console and open the [IAM console](https://console.aws.amazon.com/iam/).
* Go to Roles > Create role.
* Under **Trusted entity type**, select **Custom trust policy**.
* Paste the JSON below into the **Custom trust policy** field and replace the `${WorkspaceId}` macro with your Jitsu **Workspace ID** (Jitsu UI -> Settings -> Workspace Settings).
* In the policy selection screen, find and check the policy created in the [Create policy](#create-a-new-iam-policy) section.
* Assign a unique and descriptive name to the role, provide a clear description, and then select Create role.
* Find the newly created role in the list and click on it.
* Copy the **ARN** value from the **Summary** section and use it as **Role ARN** in the Jitsu S3 configuration.

**Custom trust policy:**
```json
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:iam::907458119157:root"
			},
			"Action": "sts:AssumeRole",
			"Condition": {
				"StringEquals": {
					"sts:ExternalId": "${WorkspaceId}"
				}
			}
		}
	]
}
```

:::info
`907458119157` - Jitsu's AWS account ID
:::
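The custom trust policy can also be generated from a script. In this sketch, `buildTrustPolicy` is a hypothetical helper; it places your Workspace ID in the `sts:ExternalId` condition, so Jitsu's AWS account can assume the role only when it presents that ID (the standard AWS safeguard against the confused deputy problem).

```javascript
// Jitsu's AWS account ID, as given in the trust policy above.
const JITSU_AWS_ACCOUNT_ID = '907458119157';

// Hypothetical helper: builds the custom trust policy for a given Jitsu workspace.
function buildTrustPolicy(workspaceId) {
  if (!workspaceId) throw new Error('workspaceId is required');
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        // Only Jitsu's AWS account may assume the role...
        Principal: { AWS: `arn:aws:iam::${JITSU_AWS_ACCOUNT_ID}:root` },
        Action: 'sts:AssumeRole',
        // ...and only when it passes your Workspace ID as the external ID.
        Condition: { StringEquals: { 'sts:ExternalId': workspaceId } },
      },
    ],
  };
}

console.log(JSON.stringify(buildTrustPolicy('your-workspace-id'), null, 2));
```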

## Advanced: Implementation Details

This section describes how Jitsu implements various modes and features for S3.

### Batch Mode

Each batch run produces at least one file in the S3 bucket, named according to the following format:

```
<folder>/<table_name>_<batch_start_time>_<file_number>.<file_format>
```
If the number of available events exceeds the maximum batch size (10,000 by default), Jitsu splits the batch into multiple files. `file_number` is the sequence number of the file within the batch, starting from 1.
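The naming and splitting rules above can be sketched as follows. This is an illustration under stated assumptions, not Jitsu's implementation: `planBatchFiles` is a hypothetical helper, the batch start time is passed in as an already formatted string because its exact format is not specified here, the file extension is taken verbatim from the format, and compression is ignored.

```javascript
const DEFAULT_MAX_BATCH_SIZE = 10_000;

// Hypothetical helper: splits events into chunks of at most maxBatchSize
// and assigns each chunk a file name following the documented pattern.
function planBatchFiles(events, { folder, tableName, batchStartTime, fileFormat, maxBatchSize = DEFAULT_MAX_BATCH_SIZE }) {
  const prefix = folder ? `${folder}/` : ''; // no leading slash when Folder is empty
  const files = [];
  for (let i = 0; i < events.length; i += maxBatchSize) {
    const fileNumber = files.length + 1; // file numbers start from 1
    files.push({
      key: `${prefix}${tableName}_${batchStartTime}_${fileNumber}.${fileFormat}`,
      events: events.slice(i, i + maxBatchSize),
    });
  }
  return files;
}

// 25,000 events with the default max batch size -> 3 files (10,000 + 10,000 + 5,000).
const events = Array.from({ length: 25_000 }, (_, i) => ({ id: i }));
const files = planBatchFiles(events, { folder: 'events', tableName: 'pages', batchStartTime: 'BATCH_START', fileFormat: 'ndjson' });
console.log(files.map((f) => `${f.key}: ${f.events.length}`));
```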

### Deduplication

:::info

Deduplication happens only within a single batch. Jitsu doesn't guarantee deduplication across batches.

:::
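As an illustration of what batch-scoped deduplication means, the sketch below removes duplicates only within the array it is given. It assumes, hypothetically, that `messageId` is the deduplication key and that the last occurrence wins; the actual key and tie-breaking rule in Jitsu may differ. Because each batch is processed on its own, a duplicate that arrives in a later batch is not detected.

```javascript
// Hypothetical sketch: deduplicate events within one batch by a key field.
function dedupeBatch(events, key = 'messageId') {
  const byKey = new Map();
  const withoutKey = [];
  for (const event of events) {
    const id = event[key];
    if (id === undefined || id === null) {
      withoutKey.push(event); // events without a key can't be deduplicated; keep them all
    } else {
      // Map.set keeps the position of the first occurrence but stores the latest event.
      byKey.set(id, event);
    }
  }
  return [...byKey.values(), ...withoutKey];
}

const batch = [
  { messageId: 'a', n: 1 },
  { messageId: 'b', n: 2 },
  { messageId: 'a', n: 3 },
];
console.log(dedupeBatch(batch)); // → [{ messageId: 'a', n: 3 }, { messageId: 'b', n: 2 }]
```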

### Organizing data into folders

You can use macros in **Folder** configuration parameter to organize data into folders. Macros are replaced with corresponding values during the batch run.

Supported macros:

| Macro         | Description                                  |
|---------------|----------------------------------------------|
| `[DATE]`      | Date of the batch run in `YYYY-MM-DD` format |
| `[TIMESTAMP]` | Batch run time in unix timestamp format      |

You can use multiple macros in a single folder path.
For example, `events/[DATE]/[TIMESTAMP]` creates a folder named after the batch date, with a subfolder named after the batch run timestamp.
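The macro substitution can be sketched as follows. `expandFolderMacros` is a hypothetical helper; it assumes `[DATE]` is the UTC calendar date of the batch start and `[TIMESTAMP]` is in whole seconds, neither of which is specified above.

```javascript
// Hypothetical sketch of folder macro expansion based on the batch start time.
function expandFolderMacros(folder, batchStart) {
  // Assumption: [DATE] uses the UTC date, formatted as YYYY-MM-DD.
  const date = batchStart.toISOString().slice(0, 10);
  // Assumption: [TIMESTAMP] is a unix timestamp in whole seconds.
  const timestamp = Math.floor(batchStart.getTime() / 1000).toString();
  return folder.replace(/\[DATE\]/g, date).replace(/\[TIMESTAMP\]/g, timestamp);
}

const batchStart = new Date(Date.UTC(2023, 0, 1, 12, 0, 0));
console.log(expandFolderMacros('events/[DATE]/[TIMESTAMP]', batchStart));
// → events/2023-01-01/1672574400
```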

:::note

Macro values are based on the batch start time and don't depend on the timestamps of the events in the batch.
As a result, events from different days may end up in the same date folder.

See [Accurate organization of data into folders](#accurate-organization-of-data-into-folders) for an example of how to organize data into folders based on event timestamps.
:::

#### Accurate organization of data into folders

You can use **Functions** to organize data into folders based on event timestamps.

Functions can [change the destination table](/docs/functions/#change-destination-table) for a particular event.

The table name is used as a prefix for batch file names in S3.
Slashes (`/`) in the file name act as directory separators, so the corresponding folder structure appears in the S3 bucket automatically.
This makes it possible to use functions to organize data into folders based on event timestamps or other event criteria.

Example:

```javascript
export default async function(event, { log, fetch, props: config }) {
  // Change destination table to <date>/events. E.g: 2023-01-01/events.
  // After batch run S3 will contain folder 2023-01-01 with batch files inside.
  const date = event.timestamp.split('T')[0];
  event.JITSU_TABLE_NAME = `${date}/events`;
  return event;
}
```