<Header icon={<img src="/docs/logos/redshift.svg" className="w-9 h-9 inline" />}>Redshift</Header>

Amazon Redshift is a cloud data warehouse that is optimized for the analytical workloads of business intelligence (BI) and data warehousing (DWH). Jitsu supports both Serverless and Classic Redshift

## Features

| Feature                                       | Supported                          |
|-----------------------------------------------|------------------------------------|
| [Batch Mode](#batch-mode)                     | ✅                                  |
| [Stream Mode](#stream-mode)                   | ⚠️ <sup>[slow](#stream-mode)</sup> |
| [Deduplication](#deduplication)               | ✅                                  |
| [Queries Optimization](#queries-optimization) | ✅                                  |

## Configuration

Jitsu supports both database username/password based authentication and IAM Role based authentication for Redshift data warehouse.

### General parameters

| Parameter name            | Description                                                                                |
|---------------------------|--------------------------------------------------------------------------------------------|
| **Authentication Method** | `password` - Username/Password based authentication, `iam` - IAM Role based authentication |
| **Region**                | AWS Region of Redshift Cluster and S3 bucket                                               |
| **Database**              | Redshift database name                                                                     |
| **Schema**                | Redshift schema                                                                            |
| **S3 Bucket Name**        | S3 Bucket Name                                                                             |

Configuration settings depend on the selected authentication method.

### Username/Password based authentication

| Parameter name           | Description                                 |
|--------------------------|---------------------------------------------|
| **Host**                 | Redshift public endpoint                    |
| **Username**             | Redshift username                           |
| **Password**             | Redshift password                           |
| **S3 Access Key Id**     | S3 Access Key Id                            |
| **S3 Secret Access Key** | S3 Secret Access Key                        |

### IAM Role based authentication

| Parameter name          | Description                                             |
|-------------------------|---------------------------------------------------------|
| **Redshift Serverless** | `true` if connecting to Redshift Serverless instance    |
| **Cluster Identifier**  | Redshift cluster identifier (**Redshift Cluster** only) |
| **Workgroup name**      | Redshift workgroup name (**Redshift Serverless** only)  |
| **Role ARN**            | IAM role ARN                                            |
| **Username**            | Redshift username (**Redshift Cluster** only)           |

To setup IAM Role based authentication for Redshift, follow the [Advanced: IAM Role for Jitsu](#advanced-iam-role-for-jitsu) section.

## Advanced: IAM Role for Jitsu

To allow Jitsu to connect to Redshift using IAM Role, the following steps should be performed in AWS Console:

* Create a new IAM Policy 
* Create a new IAM Role
* Attach the new IAM role to the Redshift cluster
* Setting user permissions in Redshift

### Create a new IAM Policy

* Sign in to your AWS Management Console and open the [IAM console](https://console.aws.amazon.com/iam/).
* Go to Policies > Create policy. 
* Choose the JSON option. Then, paste the JSON below depending on whether you use Redshift Cluster or Redshift Serverless
* Assign a unique and descriptive name to the policy, provide a clear description, and then select Create Policy.

<Tabs>
  <TabItem value="cluster" label="Redshift Cluster" default>

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "redshift:GetClusterCredentials",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:redshift:${Region}:${AccountId}:dbuser:${ClusterIdentifier}/${Username}",
        "arn:aws:redshift:${Region}:${AccountId}:dbname:${ClusterIdentifier}/${Database}"
      ]
    },
    {
      "Action": [
        "redshift-data:BatchExecuteStatement",
        "redshift-data:ExecuteStatement"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:redshift:${Region}:${AccountId}:cluster:${ClusterIdentifier}"
      ]
    },
    {
      "Action": [
        "redshift-data:GetStatementResult",
        "redshift-data:CancelStatement",
        "redshift-data:DescribeStatement"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::${S3BucketName}",
        "arn:aws:s3:::${S3BucketName}/*"
      ]
    }
  ]
}
```

:::tip

Make Sure to replace `${...}` macros with appropriate values from the Configuration section,<br/>`${AccountId}` with your AWS Account ID

:::

  </TabItem>
  <TabItem value="serverless" label="Redshift Serverless">

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "redshift-serverless:GetCredentials",
      "Effect": "Allow",
      "Resource": [
        "${WorkgroupARN}"
      ]
    },
    {
      "Action": [
        "redshift-data:BatchExecuteStatement",
        "redshift-data:ExecuteStatement"
      ],
      "Effect": "Allow",
      "Resource": [
        "${WorkgroupARN}"
      ]
    },
    {
      "Action": [
        "redshift-data:GetStatementResult",
        "redshift-data:CancelStatement",
        "redshift-data:DescribeStatement"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::${S3BucketName}",
        "arn:aws:s3:::${S3BucketName}/*"
      ]
    }
  ]
}
```

:::tip

Make Sure to replace `${S3BucketName}` macros with appropriate value from the Configuration section,<br/>`${WorkgroupARN}` with the Workgroup ARN value from your Redshift Serverless workgroup page

:::

  </TabItem>
</Tabs>

### Create a new IAM Role

* Sign in to your AWS Management Console and open the [IAM console](https://console.aws.amazon.com/iam/).
* Go to Roles > Create role.
* Under **Trusted entity type**, select **Custom trust policy**.
* Paste the JSON below into the **Custom trust policy** field and replace `${WorkspaceId}` macro with your Jitsu **Workspace ID** (Jitsu UI -> Settings -> Workspace Settings).
* In the policy selection screen, find and check the policy created in the [Create policy](#create-a-new-iam-policy) section.
* Assign a unique and descriptive name to the role, provide a clear description, and then select Create role.
* Find the newly created role in the list and click on it.
* Copy the **ARN** value from the **Summary** section and use in Jitsu Redshift Configuration.

**Custom trust policy:**
```json
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:iam::907458119157:root"
			},
			"Action": "sts:AssumeRole",
			"Condition": {
				"StringEquals": {
					"sts:ExternalId": "${WorkspaceId}"
				}
			}
		},
		{
			"Effect": "Allow",
			"Principal": {
				"Service": [
					"redshift-serverless.amazonaws.com",
					"redshift.amazonaws.com"
				]
			},
			"Action": "sts:AssumeRole"
		}
	]
}
```

:::info
`907458119157` - Jitsu AWS Account Id
:::

### Attaching the new IAM role to the Redshift cluster

Redshift will use the same role to Copy data from S3 Bucket. 

#### To attach the role to the Redshift Cluster:

* Open [Provisioned clusters dashboard](https://console.aws.amazon.com/redshiftv2/home)
* Select your cluster
* In the **Actions** dropdown, select **Manage IAM roles**
* In the **Available IAM roles**, select the role you [created](#create-a-new-iam-role) and click **Associate IAM role**
* Click **Save changes**

#### To attach the role to the Redshift Serverless:

* Open [Serverless dashboard](https://console.aws.amazon.com/redshiftv2/home)
* In the **Namespaces / Workgroups** section, select your namespace
* Open the **Security and encryption** tab and click **Manage IAM roles**
* Click the **Associate IAM roles** button and select the role you [created](#create-a-new-iam-role)
* Click  the **Associate IAM roles** button
* Click **Save changes**

### Setting user permissions in Redshift

* Open [Redshift Query Editor](https://console.aws.amazon.com/sqlworkbench/home)
* Run the following SQL queries to create a new user and grant it  the necessary permissions :

<Tabs>
  <TabItem value="cluster" label="Redshift Cluster" default>

```sql
-- create a new user
CREATE USER ${Username} PASSWORD disable;

-- grant user schema creation permissions on the database
GRANT CREATE ON DATABASE ${Database} TO ${Username};

-- if you want to grant permissions to the existing schema
GRANT ALL ON SCHEMA ${Database}.${Schema} to ${Username};
```

  </TabItem>
  <TabItem value="serverless" label="Redshift Serverless" default>

```sql
-- create new user with the Redshift IAM role name
CREATE USER "IAMR:${RoleName}" PASSWORD disable; 

-- grant user schema creation permissions on the database
GRANT CREATE ON DATABASE ${Database} TO "IAMR:${RoleName}";

-- if you want to grant permissions to the existing schema
GRANT ALL ON SCHEMA ${Database}.${Schema} to "IAMR:${RoleName}";
```

  </TabItem>
</Tabs>

:::tip

Make Sure to replace `${...}` macros with appropriate values from the Configuration section,<br/>`${RoleName}` with your the name of role you [created](#create-a-new-iam-role)

:::


## Advanced: Implementation Details

### Batch Mode

S3 is used as a intermediate storage for batched events.

<details>
<summary>Algorithm</summary>

```sql
-- Write to tmp file
-- Load tmp file to s3
BEGIN -- start transaction
COPY from s3 to tmp_table
INSERT into target_table select from tmp_table
COMMIT -- commit transaction

```

</details>

### Stream Mode

:::caution[Performance considerations]
Supported as plain insert statements. Don't use at production scale (more than 10 records per minute)
:::

### Deduplication

For batch mode the following algorithm is used:

<details>
<summary>Algorithm</summary>

```sql
-- Write to tmp file
-- Deduplicate rows in tmp file
-- Load tmp file to s3
BEGIN -- start transaction
COPY from s3 to tmp_table
DELETE from target_table T using tmp_table TMP where T.pk_field = TMP.pk_field
INSERT into target_table select from tmp_table
COMMIT -- commit transaction

```

</details>

For stream mode (⚠️Don't use at production scale (more than 10 records per minute)):

- SELECT by primary key
- Either INSERT or UPDATE depending on result

### Queries Optimization

**Timestamp** connection setting is used to optimize SELECT queries.

Selected timestamp column will be used as [sort key](https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html) for target table.