Redshift

Amazon Redshift is a cloud data warehouse that is optimized for the analytical workloads of business intelligence (BI) and data warehousing (DWH). Jitsu supports both Serverless and Classic Redshift

Features

Feature	Supported
Batch Mode	✅
Stream Mode	⚠️ ^slow
Deduplication	✅
Queries Optimization	✅

Configuration

Jitsu supports both database username/password based authentication and IAM Role based authentication for Redshift data warehouse.

General parameters

Parameter name	Description
Authentication Method	`password` - Username/Password based authentication, `iam` - IAM Role based authentication
Region	AWS Region of Redshift Cluster and S3 bucket
Database	Redshift database name
Schema	Redshift schema
S3 Bucket Name	S3 Bucket Name

Configuration settings depend on the selected authentication method.

Username/Password based authentication

Parameter name	Description
Host	Redshift public endpoint
Username	Redshift username
Password	Redshift password
S3 Access Key Id	S3 Access Key Id
S3 Secret Access Key	S3 Secret Access Key

IAM Role based authentication

Parameter name	Description
Redshift Serverless	`true` if connecting to Redshift Serverless instance
Cluster Identifier	Redshift cluster identifier (Redshift Cluster only)
Workgroup name	Redshift workgroup name (Redshift Serverless only)
Role ARN	IAM role ARN
Username	Redshift username (Redshift Cluster only)

To setup IAM Role based authentication for Redshift, follow the Advanced: IAM Role for Jitsu section.

Advanced: IAM Role for Jitsu

To allow Jitsu to connect to Redshift using IAM Role, the following steps should be performed in AWS Console:

Create a new IAM Policy
Create a new IAM Role
Attach the new IAM role to the Redshift cluster
Setting user permissions in Redshift

Create a new IAM Policy

Sign in to your AWS Management Console and open the IAM console.
Go to Policies > Create policy.
Choose the JSON option. Then, paste the JSON below depending on whether you use Redshift Cluster or Redshift Serverless
Assign a unique and descriptive name to the policy, provide a clear description, and then select Create Policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "redshift:GetClusterCredentials",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:redshift:${Region}:${AccountId}:dbuser:${ClusterIdentifier}/${Username}",
        "arn:aws:redshift:${Region}:${AccountId}:dbname:${ClusterIdentifier}/${Database}"
      ]
    },
    {
      "Action": [
        "redshift-data:BatchExecuteStatement",
        "redshift-data:ExecuteStatement"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:redshift:${Region}:${AccountId}:cluster:${ClusterIdentifier}"
      ]
    },
    {
      "Action": [
        "redshift-data:GetStatementResult",
        "redshift-data:CancelStatement",
        "redshift-data:DescribeStatement"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::${S3BucketName}",
        "arn:aws:s3:::${S3BucketName}/*"
      ]
    }
  ]
}

Tip

Make Sure to replace ${...} macros with appropriate values from the Configuration section,
${AccountId} with your AWS Account ID

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "redshift-serverless:GetCredentials",
      "Effect": "Allow",
      "Resource": [
        "${WorkgroupARN}"
      ]
    },
    {
      "Action": [
        "redshift-data:BatchExecuteStatement",
        "redshift-data:ExecuteStatement"
      ],
      "Effect": "Allow",
      "Resource": [
        "${WorkgroupARN}"
      ]
    },
    {
      "Action": [
        "redshift-data:GetStatementResult",
        "redshift-data:CancelStatement",
        "redshift-data:DescribeStatement"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::${S3BucketName}",
        "arn:aws:s3:::${S3BucketName}/*"
      ]
    }
  ]
}

Tip

Make Sure to replace ${S3BucketName} macros with appropriate value from the Configuration section,
${WorkgroupARN} with the Workgroup ARN value from your Redshift Serverless workgroup page

Create a new IAM Role

Sign in to your AWS Management Console and open the IAM console.
Go to Roles > Create role.
Under Trusted entity type, select Custom trust policy.
Paste the JSON below into the Custom trust policy field and replace ${WorkspaceId} macro with your Jitsu Workspace ID (Jitsu UI -> Settings -> Workspace Settings).
In the policy selection screen, find and check the policy created in the Create policy section.
Assign a unique and descriptive name to the role, provide a clear description, and then select Create role.
Find the newly created role in the list and click on it.
Copy the ARN value from the Summary section and use in Jitsu Redshift Configuration.

Custom trust policy:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:iam::907458119157:root"
			},
			"Action": "sts:AssumeRole",
			"Condition": {
				"StringEquals": {
					"sts:ExternalId": "${WorkspaceId}"
				}
			}
		},
		{
			"Effect": "Allow",
			"Principal": {
				"Service": [
					"redshift-serverless.amazonaws.com",
					"redshift.amazonaws.com"
				]
			},
			"Action": "sts:AssumeRole"
		}
	]
}

Info

907458119157 - Jitsu AWS Account Id

Attaching the new IAM role to the Redshift cluster

Redshift will use the same role to Copy data from S3 Bucket.

To attach the role to the Redshift Cluster:

Open Provisioned clusters dashboard
Select your cluster
In the Actions dropdown, select Manage IAM roles
In the Available IAM roles, select the role you created and click Associate IAM role
Click Save changes

To attach the role to the Redshift Serverless:

Open Serverless dashboard
In the Namespaces / Workgroups section, select your namespace
Open the Security and encryption tab and click Manage IAM roles
Click the Associate IAM roles button and select the role you created
Click the Associate IAM roles button
Click Save changes

Setting user permissions in Redshift

Open Redshift Query Editor
Run the following SQL queries to create a new user and grant it the necessary permissions :

-- create a new user
CREATE USER ${Username} PASSWORD disable;
 
-- grant user schema creation permissions on the database
GRANT CREATE ON DATABASE ${Database} TO ${Username};
 
-- if you want to grant permissions to the existing schema
GRANT ALL ON SCHEMA ${Database}.${Schema} to ${Username};

-- create new user with the Redshift IAM role name
CREATE USER "IAMR:${RoleName}" PASSWORD disable; 
 
-- grant user schema creation permissions on the database
GRANT CREATE ON DATABASE ${Database} TO "IAMR:${RoleName}";
 
-- if you want to grant permissions to the existing schema
GRANT ALL ON SCHEMA ${Database}.${Schema} to "IAMR:${RoleName}";

Tip

Make Sure to replace ${...} macros with appropriate values from the Configuration section,
${RoleName} with your the name of role you created

Advanced: Implementation Details

Batch Mode

S3 is used as a intermediate storage for batched events.

Algorithm

-- Write to tmp file
-- Load tmp file to s3
BEGIN -- start transaction
COPY from s3 to tmp_table
INSERT into target_table select from tmp_table
COMMIT -- commit transaction

Stream Mode

Performance considerations

Supported as plain insert statements. Don't use at production scale (more than 10 records per minute)

Deduplication

For batch mode the following algorithm is used:

Algorithm

-- Write to tmp file
-- Deduplicate rows in tmp file
-- Load tmp file to s3
BEGIN -- start transaction
COPY from s3 to tmp_table
DELETE from target_table T using tmp_table TMP where T.pk_field = TMP.pk_field
INSERT into target_table select from tmp_table
COMMIT -- commit transaction

For stream mode (⚠️Don't use at production scale (more than 10 records per minute)):

SELECT by primary key
Either INSERT or UPDATE depending on result

Queries Optimization

Timestamp connection setting is used to optimize SELECT queries.

Selected timestamp column will be used as sort key for target table.