Redshift

Amazon Redshift is a cloud data warehouse that is optimized for the analytical workloads of business intelligence (BI) and data warehousing (DWH). Jitsu supports both Serverless and Classic Redshift

Features

FeatureSupported
Batch Mode
Stream Mode⚠️ slow
Deduplication
Queries Optimization

Configuration

Jitsu supports both database username/password based authentication and IAM Role based authentication for Redshift data warehouse.

General parameters

Parameter nameDescription
Authentication Methodpassword - Username/Password based authentication, iam - IAM Role based authentication
RegionAWS Region of Redshift Cluster and S3 bucket
DatabaseRedshift database name
SchemaRedshift schema
S3 Bucket NameS3 Bucket Name

Configuration settings depend on the selected authentication method.

Username/Password based authentication

Parameter nameDescription
HostRedshift public endpoint
UsernameRedshift username
PasswordRedshift password
S3 Access Key IdS3 Access Key Id
S3 Secret Access KeyS3 Secret Access Key

IAM Role based authentication

Parameter nameDescription
Redshift Serverlesstrue if connecting to Redshift Serverless instance
Cluster IdentifierRedshift cluster identifier (Redshift Cluster only)
Workgroup nameRedshift workgroup name (Redshift Serverless only)
Role ARNIAM role ARN
UsernameRedshift username (Redshift Cluster only)

To setup IAM Role based authentication for Redshift, follow the Advanced: IAM Role for Jitsu section.

Advanced: IAM Role for Jitsu

To allow Jitsu to connect to Redshift using IAM Role, the following steps should be performed in AWS Console:

  • Create a new IAM Policy
  • Create a new IAM Role
  • Attach the new IAM role to the Redshift cluster
  • Setting user permissions in Redshift

Create a new IAM Policy

  • Sign in to your AWS Management Console and open the IAM console.
  • Go to Policies > Create policy.
  • Choose the JSON option. Then, paste the JSON below depending on whether you use Redshift Cluster or Redshift Serverless
  • Assign a unique and descriptive name to the policy, provide a clear description, and then select Create Policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "redshift:GetClusterCredentials",
      "Effect": "Allow",
      "Resource": [
        "arn:aws:redshift:${Region}:${AccountId}:dbuser:${ClusterIdentifier}/${Username}",
        "arn:aws:redshift:${Region}:${AccountId}:dbname:${ClusterIdentifier}/${Database}"
      ]
    },
    {
      "Action": [
        "redshift-data:BatchExecuteStatement",
        "redshift-data:ExecuteStatement"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:redshift:${Region}:${AccountId}:cluster:${ClusterIdentifier}"
      ]
    },
    {
      "Action": [
        "redshift-data:GetStatementResult",
        "redshift-data:CancelStatement",
        "redshift-data:DescribeStatement"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::${S3BucketName}",
        "arn:aws:s3:::${S3BucketName}/*"
      ]
    }
  ]
}
Tip

Make Sure to replace ${...} macros with appropriate values from the Configuration section,
${AccountId} with your AWS Account ID

Create a new IAM Role

  • Sign in to your AWS Management Console and open the IAM console.
  • Go to Roles > Create role.
  • Under Trusted entity type, select Custom trust policy.
  • Paste the JSON below into the Custom trust policy field and replace ${WorkspaceId} macro with your Jitsu Workspace ID (Jitsu UI -> Settings -> Workspace Settings).
  • In the policy selection screen, find and check the policy created in the Create policy section.
  • Assign a unique and descriptive name to the role, provide a clear description, and then select Create role.
  • Find the newly created role in the list and click on it.
  • Copy the ARN value from the Summary section and use in Jitsu Redshift Configuration.

Custom trust policy:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:iam::907458119157:root"
			},
			"Action": "sts:AssumeRole",
			"Condition": {
				"StringEquals": {
					"sts:ExternalId": "${WorkspaceId}"
				}
			}
		},
		{
			"Effect": "Allow",
			"Principal": {
				"Service": [
					"redshift-serverless.amazonaws.com",
					"redshift.amazonaws.com"
				]
			},
			"Action": "sts:AssumeRole"
		}
	]
}
Info

907458119157 - Jitsu AWS Account Id

Attaching the new IAM role to the Redshift cluster

Redshift will use the same role to Copy data from S3 Bucket.

To attach the role to the Redshift Cluster:

  • Open Provisioned clusters dashboard
  • Select your cluster
  • In the Actions dropdown, select Manage IAM roles
  • In the Available IAM roles, select the role you created and click Associate IAM role
  • Click Save changes

To attach the role to the Redshift Serverless:

  • Open Serverless dashboard
  • In the Namespaces / Workgroups section, select your namespace
  • Open the Security and encryption tab and click Manage IAM roles
  • Click the Associate IAM roles button and select the role you created
  • Click the Associate IAM roles button
  • Click Save changes

Setting user permissions in Redshift

  • Open Redshift Query Editor
  • Run the following SQL queries to create a new user and grant it the necessary permissions :
-- create a new user
CREATE USER ${Username} PASSWORD disable;
 
-- grant user schema creation permissions on the database
GRANT CREATE ON DATABASE ${Database} TO ${Username};
 
-- if you want to grant permissions to the existing schema
GRANT ALL ON SCHEMA ${Database}.${Schema} to ${Username};
Tip

Make Sure to replace ${...} macros with appropriate values from the Configuration section,
${RoleName} with your the name of role you created

Advanced: Implementation Details

Batch Mode

S3 is used as a intermediate storage for batched events.

Algorithm
-- Write to tmp file
-- Load tmp file to s3
BEGIN -- start transaction
COPY from s3 to tmp_table
INSERT into target_table select from tmp_table
COMMIT -- commit transaction
 

Stream Mode

Performance considerations

Supported as plain insert statements. Don't use at production scale (more than 10 records per minute)

Deduplication

For batch mode the following algorithm is used:

Algorithm
-- Write to tmp file
-- Deduplicate rows in tmp file
-- Load tmp file to s3
BEGIN -- start transaction
COPY from s3 to tmp_table
DELETE from target_table T using tmp_table TMP where T.pk_field = TMP.pk_field
INSERT into target_table select from tmp_table
COMMIT -- commit transaction
 

For stream mode (⚠️Don't use at production scale (more than 10 records per minute)):

  • SELECT by primary key
  • Either INSERT or UPDATE depending on result

Queries Optimization

Timestamp connection setting is used to optimize SELECT queries.

Selected timestamp column will be used as sort key for target table.