Redshift
Amazon Redshift is a cloud data warehouse that is optimized for the analytical workloads of business intelligence (BI) and data warehousing (DWH). Jitsu supports both Serverless and Classic Redshift
Features
| Feature | Supported |
|---|---|
| Batch Mode | ✅ |
| Stream Mode | ⚠️ slow |
| Deduplication | ✅ |
| Queries Optimization | ✅ |
Configuration
Jitsu supports both database username/password based authentication and IAM Role based authentication for Redshift data warehouse.
General parameters
| Parameter name | Description |
|---|---|
| Authentication Method | password - Username/Password based authentication, iam - IAM Role based authentication |
| Region | AWS Region of Redshift Cluster and S3 bucket |
| Database | Redshift database name |
| Schema | Redshift schema |
| S3 Bucket Name | S3 Bucket Name |
Configuration settings depend on the selected authentication method.
Username/Password based authentication
| Parameter name | Description |
|---|---|
| Host | Redshift public endpoint |
| Username | Redshift username |
| Password | Redshift password |
| S3 Access Key Id | S3 Access Key Id |
| S3 Secret Access Key | S3 Secret Access Key |
IAM Role based authentication
| Parameter name | Description |
|---|---|
| Redshift Serverless | true if connecting to Redshift Serverless instance |
| Cluster Identifier | Redshift cluster identifier (Redshift Cluster only) |
| Workgroup name | Redshift workgroup name (Redshift Serverless only) |
| Role ARN | IAM role ARN |
| Username | Redshift username (Redshift Cluster only) |
To setup IAM Role based authentication for Redshift, follow the Advanced: IAM Role for Jitsu section.
Advanced: IAM Role for Jitsu
To allow Jitsu to connect to Redshift using IAM Role, the following steps should be performed in AWS Console:
- Create a new IAM Policy
- Create a new IAM Role
- Attach the new IAM role to the Redshift cluster
- Setting user permissions in Redshift
Create a new IAM Policy
- Sign in to your AWS Management Console and open the IAM console.
- Go to Policies > Create policy.
- Choose the JSON option. Then, paste the JSON below depending on whether you use Redshift Cluster or Redshift Serverless
- Assign a unique and descriptive name to the policy, provide a clear description, and then select Create Policy.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "redshift:GetClusterCredentials",
"Effect": "Allow",
"Resource": [
"arn:aws:redshift:${Region}:${AccountId}:dbuser:${ClusterIdentifier}/${Username}",
"arn:aws:redshift:${Region}:${AccountId}:dbname:${ClusterIdentifier}/${Database}"
]
},
{
"Action": [
"redshift-data:BatchExecuteStatement",
"redshift-data:ExecuteStatement"
],
"Effect": "Allow",
"Resource": [
"arn:aws:redshift:${Region}:${AccountId}:cluster:${ClusterIdentifier}"
]
},
{
"Action": [
"redshift-data:GetStatementResult",
"redshift-data:CancelStatement",
"redshift-data:DescribeStatement"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::${S3BucketName}",
"arn:aws:s3:::${S3BucketName}/*"
]
}
]
}Make Sure to replace ${...} macros with appropriate values from the Configuration section,${AccountId} with your AWS Account ID
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "redshift-serverless:GetCredentials",
"Effect": "Allow",
"Resource": [
"${WorkgroupARN}"
]
},
{
"Action": [
"redshift-data:BatchExecuteStatement",
"redshift-data:ExecuteStatement"
],
"Effect": "Allow",
"Resource": [
"${WorkgroupARN}"
]
},
{
"Action": [
"redshift-data:GetStatementResult",
"redshift-data:CancelStatement",
"redshift-data:DescribeStatement"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::${S3BucketName}",
"arn:aws:s3:::${S3BucketName}/*"
]
}
]
}Make Sure to replace ${S3BucketName} macros with appropriate value from the Configuration section,${WorkgroupARN} with the Workgroup ARN value from your Redshift Serverless workgroup page
Create a new IAM Role
- Sign in to your AWS Management Console and open the IAM console.
- Go to Roles > Create role.
- Under Trusted entity type, select Custom trust policy.
- Paste the JSON below into the Custom trust policy field and replace
${WorkspaceId}macro with your Jitsu Workspace ID (Jitsu UI -> Settings -> Workspace Settings). - In the policy selection screen, find and check the policy created in the Create policy section.
- Assign a unique and descriptive name to the role, provide a clear description, and then select Create role.
- Find the newly created role in the list and click on it.
- Copy the ARN value from the Summary section and use in Jitsu Redshift Configuration.
Custom trust policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::907458119157:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "${WorkspaceId}"
}
}
},
{
"Effect": "Allow",
"Principal": {
"Service": [
"redshift-serverless.amazonaws.com",
"redshift.amazonaws.com"
]
},
"Action": "sts:AssumeRole"
}
]
}907458119157 - Jitsu AWS Account Id
Attaching the new IAM role to the Redshift cluster
Redshift will use the same role to Copy data from S3 Bucket.
To attach the role to the Redshift Cluster:
- Open Provisioned clusters dashboard
- Select your cluster
- In the Actions dropdown, select Manage IAM roles
- In the Available IAM roles, select the role you created and click Associate IAM role
- Click Save changes
To attach the role to the Redshift Serverless:
- Open Serverless dashboard
- In the Namespaces / Workgroups section, select your namespace
- Open the Security and encryption tab and click Manage IAM roles
- Click the Associate IAM roles button and select the role you created
- Click the Associate IAM roles button
- Click Save changes
Setting user permissions in Redshift
- Open Redshift Query Editor
- Run the following SQL queries to create a new user and grant it the necessary permissions :
-- create a new user
CREATE USER ${Username} PASSWORD disable;
-- grant user schema creation permissions on the database
GRANT CREATE ON DATABASE ${Database} TO ${Username};
-- if you want to grant permissions to the existing schema
GRANT ALL ON SCHEMA ${Database}.${Schema} to ${Username};-- create new user with the Redshift IAM role name
CREATE USER "IAMR:${RoleName}" PASSWORD disable;
-- grant user schema creation permissions on the database
GRANT CREATE ON DATABASE ${Database} TO "IAMR:${RoleName}";
-- if you want to grant permissions to the existing schema
GRANT ALL ON SCHEMA ${Database}.${Schema} to "IAMR:${RoleName}";Make Sure to replace ${...} macros with appropriate values from the Configuration section,${RoleName} with your the name of role you created
Advanced: Implementation Details
Batch Mode
S3 is used as a intermediate storage for batched events.
Algorithm
-- Write to tmp file
-- Load tmp file to s3
BEGIN -- start transaction
COPY from s3 to tmp_table
INSERT into target_table select from tmp_table
COMMIT -- commit transaction
Stream Mode
Supported as plain insert statements. Don't use at production scale (more than 10 records per minute)
Deduplication
For batch mode the following algorithm is used:
Algorithm
-- Write to tmp file
-- Deduplicate rows in tmp file
-- Load tmp file to s3
BEGIN -- start transaction
COPY from s3 to tmp_table
DELETE from target_table T using tmp_table TMP where T.pk_field = TMP.pk_field
INSERT into target_table select from tmp_table
COMMIT -- commit transaction
For stream mode (⚠️Don't use at production scale (more than 10 records per minute)):
- SELECT by primary key
- Either INSERT or UPDATE depending on result
Queries Optimization
Timestamp connection setting is used to optimize SELECT queries.
Selected timestamp column will be used as sort key for target table.