Before reading this guide...
If you'd like to read this guide, as well as additional Amazon Redshift security guides (About topics like data masking, auditing & monitoring, creating a data inventory, and more), visit our full guide, where you can also download the guide as a PDF e-book.Though I try to keep this guide up to date, I've also written several more specific Amazon Redshift security and governance guides: Also, if interested, I also wrote a Snowflake security guide, as well as a BigQuery security guide.
Security in Amazon RedshiftRedshift is the world’s most popular cloud data warehouse solution and is currently utilized by tens of thousands of organizations worldwide. It’s based on modified PostgreSQL and provided by AWS. Its access control can be broken down into the following three areas:
- Cluster management, meaning the ability to create, configure and delete the infrastructure itself (i.e. Redshift clusters). These operations are controlled by AWS security credentials and can be carried out by IAM users from the console or by API.
- Cluster connectivity, meaning the network access control. This is based on CIDR (Classless Inter-Domain Routing) security groups.
- Database access, meted out per securable object (database, table, column or view) and configured by SQL GRANT & CREATE commands. Temporary access is also available via specific connection strings using AWS IAM users. If you already manage user identities outside of AWS, you may use IAM identity providers instead of creating IAM users in your AWS account.
- Network access control in AWS Redshift
- Managing Redshift access across users, roles and groups
- Fine-grained Redshift access control
- Access logging & monitoring in AWS Redshift
- Privilege violation logging & monitoring in AWS Redshift
- Identity management integration for federation
Network access control in AWS RedshiftNetwork access control in AWS Redshift is managed by the network infrastructure configuration in your AWS account. This is where you can set cluster connectivity, limitations within your VPC or whether it opens publicly or through a VPN. Redshift’s network configuration settings are more or less identical to access configuration of other AWS resources and will not be covered in this specific guide. Note that, when setting network access on Redshift, you’re setting your connectivity to the entire cluster. This will limit your options for more granular control, preventing you from allowing access to specific securable objects (databases, tables, views, columns or rows) or only granting certain IP addresses access to specific sensitive data.
Managing Redshift access across users, roles and groupsYou can either manage your users and groups within Redshift, or use AWS IAM users assigned via the connection string. The privileges to access specific objects (i.e. access to tables) are tightly coupled with the DB engine itself, and are configured via Redshift SQL commands. Groups can be viewed as roles and can therefore be assigned to users. A user inherits the privileges from the group (or groups) they belong to.This means that, if we have a sales group and a marketing group, we can assign salespeople to the sales group and marketing employees to the marketing group. We can then assign selected “power users” with access to both groups.
An example of setting group privileges in Redshift:
/* Creating the 2 groups */ CREATE GROUP sales; CREATE GROUP marketing; /* Creating the 2 schemas with the data */ CREATE SCHEMA sales; CREATE SCHEMA marketing; /* Give sales USAGE rights in schema, and read-only (SELECT) access to the tables within the schema */ GRANT USAGE on SCHEMA sales to GROUP sales; GRANT SELECT ON ALL TABLES IN SCHEMA sales TO GROUP sales; /* Give sales USAGE rights in schema, and read-only (SELECT) access to the tables within the schema */ GRANT USAGE on SCHEMA marketing to GROUP marketing; GRANT SELECT ON ALL TABLES IN SCHEMA marketing TO GROUP marketing;
Using groups as roles in Redshift:
CREATE USER sales_person1 password 'xyzzy-1-XYZZY'; ALTER GROUP sales ADD USER sales_person1; CREATE USER marketing_person1 password 'xyzzy-1-XYZZY'; ALTER GROUP marketing ADD USER marketing_person1; CREATE USER marketing_and_sales password 'xyzzy-1-XYZZY'; ALTER GROUP marketing ADD USER marketing_and_sales; ALTER GROUP sales ADD USER marketing_and_sales;
SELECT usename AS user_name, groname AS group_name FROM pg_user, pg_group WHERE pg_user.usesysid = ANY(pg_group.grolist) AND pg_group.groname in (SELECT DISTINCT pg_group.groname from pg_group);
Fine-grained Redshift access controlAWS Redshift offers fine-grained access control by allowing configuration of access controls to databases, tables and views, as well as to specific columns in tables. This can be set in the SELECT level (reading data), UPDATE level (updating data) or ALL (both). Let’s take a look at this feature in action. Let’s say that ACME international has a table called Orders, containing all of the details recorded for each order. The company has a shipping team, which currently has read-only access on this table (due to the decision that allowing them to update data could pose risks to ACME). Here’s how we set up this example in Redshift:
/* Creating a very naive orders table... */ CREATE TABLE demo.public.orders ( order_id varchar(255), order_checksum int, shipping_firstname varchar(50), shipping_middlename varchar(25), shipping_lastname varchar(50), shipping_street1 varchar(255), shipping_street2 varchar(255), shipping_street3 varchar(255), shipping_zipcode varchar(15), shipping_pob varchar(15), shipping_city varchar(50), shipping_phone1 varchar(50), shipping_phone2 varchar(50), shipping_cellular varchar(50), shipping_hours varchar(50), shipping_comments varchar(255), payer_creditcard varchar(19), payer_expmonth varchar(2), payer_expyear varchar(4), payer_firstname varchar(50), payer_middlename varchar(25), payer_lastname varchar(50), payer_street1 varchar(255), payer_street2 varchar(255), payer_street3 varchar(255), payer_zipcode varchar(15), payer_pob varchar(15), payer_city varchar(50), payer_phone1 varchar(50), payer_phone2 varchar(50), payer_cellular varchar(50), payer_hours varchar(50), payer_comments varchar(255)); /* Creating the shipping user */ CREATE USER shipping PASSWORD 'xyzzy-1-XYZZY'; /* Giving the user read-only access */ GRANT SELECT ON demo.public.orders TO shipping; /* And now with the shipping user we can do: */ SELECT * FROM demo.public.orders;
How to set column level security in Redshift:
/* Revoking the existing SELECT privilege on the entire table */ REVOKE SELECT ON demo.public.orders FROM shipping; /* Granting SELECT privilege specifically to all columns except for the forbidden ones */ GRANT SELECT(order_id, order_checksum, shipping_firstname, shipping_middlename, shipping_lastname, shipping_street1, shipping_street2, shipping_street3, shipping_zipcode, shipping_pob, shipping_city, shipping_phone1, shipping_phone2, shipping_cellular, shipping_hours, shipping_comments, payer_firstname, payer_middlename, payer_lastname, payer_street1, payer_street2, payer_street3, payer_zipcode, payer_pob, payer_city, payer_phone1, payer_phone2, payer_cellular, payer_hours, payer_comments) ON demo.public.orders TO shipping
Row level security in RedshiftRow level security means that, in a certain table, certain users can only access specific rows. These rows should have criteria (usually based on the value of one of the columns) that define which role can access the specific item (row). In certain cases, this additionally requires a translation or lookup table to define relations between roles and the type of items they can access within the specific table. Setting up row level access control in Redshift is rather complicated, but not impossible. Let’s take a look at this proof of concept: In this example, we have a table called department_employees, which contains details about the employees in each department. Let’s create the table and populate it with employees from our sales & marketing teams:
CREATE TABLE department_employees ( id int, name varchar(50), phone varchar(50), salary smallint, department varchar(50)); INSERT INTO department_employees VALUES (1, 'Seller McSeller', '+1-212-5555555', 180, 'sales'), (2, 'Sir Sell-A-Lot', '+1-212-5556666', 240, 'sales'), (3, 'Marky McMarket', '+1-716-5555555', 210, 'marketing'), (4, 'Sir Market-A-Lot', '+1-716-5556666', 270, 'marketing');
CREATE TABLE users_to_groups (user_name varchar(100), group_name varchar(100)); INSERT INTO users_to_groups VALUES ('marketing_accountant', 'marketing'); /* Let's also create an accountant user */ CREATE USER marketing_accountant WITH PASSWORD 'xyzzy-1-XYZZY';
CREATE VIEW v_department_employees AS SELECT * FROM department_employees WHERE department IN (SELECT user_name, group_name FROM users_to_groups WHERE user_name=CURRENT_USER); /* Granting access to the user in views */ GRANT SELECT ON users_to_groups TO marketing_accountant; GRANT SELECT ON v_department_employees TO marketing_accountant; /* Switching to use the context of the user 'marketing_accountant' */ SET SESSION AUTHORIZATION marketing_accountant; SELECT * FROM department_employees; /* We get a permission denied error, as we don't have access to the table itself: Invalid operation: permission denied for relation department_employees */ SELECT * FROM v_department_employees; /* We now get the filtered rows */
Access logging & monitoring in RedshiftLogging failed and successful access attempts to Redshift data warehouses can be achieved by either using the system table STL_CONNECTION_LOG or by enabling audit logs (which are kept in S3 buckets). Note that the audit logs are not enabled by default, meaning that you will need to manually enable them.
How to set up audit logs for AWS redshiftThis explanation will focus on setting up audit logs from the AWS console. You can also do this by using AWS API or CLI. Note that keeping audit logs, especially activity monitoring for data warehouses with a lot of activity, may incur high costs. It is important to take this into account prior to enabling audit logs. To enable audit logs, edit the Audit logging section under ‘Maintenance and monitoring’ in the cluster configuration: Inside, configure the S3 bucket definitions. For example: If you would also like to log user activity (queries running against the data warehouse), you must enable activity monitoring, too. This will add a significant amount of logs to your logging S3 bucket.
Enabling activity monitoring in Redshift:Step 1: create a new parameter group in your Redshift cluster. Once the new parameter group is created, edit the parameters and set enable_user_activity_logging to true: Within the database configuration, set the database configuration to the new parameter group you created. For example:
How long do AWS Redshift keep access logs?When using the system tables, consider that AWS claims that retention time is only two to five days (though in a test cluster, we’ve seen log records of about seven days). Such a short retention span is useful for troubleshooting, but not for tracking behavior over a long period of time or performing incident response for incidents that elapsed outside of the retained time frame. On the other hand, when enabling audit logs, the logs are kept indefinitely, unless you set a S3 lifecycle setting on that bucket or perform an ETL operation to purge and compact your data. Purging the data may be necessary if you only want to access data for specific use-cases. Compacting it means removing access logs which you will probably not need, such as connections by the rdsdb user used by AWS for maintenance. It is also important to note that, since system tables are kept on the cluster nodes and audit logs are saved separately from the Redshift cluster itself (in S3 buckets), access to the audit logs can be achieved separately from the Redshift access (i.e. allowing the security team access to the S3 buckets without having to give them access to the Redshift data warehouse itself). Here are some fun examples of querying access logs for useful information:
How to look specifically for failed logins?
SELECT * FROM stl_connection_log WHERE event='authentication failure' ORDER BY recordtime;
Showing successfully authenticated users with the number of successful authentications:
SELECT username, event, COUNT(*) FROM stl_connection_log WHERE event = 'authenticated' GROUP BY 1, 2 ORDER BY 3 DESC;
Showing successfully authenticated users by hourly buckets:
SELECT DATE_PART(YEAR, recordtime) || '-' || LPAD(DATE_PART(MONTH, recordtime),2,'0') || '-' || LPAD(DATE_PART(DAY, recordtime),2,'0') || ' ' || LPAD(DATE_PART(HOUR, recordtime),2,'0') AS hour_bucket, username, COUNT(*) FROM stl_connection_log WHERE event = 'authenticated' GROUP BY 1, 2 ORDER BY 1, 2 DESC; /* Showing an hourly breakdown of users authenticating, excluding rdsdb */ SELECT DATE_PART(YEAR, recordtime) || '-' || LPAD(DATE_PART(MONTH, recordtime),2,'0') || '-' || LPAD(DATE_PART(DAY, recordtime),2,'0') || ' ' || LPAD(DATE_PART(HOUR, recordtime),2,'0') AS hour_bucket, username, COUNT(*) FROM stl_connection_log WHERE event = 'authenticated' AND username != 'rdsdb' GROUP BY 1, 2 ORDER BY 1, 2 DESC;
Showing a list of the connection drivers used by the redshift users:
SELECT username, application_name, COUNT(*) FROM stl_connection_log WHERE application_name != '' GROUP BY 1,2 ORDER BY 1,2;
Privilege violation logging & monitoring in AWS RedshiftPrior to setting such access controls, you will be able to see queries pulling data from these resources by querying STL_QUERY, as seen below: Retrieving queries access to specific objects in Redshift:
SELECT * FROM STL_QUERY WHERE userid!=1 AND querytxt LIKE '%customers%' ORDER BY query DESC LIMIT 100;