Amazon Redshift Auditing & Monitoring

2022-12-01

If you'd like to view the entire guide of AWS Redshift Security topics, visit our Amazon Redshift Security guide. In there, you can also download the entire guide as a PDF e-book.

An audit log consists of logging certain operations to a log. The audit log is then both kept for investigations into events, as well as analyzed continuously to find incidents that you want to know about to reduce the risk of data loss (such as compliance breaks, over-privileged users, and other security risks). The more relevant information and context you have about the events logged - the better. In addition to their security and operational usefulness, audit logs are also an important part of meeting compliance requirements.

 

Redshift Auditing and Monitoring

 

 

Database audit logs are separated into two parts:

  1. An access log, detailing the history of successful and failed logins to the database.
  2. A query log, detailing the history of successful and failed queries made on the database.

As a certain database session contains one successful login event, but may contain a large number of queries sent to the database, and the amount of information may also be substantially larger (a query can be anything from a SELECT 1 heartbeat to a hundred lines of code).

 

Why are audit logs especially important in Amazon Redshift?

Amazon Redshift is used as a data warehouse or as part of a data lake solution with Redshift Spectrum. With the growing popularity of data democratization, and data analytics in general, organizations allow data access to more users and teams. This means that those organizations also want to understand what’s happening in their data analysis environments, both in terms of security and in terms of operational efficiency.

 

Where are Amazon Redshift logs kept?

There are two levels of auditing in Redshift. The logs are natively kept in system tables, and in addition, for long-term storage, you can enable audit logging to S3 buckets. Let’s explore what native logging into system tables gives you and what the addition of logging to S3 buckets gives.

Using Amazon Redshift System Tables

Using the built-in system tables, you can investigate events quickly using SQL from within the database itself. The retention period for such logs is under a week, so do not expect to use these in the long term. However, such logs are still handy. Here are some of the use cases:

  1. For dashboarding and monitoring purposes.
  2. For debugging and investigating ongoing or “fresh” incidents. Note that it takes time for logs to get from your system tables to your S3 buckets, so new events will only be available in your system tables (see the below section for that).
  3. When you have not enabled native logs, you need to investigate past events that you’re hoping are still retained (the “ouch” option).

The following are the essential system tables (STLs) used for logging in Redshift (note that if you need to log queries, there is an extra step before those are logged—see below):

  • stl_connection_log holds the information about connections, disconnections, and logins to the cluster.
  • stl_userlog lists changes to user definitions.

Why are audit logs significant in Amazon Redshift?

Amazon Redshift is used as a data warehouse or a data lake solution with Redshift Spectrum. As data (and data analytics in general) continues to be democratized, organizations are allowing data access to more users and teams. These processes mean that those organizations also want to understand what’s happening in their data analysis environments, both in terms of security and operational efficiency.

  • stl_query contains the query execution information. Note that the queries here may be truncated, and so for the query texts themselves, you should reconstruct the queries using stl_querytext.
  • stl_querytext holds query text. The rows in this table are split into chunks of 200 characters of query text each, so any query longer than 200 characters requires reconstruction, as shown below.
  • stl_ddltext holds data definition language (DDL)commands: CREATE, ALTER or DROP. You will not find these in the stl_querytext (unlike other databases such as Snowflake, which keeps all queries and commands in one place).
  • stl_utilitytext holds other SQL commands logged, among these important ones to audit such as GRANT, REVOKE, and others.

Here are some examples of queries using these system tables:

Generating a query log from stl_querytext

For doing this, you need to reconstruct the chunks in stl_querytext. As an example, the following query pulls the last ten queries run on the cluster:

SELECT query,
LISTAGG(CASE WHEN LEN(RTRIM(text)) = 0 THEN text ELSE RTRIM(text) END) WITHIN GROUP (ORDER BY sequence) AS query_statement, COUNT(*) as row_count
FROM stl_querytext
GROUP BY query
ORDER BY query desc
LIMIT 10;


The following query returns the top 10 longest queries:

WITH queries AS (
SELECT query,
LISTAGG(CASE WHEN LEN(RTRIM(text)) = 0 THEN text ELSE RTRIM(text) END) WITHIN GROUP (ORDER BY sequence) AS query_statement, COUNT(*) as row_count
FROM stl_querytext
GROUP BY query)
SELECT * FROM queries WHERE query_statement ILIKE 'select%'
ORDER BY LEN(query_statement) DESC
LIMIT 10;

Amazon Redshift Permissions Changes Log

The following log returns all grants and revokes of permissions:

WITH util_cmds AS (
SELECT userid,
LISTAGG(CASE WHEN LEN(RTRIM(text)) = 0
THEN text
ELSE RTRIM(text)
END)
WITHIN GROUP (ORDER BY sequence) AS query_statement
FROM stl_utilitytext GROUP BY userid, xid order by xid)
SELECT util_cmds.userid, stl_userlog.username, query_statement
FROM util_cmds
LEFT JOIN stl_userlog ON (util_cmds.userid = stl_userlog.userid)
WHERE query_statement
ILIKE '%GRANT%' OR query_statement ILIKE '%REVOKE%';

Showing the Last 10 Failed Logins

SELECT *
FROM stl_connection_log
WHERE event='authentication failure'
ORDER BY recordtime DESC
LIMIT 10;


You can find additional system table queries in our Amazon Redshift security guide.

 

Enabling Amazon Redshift Query Logs

This section describes how to automatically export the log data from Redshift to an S3 bucket for long-term storage. Of course, this may incur additional storage costs. The automated audit log can handle three log files:

  • Connection log (or access log): logs the authentication attempts to the cluster, as well as connections and disconnections.
  • User log: this log is for changes in user definitions.
  • User activity log (or query log): logs each query before running on the database.

Let’s log in to the AWS console, head to Redshift, and once inside your Redshift cluster management, select the Properties tab:


Under database configurations, choose Edit audit logging from the Edit button selection box:

In the modal window that opens, either choose to log to a new S3 bucket or specify an existing one, and (optionally) choose a key prefix, in case your bucket is used for more than this cluster’s audit logging, and then hit Save changes:

Be aware that by default, even if you set up logging to be automatically exported to S3, only the connection log and user log will be recorded. If you want to enable query logging as well, follow the instructions in the next section.

 

Enabling Query Logging in Amazon Redshift

Enabling activity logging has a more substantial effect on your storage than logging access only because the queries can be anything from a short string to hundreds of lines of analytic queries. That means that this method will incur additional storage costs if exported to S3. In any case, this change should be done carefully (first testing and on a staging environment, only then on production, while monitoring the impact) to safeguard against performance and cost issues.


User activity logging is disabled by default, and enabling it requires changing the parameter group to set enable_user_activity_logging to true. You can edit the parameter groups setting from the cluster management properties tab (see section above on Enabling Amazon Redshift Query Logs) by choosing “Edit parameter group” in the database configurations section. Alternatively, if you want to edit the current parameter group, click on the parameter group inside that section—it’s here:


A click on the parameter group will take you to the workload management screen, where you should go to the Parameters tab and edit the parameters to set enable_user_activity_logging to true. After that, you will need to (ouch) reboot the cluster to apply the changes.

 

Sensitive Data in Logs

Queries in the system tables are not redacted and kept as-is. That means that they may contain sensitive information. For example, a query such as the following may contain personally identifiable information (PII):

SELECT * FROM users WHERE firstname = 'Vladimir' AND lastname='Humphrey';


The query will appear in a non-redacted format in the logs as well:


This means that your database logs are a place where you can find sensitive data such as PII, protected health information, and more, even though it contains only queries and not the results returned from the database itself. For compliance and security reasons, this will probably require additional controls placed on access to these logs.

 

Analyzing the Audit Logs

Having the audit logs is one thing; analyzing them is a different story. For that, you will either want to use a third-party log collector or build your own ETL service that will make the compressed log files accessible via a data querying engine such as Amazon Athena. When creating this process, you may want to anonymize the queries (mask sensitive data), which is not a trivial task, but it will ensure that the final audit logs are free of sensitive data and can be used by more teams. When building this process, you also usually make sure they are deleted after a certain period (depending on compliance requirements, this varies, but in most cases, you’d like to store these logs for at least a year).

 

Amazon Redshift Audit Logs in Satori

For Amazon Redshift customers of Satori, you can use our Universal Audit feature, which comes out of the box, and logs all activities from all your data platforms (Amazon Redshift and others) in the same place. You can access the audit data in the following ways:


  • Using the Satori user interface
  • Using API calls
  • By connecting our logs so that they’re pushed to your data platform

Some of the advantages of using Satori universal audit vs. using the built-in capabilities of Redshift are outlined in the following sections.

 

Works out of the box

The logs are in one place (as in, there is no need for looking both in system tables and in S3) require no bucket configuration, ETLs, or such housekeeping. Logs are also automatically redacted to remove any sensitive data from the queries.

 

Context-rich information

In addition to the queries, the logs contain a lot of other information, such as the usernames, directory groups, number of records pulled, and even metadata about the result set (column names) and the data transfer classification).

 

Ability to investigate and create reports out of the box

The audit logs in Satori allow you to filter, analyze, and investigate events. You can quickly look for activities involving sensitive data, those done by specific users, those that occurred in particular periods, etc.

 

Access to all data platforms from one single pane

In addition to your Amazon Redshift audit log, you can instantaneously analyze and investigate your logs from multiple data platforms. For example, you can see what a user did across your Redshift, Snowflake, and SQL server environments in one place.


Audit logs are only a small subset of what Satori does; click here to see what else it offers. If you’d like to schedule a Satori demo, click here.