I appreciate data access logs as much as the next person. I would even say that I appreciate logs more than most, especially when they are used analytically to provide a more complete picture. That said, in this blog post I will discuss why relying solely on data access logs can be insufficient and why universal data access control is so appealing. I will focus my examples on cloud data warehouses, but the same concepts apply, with minor variations, to databases as well.
Data Access Log Content
“Native logs”, or data access logs, are logs generated by the database engine that provide information about database transactions. The main questions they answer are:
- When did the transaction occur? (Usually the start and end times.)
- Who was accessing the data? The answer is usually the user who sent the query to the database, although it may contain additional information including the client application used, the client IP address, or other identifying information. Sometimes, the log also contains information about the role used.
- What data was accessed? In most cases, this information means having the query which was sent to the engine.
- Were there any errors issued for the transaction (with varying levels of detail)?
- How much data was scanned, and what was the timing of specific parts of the query? The log provides this operational data, as well as other details, which can be useful for optimization and cost analytics, especially when there is a pay-per-query element (e.g. in the Snowflake and BigQuery data warehouses or in the AWS Athena query engine).
- What the organization is planning to do with the data access logs (or, in some unfortunate cases, what it finds out it wants to do retroactively). Is it looking to keep the data access logs as an audit for compliance purposes? Is it trying to use the data access logs as part of a data breach investigation or another type of incident response? Is it trying to optimize costs?
- Various compliance and regulation requirements the organization adheres to. These regulations may require that data access logs be kept separate from the data store itself, or that they be retained for longer than the data store natively provides. They may also require that the log be enriched with additional data.
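To make the log content described above more concrete, here is a minimal Python sketch of a single data access log record and a retention check. The class `AccessLogRecord`, its field names, and the helper `must_retain` are hypothetical and chosen only for illustration. Each data store exposes its own schema and retention behavior, so treat this as a sketch under those assumptions rather than a description of any specific product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class AccessLogRecord:
    """Hypothetical shape of one data access log entry (not a vendor schema)."""
    start_time: datetime               # when the transaction started
    end_time: datetime                 # when the transaction ended
    user: str                          # who sent the query
    query: str                         # what was sent to the engine
    role: Optional[str] = None         # role used, if the log records it
    client_app: Optional[str] = None   # client application, if recorded
    client_ip: Optional[str] = None    # client IP address, if recorded
    error: Optional[str] = None        # error message, None on success
    bytes_scanned: int = 0             # operational data for cost analysis

    @property
    def duration(self) -> timedelta:
        return self.end_time - self.start_time


def must_retain(record: AccessLogRecord, now: datetime, retention_days: int) -> bool:
    """Return True while the record is still inside the retention window."""
    return now - record.end_time <= timedelta(days=retention_days)


# Illustrative record with made-up values.
record = AccessLogRecord(
    start_time=datetime(2021, 1, 1, 12, 0, 0),
    end_time=datetime(2021, 1, 1, 12, 0, 3),
    user="analyst_1",
    query="SELECT * FROM customers",
    role="ANALYST",
    bytes_scanned=1_048_576,
)
```

A retention requirement that exceeds what the data store keeps natively would mean copying records like this one to separate storage before they expire.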
Common Uses of Data Access Logs
Logs are very important and useful in general. Data access logs are especially critical, as they shed light on data access, and data is, in most cases, the business’s biggest asset (and liability). Here are common uses of data access logs:
- To fulfill requirements. We need logs because we are required to have them. Different regulations and compliance frameworks, as well as security frameworks which organizations comply with for legal reasons, commercial reasons, or risk reduction reasons, consistently require retaining access logs. Some of these requirements necessitate that we crunch the data access log for different reports, such as invalid access attempts and administrative access attempts. Other guidelines impose different retention or storage restrictions for the logs, but, regardless of the specific protocol, requiring some type of audit for data access operations is very common. Examples of such requirements are NIST Cybersecurity Framework PR.PT-1 and PCI DSS 4.2.
- Incident response and forensic investigations. In these cases, we are trying to understand more about events that occurred, and a log that records those events is often a key element in the investigation. The investigation may be performed simply to confirm that a certain incident was contained to one environment and did not have further implications (e.g. an endpoint was compromised, and we want to make sure that the credentials were not used to access the data warehouse), or it may be part of an extensive data breach investigation which is trying to establish the exact impact.
- Gaining visibility. Logs can help create dashboards or reporting capabilities on an otherwise “black box” system, revealing which users and roles are active and what actions are being taken. In most cases, this requires effort to transform the millions of lines in the logs into something that highlights and quantifies the main activities which transpired.
- Reducing over-permissions. By analyzing data access logs, you can answer questions such as, “Which users are exposed to data that they are not actually using?” and then reduce that risk. Depending on the desired depth, this process can require a significant investment in data analysis. (At Satori, we recently added this as an out-of-the-box capability for our customers who use Snowflake.)
- Proactively find threats. By analyzing logs, you can locate anomalous behavior by data consumers which may indicate a security risk.
- Operational efficiency. Analyzing the costs can be valuable, especially in pay-per-query engines, although it is not always straightforward. Cost analysis is sometimes performed for internal billing of different teams accessing the data, to find anomalies, and to correct expensive data consumption habits.
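Several of the uses above come down to aggregating log entries. The following Python sketch shows three such aggregations: a report of failed access attempts, a list of grants that were never exercised (over-permissions), and scanned bytes per user for cost analysis. The data in `LOG` and `GRANTS` and all function names are hypothetical. The sketch also assumes that the accessed table has already been extracted from each query, which in practice requires parsing the query text stored in the log.

```python
from collections import defaultdict

# Hypothetical, simplified log entries:
# (user, table accessed, bytes scanned, error message or None)
LOG = [
    ("alice", "orders", 500, None),
    ("alice", "customers", 1500, None),
    ("bob", "orders", 200, None),
    ("bob", "salaries", 0, "access denied"),
]

# Hypothetical grants: tables each user is allowed to read.
GRANTS = {
    "alice": {"orders", "customers", "salaries"},
    "bob": {"orders"},
}


def failed_attempts(log):
    """Compliance-style report: entries that ended with an error."""
    return [(user, table, error) for user, table, _, error in log if error]


def unused_grants(log, grants):
    """Over-permissions: tables a user may read but never read successfully."""
    used = defaultdict(set)
    for user, table, _, error in log:
        if error is None:
            used[user].add(table)
    return {
        user: tables - used[user]
        for user, tables in grants.items()
        if tables - used[user]
    }


def bytes_scanned_per_user(log):
    """Cost analysis: total bytes scanned by each user."""
    totals = defaultdict(int)
    for user, _, scanned, _ in log:
        totals[user] += scanned
    return dict(totals)
```

With the sample data, `unused_grants(LOG, GRANTS)` flags that `alice` holds a grant on `salaries` she never used, which is the kind of finding that supports reducing permissions. A real analysis would cover a much longer time window before drawing that conclusion.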