In the spirit of our Snowflake DB security and AWS Redshift security guides, we’ve created the following guide to Google BigQuery security, covering how to manage the most common security concerns when using Google BigQuery as your data warehouse.
Introduction to BigQuery
BigQuery is a fully managed, serverless SaaS data warehouse built on Google’s Dremel engine. It enables Google Cloud Platform customers to be up and running in minutes and to scale their capacity with ease.
This guide will cover the following:
- Network access control to Google BigQuery
- Managing BigQuery access across users and groups
- Fine-grained BigQuery access control
- Access logging & monitoring in BigQuery
- Privilege violation logging & monitoring in BigQuery
- Identity management integration for federation
Network access control to Google BigQuery
Using GCP’s VPC Service Controls, you can set network access limitations, as well as other policies, to limit access to multi-tenant services like BigQuery. While the setup of VPC Service Controls warrants its own guide, let’s briefly touch on the steps for creating IP-based restrictions on the BigQuery APIs. These restrictions apply to all access to BigQuery (CLI, API, the Cloud Console, etc.).
- In your G Suite Admin Panel, go to Security > Context-Aware Access > Access Levels. Once there, create the policy with the IP subnets you would like to allow access from.
- In the Google Cloud Console, go to VPC Service Controls and create a new perimeter. It is advisable to start with a dry-run perimeter to avoid unintentionally blocking desired traffic.
- Apply the settings below in the wizard (substituting your own configuration).
- Congrats! You have successfully set up a new policy! From here on, any attempt to access BigQuery in the project from outside the permitted subnets will result in an error message like this:
Note that network access control for BigQuery is a blunt tool suited to specific use-cases. For example, you can’t block access to specific datasets or tables, or permit specific actions based on the source IP.
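If you prefer to script this setup, the same access level and dry-run perimeter can be created with the gcloud CLI. A minimal sketch, assuming placeholder policy and project IDs and an example subnet:

```bash
# Define a basic access level that allowlists our corporate subnet.
cat > corp_subnets.yaml <<'EOF'
- ipSubnetworks:
    - 203.0.113.0/24
EOF

gcloud access-context-manager levels create corp_network \
    --title="Corporate network" \
    --basic-level-spec=corp_subnets.yaml \
    --policy=POLICY_ID

# Create a dry-run perimeter restricting BigQuery for the project,
# admitting only traffic that satisfies the access level above.
gcloud access-context-manager perimeters dry-run create bq_perimeter \
    --perimeter-title="BigQuery perimeter" \
    --perimeter-type=regular \
    --perimeter-resources=projects/PROJECT_NUMBER \
    --perimeter-restricted-services=bigquery.googleapis.com \
    --perimeter-access-levels=corp_network \
    --policy=POLICY_ID
```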
Managing BigQuery access across users, roles and groups
When controlling access to BigQuery’s data itself, the access control model can be understood as a binding, or a tuple, of three items:
- The asset we are checking access for. In BigQuery, the relevant scopes of assets are project-level, dataset-level, and table-level (the last as a beta feature).
- The role, or type of access required for the action performed against the asset. For example, a “viewer” role allows the member to only read from the asset, while an “editor” role allows them to update the asset.
- Members, or who has access to the asset. Members can be users (human or service accounts), groups, or domains (every user within the domain receives the specified access). Membership can also be attributed to all authenticated users, though we highly discourage this practice unless absolutely necessary.
Access to all BigQuery resources in a project is managed using Google Cloud Platform’s IAM (Identity & Access Management). Here, you can grant BigQuery roles to users and groups. This can be done programmatically using the Google Cloud API, with the gcloud CLI, or through the GCP Cloud Console. For example, you can use gcloud to set up a new service account with access to insert data into a BigQuery project.
In the following illustration, we see two groups: Sales (sales@org.com) and Marketing (mktg@org.com). The entire Sales group gets viewer access on the sales dataset, and the entire Marketing group gets viewer access on the marketing dataset. Moreover, a specific marketing user (mktg_1@org.com) has owner privileges on the “Sales MQLs” table. In practice, most organizations don’t manage their groups in GCP itself, but do so through federation with identity management providers like Okta.
Setting up a service account with Data Editor role to BigQuery
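A minimal sketch of those commands with the gcloud CLI, assuming a hypothetical service account name and project ID:

```bash
# Create the service account that will load data into BigQuery.
gcloud iam service-accounts create loader-sa \
    --display-name="BigQuery data loader"

# Grant it the BigQuery Data Editor role on the project, so it can
# insert data into the project's datasets.
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:loader-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataEditor"
```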
Adding a BigQuery read-only user through the web console
In the relevant project, go to “IAM & Admin” in the GCP Cloud Console, click the “Add” button, and fill in the details as outlined below:
Note: enter the actual email address you wish to grant access to, unless you’d like to find me lurking around your data!
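The same grant can be scripted; a sketch with the gcloud CLI, assuming a hypothetical analyst user:

```bash
# Read-only access to the project's BigQuery data.
gcloud projects add-iam-policy-binding my-project \
    --member="user:analyst@org.com" \
    --role="roles/bigquery.dataViewer"

# Also needed if the user should be able to run queries.
gcloud projects add-iam-policy-binding my-project \
    --member="user:analyst@org.com" \
    --role="roles/bigquery.jobUser"
```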
Read More:
- Blog: Role Explosion in Data Access
- Blog: How To Piss Off Your Data Consumers With Data Access Bureaucracy
- Universal access control with Satori
Fine-grained BigQuery access control
There are several ways to enable granular data access control in BigQuery. You can expose a filtered view of the data using authorized views, or you can set different access configurations per dataset, table or even column.
Using Authorized Views to enable granular access control between datasets
An authorized view is a view that users from a different dataset are authorized to query. The view can filter both the rows and columns of data according to its defining query. This enables exposing only a subset of the data to users who do not have access to the underlying data (the tables and views the authorized view pulls data from).
Adding authorized views is a twofold process: you create the view, then authorize access to it from other datasets. For example, if we have a table called “salaries”, and we want department managers to see only the salaries of employees in their own department, we can do the following:
Step 1: create a view
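A sketch of this step with the bq CLI, using hypothetical project, dataset, and column names:

```bash
# Create a view over the salaries table filtered to one department.
bq mk \
    --use_legacy_sql=false \
    --view='SELECT employee_name, salary
            FROM `my-project.hr.salaries`
            WHERE department = "engineering"' \
    hr_views.engineering_salaries
```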
Step 2: authorize the view
In the web console, go to the source dataset whose data you’d like to make accessible through the view, click the “SHARE DATASET” button, and choose the “AUTHORIZED VIEWS” tab. Then choose the view and click ADD:
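The same authorization can be scripted by appending the view to the source dataset’s access list; a sketch using bq and jq, with the same hypothetical names:

```bash
# Dump the source dataset's metadata.
bq show --format=prettyjson my-project:hr > hr.json

# Append the view to the dataset's "access" list.
jq '.access += [{"view": {"projectId": "my-project",
                          "datasetId": "hr_views",
                          "tableId": "engineering_salaries"}}]' hr.json > hr_updated.json

# Write the updated access list back to the dataset.
bq update --source hr_updated.json my-project:hr
```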
Note that authorized views are static, not dynamic: you will need to create a specific view for each filter, and you cannot pass a parameter to the view. In our example, this means you won’t be able to pass the department name as a parameter and have a single view filter the salaries table accordingly.
Table level security in BigQuery
Currently a beta feature, BigQuery allows setting access controls on specific tables within datasets. To give a certain entity (user, group, anyone on a domain, etc.) access to a specific table, go to that table in the web cloud console and click “SHARE TABLE”. Next, select the entity to give access to and the access level you wish to grant them:

Note that table-level access settings are positive only: if a user has access to a dataset, they have access to all tables in that dataset. If, for example, an organization grants a team access to a certain dataset and, following an audit, wants to disallow the team from accessing a certain table, it needs to revoke the dataset-level access and grant table-level access to the remaining tables instead.
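Table-level grants can also be scripted; a sketch with the bq CLI (a beta command at the time of writing; names are placeholders):

```bash
# Grant a single user read access to one specific table only.
bq add-iam-policy-binding \
    --member='user:analyst@org.com' \
    --role='roles/bigquery.dataViewer' \
    my-project:sales.sales_mqls
```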
Row level security in BigQuery
Row-level access control in BigQuery can be implemented using views, which add filtering over the data queried from a table. You then grant access to the view instead of the underlying table(s), either as an authorized view (as in the example above) or by granting access to the view directly (in the same way you grant access to a table). Reusing the table and view from the authorized views example, we can go to the view in the web cloud console and share it by clicking “SHARE VIEW” and filling in the entity and access level like so:
Note that this is exactly the same as enabling access to a specific table and that, as with authorized views, this method is not dynamic: you will need to create a different view for each filtered slice of data you want to expose, as sketched below.
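For example, a minimal sketch of generating one such view per department with the bq CLI (hypothetical names):

```bash
# One static view per department; each can then be shared separately.
for dept in engineering sales marketing; do
  bq mk \
      --use_legacy_sql=false \
      --view="SELECT * FROM \`my-project.hr.salaries\` WHERE department = '${dept}'" \
      "hr_views.salaries_${dept}"
done
```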
Column level security in BigQuery
There are two ways of creating column-level access control in BigQuery. The first is to use the approach above, with either authorized views or view-level access, selecting only the columns you want to expose in the view itself.
Once the view is created, you can either authorize it for a specific dataset or share it with specific identity entities (users, groups, etc.). The second method involves the (currently beta) column-based security feature, which requires Google Data Catalog (and incurs an additional cost).

Using Google Data Catalog to enable column-based security
Once you enable Data Catalog in your web cloud console, select “create and manage policy tags”. Then:
- In the policy tags screen, click “CREATE” and fill in the form.
- Once you’ve added the tag, enable it for fine-grained access control.
- Grant the Data Catalog Fine Grained Reader role to any users who need to access data in columns carrying the policy tags.
With the tag taxonomy defined and the Data Catalog Fine Grained Reader role granted to the appropriate users, we can add policy tags by editing the table schema in the UI (or from the CLI or API). Here’s an example in the UI:
- From BigQuery management, choose the salaries table and edit its schema.
- Choose the “name” column and click the “add policy tag” button.
- In the new screen, select the policy tag we created and click “SELECT”.
Congratulations, you have just set a policy tag on a specific column! Note that you can only set one tag per column. Running the following query will now give you the following error:
To grant a user permission, go to Data Catalog, open the tag taxonomy screen, choose the relevant tag(s), click “SHOW INFO PANEL”, and enable access to that specific tag by clicking “ADD MEMBER”:
As you can see, this method has several friction points, so study it carefully and make sure it matches your architecture (both the one-time setup and the long-term maintenance) before enabling it in your organization.
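For reference, attaching a policy tag can also be scripted by editing the table schema with the bq CLI. A minimal sketch, assuming hypothetical table names and a placeholder taxonomy path copied from the Data Catalog console:

```bash
# Dump the current schema of the (hypothetical) salaries table.
bq show --schema --format=prettyjson my-project:hr.salaries > schema.json

# In schema.json, add a "policyTags" block to the column you want to protect,
# using the resource name of the policy tag created in Data Catalog, e.g.:
#   {
#     "name": "name",
#     "type": "STRING",
#     "policyTags": {
#       "names": ["projects/my-project/locations/us/taxonomies/TAXONOMY_ID/policyTags/TAG_ID"]
#     }
#   }

# Write the edited schema back to the table.
bq update my-project:hr.salaries schema.json
```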
Read More:
- Blog: 9 Common Struggles Data Teams Face With Data Masking Projects
- Blog: How To Piss Off Your Data Consumers With Data Access Bureaucracy
- Fine-grained access control to all your data with Satori
Access logging & monitoring in BigQuery
When monitoring queries that were run against BigQuery in a certain project, you can use the REST API, which retains job history for six months. Queries are one type of job (jobType) that runs against BigQuery; the other jobTypes are LOAD, EXTRACT, COPY, and UNKNOWN.
Pulling your entire BigQuery query log
In certain cases, it is useful to pull the entire query log, either for a one-time analysis or continuously. To pull the data, use the jobs.list REST API endpoint to first retrieve all job IDs. Note that this API is paginated, so you need to iterate through the pages. If you need the actual queries that were sent, use the jobs.get REST API endpoint (either selectively or for all jobs). If you would like to store native logs going forward, you can use the Audit Logs feature.
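A minimal sketch of this flow with curl against the REST API, assuming a placeholder project ID (and jq for JSON parsing):

```bash
PROJECT="my-project"  # placeholder project ID
TOKEN="$(gcloud auth print-access-token)"
PAGE=""

# Page through jobs.list, printing every job ID.
while :; do
  RESP=$(curl -s -H "Authorization: Bearer ${TOKEN}" \
    "https://bigquery.googleapis.com/bigquery/v2/projects/${PROJECT}/jobs?allUsers=true&maxResults=100&pageToken=${PAGE}")
  echo "${RESP}" | jq -r '.jobs[]?.jobReference.jobId'
  PAGE=$(echo "${RESP}" | jq -r '.nextPageToken // empty')
  [ -z "${PAGE}" ] && break
done

# For the full query text of a given job, use jobs.get:
curl -s -H "Authorization: Bearer ${TOKEN}" \
  "https://bigquery.googleapis.com/bigquery/v2/projects/${PROJECT}/jobs/JOB_ID" |
  jq -r '.configuration.query.query'
```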
Privilege violation logging & monitoring in BigQuery
Monitoring and investigating privilege violations can be done by analyzing the query logs (either from the audit logs or by pulling the data from the REST API). Access errors are available from the jobs.list API endpoint:
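For example, a sketch that pulls completed jobs and keeps only those that failed with an access-denied error (placeholder project ID, jq assumed):

```bash
TOKEN="$(gcloud auth print-access-token)"
curl -s -H "Authorization: Bearer ${TOKEN}" \
  "https://bigquery.googleapis.com/bigquery/v2/projects/my-project/jobs?allUsers=true&stateFilter=done&maxResults=200" |
  jq '.jobs[]?
      | select(.status.errorResult.reason == "accessDenied")
      | {jobId: .jobReference.jobId, user: .user_email, error: .status.errorResult.message}'
```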
It is advisable to keep an eye on access violations, to identify both operational and security issues.

Read More:
- Blog: Data Security Projects Keep Data Teams Away From Their Core Responsibilities
- Blog: How To Piss Off Your Data Consumers With Data Access Bureaucracy
- Fine-grained access control to all your data with Satori