Hardening GCP BigQuery security: access controls explained

In the spirit of our Snowflake DB security and AWS Redshift security guides, we’ve created the following guide to Google BigQuery security, covering how to manage the most common security concerns when using Google BigQuery as a data warehouse.

 

Introduction to BigQuery

BigQuery is a managed, serverless data warehouse built on Google’s Dremel engine. It enables Google Cloud Platform customers to be up and running in minutes and to scale their capabilities with ease. 

 

This guide will cover the following:

  • Network access control to Google BigQuery

  • Managing BigQuery access across users, roles and groups

  • Fine-grained BigQuery access control

  • Access logging & monitoring in BigQuery

  • Identity management integration for federation

 

Network access control to Google BigQuery

Using GCP’s VPC Service Controls, you can set network access limitations, as well as other policies, to limit access to multi-tenant services like BigQuery. While the setup of VPC Service Controls warrants its own guide, let’s briefly touch on a few steps to create IP-based restrictions to the BigQuery APIs. The restrictions will apply to all access to BigQuery (CLI, API, the cloud console, etc.).

  1. In your G Suite Admin Panel, go to Security > Context-Aware Access > Access Levels. Once there, create the policy with the IP subnets you would like to allow.

    GCP console network access control

  2. In Google Cloud Console, go to VPC Service Controls and create a new perimeter. It is advised to start with a dry-run perimeter first to avoid unintentionally blocking desired traffic.

    VPC Service Controls
  3. Apply the following settings in the wizard (naturally, using your own configuration).

    VPC Service Perimeter - details

    VPC Service Perimeter - restricted services

    VPC Service Perimeter - access levels

  4. Congrats! You have successfully set up a new policy! From here on, any attempt to access BigQuery outside the permitted subnets of a specific project will result in an error message like this:

VPC Service Perimeter - sql console
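Conceptually, the access level created in step 1 is an allow-list of subnets that GCP evaluates the caller’s source IP against. Here is a minimal sketch of that check in Python, using example subnets rather than values from any real policy:

```python
import ipaddress

# Example allow-list; replace with the subnets from your access level.
ALLOWED_SUBNETS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def ip_allowed(source_ip: str) -> bool:
    """Return True if the source IP falls inside any allowed subnet."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in subnet for subnet in ALLOWED_SUBNETS)

print(ip_allowed("203.0.113.7"))  # True: inside 203.0.113.0/24
print(ip_allowed("192.0.2.1"))    # False: outside both subnets
```

GCP performs this evaluation on its side for every request crossing the perimeter; the sketch is only meant to clarify what the access level expresses.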

Note that network access control for BigQuery is a blunt tool, suitable only for specific use cases. For example, you can’t block access to specific datasets or tables, or permit specific actions based on the source IP.

 

Managing BigQuery access across users, roles and groups

When controlling access to BigQuery’s data itself, the access control model can be understood as a binding, or a tuple, of three items:

  • The asset we are checking access for. In BigQuery, the relevant scopes are project level, dataset level, and table level (the latter as a beta feature).

  • The role, or type of access required for the action performed against the asset. For example, a “viewer” role allows the member to only read from the asset, while an “editor” role allows them to update the asset.

  • Members, or who has access to the asset. Members can be users (human or service accounts), groups, or domains (any user within a domain that has the specified access). Membership can also be granted to all authenticated users, though we highly discourage this practice unless absolutely necessary.
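To make the model concrete, here is a minimal sketch of such a binding in Python. The role and member strings follow real IAM syntax, but the asset name and the helper function are illustrative only:

```python
# One binding: an asset, a role, and the members holding that role.
binding = {
    "asset": "projects/my-project/datasets/sales",  # hypothetical asset
    "role": "roles/bigquery.dataViewer",            # a "viewer"-type role
    "members": ["group:sales@org.com", "user:mktg_1@org.com"],
}

def has_access(binding: dict, member: str, role: str) -> bool:
    """Check whether a member is granted the given role on the asset."""
    return binding["role"] == role and member in binding["members"]

print(has_access(binding, "group:sales@org.com", "roles/bigquery.dataViewer"))   # True
print(has_access(binding, "user:someone@org.com", "roles/bigquery.dataViewer"))  # False
```

An IAM policy on a project or dataset is, in essence, a list of such bindings.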

Access to all BigQuery resources in a project is managed using Google Cloud Platform’s IAM (Identity & Access Management). Here, you can grant BigQuery roles to users and groups. This can be done programmatically using the Google Cloud API, with the gcloud CLI, or through the GCP Cloud Console. For example, you can use gcloud to set up a new service account with access to insert data into a BigQuery project.

 

In the following illustration, we see two groups: Sales (sales@org.com) and Marketing (mktg@org.com). The entire sales group gets viewer access on the entire sales dataset, and the entire marketing group gets viewer access on the entire marketing dataset. Moreover, a specific marketing user (mktg_1@org.com) has owner privileges on the “Sales MQLs” table. In practice, most organizations don’t manage their groups in GCP directly, but through federation with identity management providers like Okta.

bigquery groups diagram

Setting up a service account with Data Editor role to BigQuery

# creating a new service account
gcloud iam service-accounts create satori-ben-test \
--description "Service account with DataEditor permission to BigQuery" \
--display-name "satori-ben-test"

# In case you want to see the new service account:
# gcloud iam service-accounts list

# Add access binding:
gcloud projects add-iam-policy-binding PROJECT_NAME \
--member=serviceAccount:satori-ben-test@PROJECT_NAME.iam.gserviceaccount.com \
--role='roles/bigquery.dataEditor'

# create key for API access, in case needed:
gcloud iam service-accounts keys create ./satori-ben-test-key.json \
--iam-account satori-ben-test@PROJECT_NAME.iam.gserviceaccount.com

Adding a BigQuery read-only user through the web console 

In the relevant project, go to “IAM & Admin” in the GCP cloud console, click the “Add” button, and fill in the details as outlined below:

add members to gcp project

Be careful to include the actual email address you wish to grant access to, unless you’d like to find me lurking around your data!

 

Fine-grained BigQuery access control

There are several ways to enable granular data access control in BigQuery. You can expose a filtered view of the data using authorized views, or you can set different access configurations per dataset, table or even column.

 

Using Authorized Views to enable granular access control between datasets

An authorized view is a view that is shared with (authorized for) a dataset other than the one holding its underlying data. The view can filter both rows and columns, per its defining query. This makes it possible to expose only a subset of the data to users who do not have access to the underlying data (the tables and views the authorized view pulls from).

 

Adding authorized views is a twofold process: first create the view, then authorize access to it from other datasets. For example, if we have a table called “salaries”, and we want department managers to see only their own department’s employees, we can do the following:

Step 1: create a view

# create the table in dataset a_b_c
CREATE TABLE a_b_c.salaries (name string, department string, salary numeric);

# create the view for "sales" department:
CREATE VIEW a_b_c.sales_salaries AS (
SELECT * FROM a_b_c.salaries WHERE department='sales');

Step 2: authorize the view

In the web console, go to the dataset where you’d like to enable access to the view data, click the “SHARE DATASET” button, and choose the “AUTHORIZED VIEWS” tab. Then choose the view and click ADD:

bigquery authorized views

Note that authorized views are static, not dynamic: you cannot pass a parameter to a view, so you need to create a specific view for each filter. For example, you won’t be able to pass the department name as a parameter and have a single view filter the salaries table accordingly.
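Since a view cannot be parameterized, one view per department is needed. A small script can generate the DDL for each department; this is a sketch reusing the dataset and table names from the example above, with a made-up department list:

```python
DEPARTMENTS = ["sales", "mktg", "finance"]  # hypothetical department list

def view_ddl(dataset: str, table: str, department: str) -> str:
    """Build the CREATE VIEW statement for one department's filtered view."""
    return (
        f"CREATE VIEW {dataset}.{department}_{table} AS "
        f"(SELECT * FROM {dataset}.{table} WHERE department='{department}')"
    )

for dep in DEPARTMENTS:
    print(view_ddl("a_b_c", "salaries", dep))
```

Each generated statement can then be run against BigQuery, and each resulting view authorized separately, as described above.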

 

Table level security in BigQuery

As a beta feature, BigQuery allows setting access controls on specific tables within datasets. To give a certain entity (user, group, anyone on a domain, etc.) access to a specific table, go to that table in the cloud console and click “SHARE TABLE”. Next, select the entity to give access to and the access level you wish to grant them:

bigquery user roles

Note that access settings are additive only, meaning that if a user has access to a dataset, they will have access to all tables in it. If, for example, an organization allows a team to access a certain dataset and, following an audit, wants to disallow the team from accessing one of its tables, it must revoke the dataset-level access and grant table-level access to the remaining tables instead.

 

Row level security in BigQuery

Row-level access control in BigQuery is possible using views: create a view that filters the rows queried from a table, then grant access to the view instead of the underlying table(s). Access is granted either as an authorized view (as in the example above) or by sharing the view directly (the same way you give access to tables). Reusing the table and view from the authorized views example, we can open the view in the cloud console and share it by clicking “SHARE VIEW” and filling in the entity and access level like so: 

bigquery member role viewer

 

Note that this works exactly like enabling access to a specific table, and that, as with authorized views, this method is not dynamic. You will need to create a different view for each filtered subset you want to expose.

 

Column level security in BigQuery

There are two ways of creating column-level access control in BigQuery. The first is to use either authorized views or view-level access, as shown above, selecting only the columns you want to expose in the view itself. 

 

# create the view for salary details without actual names, for data crunching
CREATE VIEW a_b_c.salaries_dataonly AS (
SELECT department, salary FROM a_b_c.salaries);

Once the view is created, you can either authorize it for a specific dataset or share it with specific identity entities (users, groups, etc.).

 

The second method uses the (currently beta) column-based security feature and requires Google Data Catalog (which also incurs an additional cost). 

 

Using Google Data Catalog to enable column-based security

Once you enable Data Catalog in your web cloud console, select “create and manage policy tags”:

BigQuery policy tags

In the policy tags screen, click on “CREATE”, and fill the form:

BigQuery - policy tags taxonomy

 

Once you’ve added the tag, you need to enable it for fine-grained access control:

BigQuery metadata access control

 

Users who access data in tables with policy tags will need to be granted the Data Catalog Fine-Grained Reader role.

 

Once we have both the tag taxonomy defined and the Data Catalog Fine-Grained Reader role granted to the appropriate users, we can add policy tags by editing the table schema in the UI (or from the CLI or API). Here’s an example in the UI:

 

In the BigQuery console, choose the salaries table and edit its schema:

data preview in BigQuery

Now choose the “name” column, and click the “add policy tag” button:

BigQuery tags

In the new screen, select the policy tag that we created, and click “SELECT”:

BigQuery - add a policy tag

Congratulations, you have just set a policy tag on a specific column! Note that you can set only one tag per column. Running the following query as a user without access to the tag will give you the following error:

SELECT * FROM <PROJECT_NAME>.a_b_c.salaries;

 

access denied message in BigQuery

Now, to grant the user permission, go to Data Catalog’s tag taxonomy screen, choose the relevant tag(s), click “SHOW INFO PANEL”, and enable access to that specific tag by clicking “ADD MEMBER”:

Add members and roles for PII resource in BigQuery

As you can see, this method has several friction points, so make sure you study it carefully and confirm that it matches your architecture (both as a one-time setup and in its long-term maintenance) before enabling it in your organization.

 

Access logging & monitoring in BigQuery

To monitor queries run against BigQuery in a certain project, you can use the REST API, which retains job history for six months. A query is one type of job (jobType QUERY) run against BigQuery; the other job types are LOAD, EXTRACT, COPY and UNKNOWN.

 

Pulling your entire BigQuery query log

In certain cases, it is useful to pull the entire query log, either for one-time analysis or continuously. To pull the data, use the jobs.list REST API endpoint to first retrieve all job IDs. Note that this API is paginated, so you need to iterate through the pages. If you need the actual queries that were sent, use the jobs.get REST API endpoint (either selectively or for all jobs).
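The pagination loop looks roughly like this. To keep the sketch runnable without GCP credentials, the actual HTTP call is stubbed out; in practice, fetch_page would call the jobs.list endpoint with an authenticated client, passing pageToken:

```python
def list_all_job_ids(fetch_page):
    """Iterate every page of a jobs.list-style response and yield job IDs."""
    page_token = None
    while True:
        resp = fetch_page(page_token)
        for job in resp.get("jobs", []):
            yield job["jobReference"]["jobId"]
        page_token = resp.get("nextPageToken")
        if page_token is None:
            break

# Stub standing in for the real REST call; the payload shape
# (jobs, jobReference.jobId, nextPageToken) matches jobs.list.
_FAKE_PAGES = {
    None: {"jobs": [{"jobReference": {"jobId": "job_1"}}], "nextPageToken": "p2"},
    "p2": {"jobs": [{"jobReference": {"jobId": "job_2"}}]},
}

def fetch_page(page_token):
    return _FAKE_PAGES[page_token]

print(list(list_all_job_ids(fetch_page)))  # ['job_1', 'job_2']
```

Collected job IDs can then be fed to jobs.get to retrieve the full query text where needed.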

 

If you would like to store native logs moving forward, you can use the Audit Logs feature.

 

Privilege violation logging & monitoring in BigQuery

Monitoring and investigating privilege violations can be done by analyzing the query logs (either from the audit logs or by pulling the data from the REST API). Access errors are available from the jobs.list API endpoint:

 

 "errorResult": {
 "reason": "accessDenied",
 "location": "projects/[REDACTED]/locations/us/taxonomies/[REDACTED]/policyTags/[REDACTED]",
 "message": "Access Denied: BigQuery BigQuery: User does not have permission to access policy tag \"PII : person name\" on column test-ben1.a_b_c.salaries.name."
 },

It is advised to keep an eye on access violations, to identify operational as well as security issues. 
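A minimal sketch of such monitoring: scan the job records pulled from jobs.list and flag any job whose errorResult reason is accessDenied (in jobs.list responses the errorResult appears under the job’s status field; the sample record below mirrors the excerpt above, with made-up IDs):

```python
def access_violations(jobs):
    """Return (job_id, message) pairs for jobs that failed with accessDenied."""
    violations = []
    for job in jobs:
        error = job.get("status", {}).get("errorResult") or {}
        if error.get("reason") == "accessDenied":
            job_id = job.get("jobReference", {}).get("jobId")
            violations.append((job_id, error.get("message")))
    return violations

sample_jobs = [
    {"jobReference": {"jobId": "job_42"},
     "status": {"errorResult": {"reason": "accessDenied",
                                "message": "Access Denied: ..."}}},
    {"jobReference": {"jobId": "job_43"}, "status": {"state": "DONE"}},
]

print(access_violations(sample_jobs))  # [('job_42', 'Access Denied: ...')]
```

The same filter can run as a scheduled task over logs pulled continuously, feeding alerts for security review.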

 

Identity management integration for federation

Using Google as the organizational identity management system is less common among enterprise organizations. Most use identity management providers, such as Okta, where authentication is handled via SSO for your organization’s users and groups.

 

Once integrated, BigQuery (as well as the rest of the GCP services) obtains users and groups from the IdP, and you can set privileges for the appropriate groups.

 

Conclusion

BigQuery (like the entire GCP platform) comes with a suite of security controls to help secure your data warehouse. However, some aspects are hard to set up and maintain, or hard to configure in a granular way that strikes the right balance between usability and security.

 

This is where Satori is happy to step in and help.

 

Satori and network access control

Satori can help you achieve granular network-based access control when deployed in front of BigQuery. For example, using Satori, you can allow global access to your BigQuery data while prohibiting access to specific resources (datasets, tables, rows or columns) or data types (PII) outside of a certain set of IP addresses.

 

Satori and fine-grained access control to BigQuery

Though it is possible to create fine-grained access control in BigQuery, its capabilities aren’t always dynamic enough (e.g., a separate view is needed for each filter) and are sometimes simply lacking (e.g., the one-policy-tag-per-column limitation in column-based security). 

 

When deploying Satori in front of BigQuery, you can create column-, row- and cell-based granular access controls tied to business rules instead of the securable objects themselves. This enables easy maintenance of such policies and a gradual roll-out process (e.g., beginning with alert-only mode for a single data analytics team). You also get easy visibility into existing data access patterns and violations on the Satori dashboard.

 

In addition, you can set fine-grained access control according to the types of data queried by users, instead of having to set it up manually. Moreover, decoupling the security from the data store itself allows better flexibility, portability and transparency across different data stores.

 

Satori and keeping track of access violations

Satori keeps track of logins and sessions to Satori-protected BigQuery, enabling security policies to handle such violations as well as enhanced visibility into suspicious incidents.

 

If you would like to hear more about our data protection product specific to data warehouses like BigQuery, please contact us.

 

I hope you enjoyed reading this as much as I enjoyed writing it. If you encounter errors or think of parameters I neglected to review, feel free to reach out to me on Twitter (@kernelXSS) or by email, and I’ll be happy to enhance this post.

 

Schedule a Demo