Querying data

Overview

This document describes several ways to query data using BigQuery, including running interactive and batch query jobs. It also discusses how to use query caching, return large query results, page through list results, work with arrays, query wildcard tables, and use views.

Query results are always saved to either a temporary or permanent table. You can choose whether to append or overwrite data in an existing table and whether to create a new table if none exists by that name.

Running queries

Running queries with CLI:

bq query <query string>
  • An ending ; (semicolon) mark is not required.
  • You do not need to specify the project ID or dataset ID if you have defined default values.
  • To enable standard SQL for a query, set the --use_legacy_sql flag to false.
  • You can use the common --format flag to specify other output formats, such as JSON or CSV.
  • If a query will take a long time to run, you can run it asynchronously in the command line using the --nosync flag.
  • If you need to flatten the result set when you query nested and repeated data, use the --flatten_results flag.
  • To save the results to a table, use the --destination_table flag.
  • If your query results exceed the maximum response size, you can allow large results using the --allow_large_results flag and write the large result set to a destination table.

The following example uses the --destination_table flag to create a permanent table based on the query results.

bq query --destination_table=mydataset.happyhalloween "SELECT name, number
    FROM [bigquery-public-data:usa_names.usa_1910_current]
    WHERE gender = 'M'
    ORDER BY number DESC
    LIMIT 6"

Running queries with the API:

  • You must insert a new job of type query using the jobs.insert API method and populate the configuration.query property.

  • Poll for results by calling getQueryResults. Poll until jobComplete equals true. Check for errors and warnings in the errors list.

  • To save the query results to a permanent table, include a value for the configuration.query.destinationTable property.

  • REST API methods accept three types of parameters: path parameters, query parameters, and body parameters. The following method signature demonstrates all three parameter types:

    PUT https://www.googleapis.com/bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}?userIp=192.0.2.211
    {
       "friendlyName": string,
       "description": string
    }
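
For a query job, the request body is a job resource whose configuration.query property holds the query text. The following is a minimal sketch (the project, dataset, and table names are hypothetical; destinationTable is optional):

    POST https://www.googleapis.com/bigquery/v2/projects/{projectId}/jobs
    {
      "configuration": {
        "query": {
          "query": "SELECT name FROM [bigquery-public-data:usa_names.usa_1910_current] LIMIT 10",
          "destinationTable": {
            "projectId": "myproject",
            "datasetId": "mydataset",
            "tableId": "mytable"
          }
        }
      }
    }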
    

Running queries with Java client:

import com.google.cloud.bigquery.*;

import java.util.List;
import java.util.UUID;
import java.util.concurrent.TimeoutException;

public static void runSimpleQuery(String queryString)
    throws TimeoutException, InterruptedException {
  QueryJobConfiguration queryConfig =
      QueryJobConfiguration.newBuilder(queryString).build();

  runQuery(queryConfig);
}

public static void runQuery(QueryJobConfiguration queryConfig)
    throws TimeoutException, InterruptedException {
  BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

  // Create a job ID so that we can safely retry.
  JobId jobId = JobId.of(UUID.randomUUID().toString());
  Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

  // Wait for the query to complete.
  queryJob = queryJob.waitFor();

  // Check for errors
  if (queryJob == null) {
    throw new RuntimeException("Job no longer exists");
  } else if (queryJob.getStatus().getError() != null) {
    // You can also look at queryJob.getStatus().getExecutionErrors() for all
    // errors, not just the latest one.
    throw new RuntimeException(queryJob.getStatus().getError().toString());
  }

  // Get the results.
  QueryResponse response = bigquery.getQueryResults(jobId);
  QueryResult result = response.getResult();

  // Print all rows of the results. iterateAll() transparently requests
  // additional pages, so an explicit getNextPage() loop is not needed
  // (and would print pages after the first twice).
  for (List<FieldValue> row : result.iterateAll()) {
    for (FieldValue val : row) {
      System.out.printf("%s,", val.toString());
    }
    System.out.printf("\n");
  }
}

To save the result in a permanent table, set the destination table in the QueryJobConfiguration:

QueryJobConfiguration queryConfig =
      QueryJobConfiguration.newBuilder(queryString)
          // Save the results of the query to a permanent table.
          // See: https://cloud.google.com/bigquery/querying-data#permanent-table
          .setDestinationTable(TableId.of(destinationDataset, destinationTable))
          // Allow results larger than the maximum response size.
          // If true, a destination table must be set.
          // See: https://cloud.google.com/bigquery/querying-data#large-results
          .setAllowLargeResults(allowLargeResults)
          .build();

Parameterized queries

BigQuery supports query parameters to help prevent SQL injection when queries are constructed using user input. This feature is only available with standard SQL syntax.

To specify a named parameter, use the @ character followed by an identifier, such as @param_name. For example, this query finds all the words in a specific Shakespeare corpus with counts that are at least the specified value.

#standardSQL
SELECT
  word,
  word_count
FROM
  `bigquery-public-data.samples.shakespeare`
WHERE
  corpus = @corpus
  AND word_count >= @min_word_count
ORDER BY
  word_count DESC;

Alternatively, use the placeholder value ? to specify a positional parameter. Note that a query can use positional or named parameters but not both.

Running queries with CLI:

  • Use --parameter to provide values for parameters in the form "name:type:value". An empty name produces a positional parameter. The type may be omitted to assume STRING.
  • The --parameter flag must be used in conjunction with the flag --use_legacy_sql=False to specify standard SQL syntax.

    bq query --use_legacy_sql=False \
      --parameter=corpus::romeoandjuliet \
      --parameter=min_word_count:INT64:250 \
      'SELECT word, word_count
      FROM `bigquery-public-data.samples.shakespeare`
      WHERE corpus = @corpus
      AND word_count >= @min_word_count
      ORDER BY word_count DESC;'
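
To use positional parameters instead, leave the parameter name empty and replace each @name reference with the ? placeholder; values are bound in the order the --parameter flags appear. A sketch based on the same query:

    bq query --use_legacy_sql=False \
      --parameter=::romeoandjuliet \
      --parameter=:INT64:250 \
      'SELECT word, word_count
      FROM `bigquery-public-data.samples.shakespeare`
      WHERE corpus = ?
      AND word_count >= ?
      ORDER BY word_count DESC;'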
    

Running queries with the API:

{
  "query": "SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` WHERE corpus = @corpus AND word_count >= @min_word_count ORDER BY word_count DESC;",
  "queryParameters": [
    {
      "parameterType": {
        "type": "STRING"
      },
      "parameterValue": {
        "value": "romeoandjuliet"
      },
      "name": "corpus"
    },
    {
      "parameterType": {
        "type": "INT64"
      },
      "parameterValue": {
        "value": "250"
      },
      "name": "min_word_count"
    }
  ],
  "useLegacySql": false,
  "parameterMode": "NAMED"
}
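
To use positional parameters with the API, omit the name field on each parameter and set parameterMode to POSITIONAL. A sketch of the corresponding request body:

{
  "query": "SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` WHERE corpus = ? AND word_count >= ? ORDER BY word_count DESC;",
  "queryParameters": [
    {
      "parameterType": {
        "type": "STRING"
      },
      "parameterValue": {
        "value": "romeoandjuliet"
      }
    },
    {
      "parameterType": {
        "type": "INT64"
      },
      "parameterValue": {
        "value": "250"
      }
    }
  ],
  "useLegacySql": false,
  "parameterMode": "POSITIONAL"
}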

Running queries with Java client:

private static void runNamed(final String corpus, final long minWordCount)
    throws InterruptedException {
  BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

  String queryString =
      "SELECT word, word_count\n"
          + "FROM `bigquery-public-data.samples.shakespeare`\n"
          + "WHERE corpus = @corpus\n"
          + "AND word_count >= @min_word_count\n"
          + "ORDER BY word_count DESC";
  QueryRequest queryRequest =
      QueryRequest.newBuilder(queryString)
          .addNamedParameter("corpus", QueryParameterValue.string(corpus))
          .addNamedParameter("min_word_count", QueryParameterValue.int64(minWordCount))
          // Standard SQL syntax is required for parameterized queries.
          // See: https://cloud.google.com/bigquery/sql-reference/
          .setUseLegacySql(false)
          .build();

  // Execute the query.
  QueryResponse response = bigquery.query(queryRequest);

  // Wait for the job to finish (if the query takes more than 10 seconds to complete).
  while (!response.jobCompleted()) {
    Thread.sleep(1000);
    response = bigquery.getQueryResults(response.getJobId());
  }

  // Check for errors.
  if (response.hasErrors()) {
    String firstError = "";
    if (response.getExecutionErrors().size() != 0) {
      firstError = response.getExecutionErrors().get(0).getMessage();
    }
    throw new RuntimeException(firstError);
  }

  // Print all rows of the results. iterateAll() transparently requests
  // additional pages as needed.
  QueryResult result = response.getResult();
  for (List<FieldValue> row : result.iterateAll()) {
    System.out.printf("%s: %d\n", row.get(0).getStringValue(), row.get(1).getLongValue());
  }
}

Running queries with Python client:

import time
import uuid

from google.cloud import bigquery


def wait_for_job(job):
    while True:
        job.reload()  # Refreshes the state via a GET request.
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)

def print_results(query_results):
    """Print the query results by requesting a page at a time."""
    page_token = None

    while True:
        rows, total_rows, page_token = query_results.fetch_data(
            max_results=10,
            page_token=page_token)

        for row in rows:
            print(row)

        if not page_token:
            break

def query_named_params(corpus, min_word_count):
    client = bigquery.Client()
    query = """
        SELECT word, word_count
        FROM `bigquery-public-data.samples.shakespeare`
        WHERE corpus = @corpus
        AND word_count >= @min_word_count
        ORDER BY word_count DESC;
        """
    query_job = client.run_async_query(
        str(uuid.uuid4()),
        query,
        query_parameters=(
            bigquery.ScalarQueryParameter('corpus', 'STRING', corpus),
            bigquery.ScalarQueryParameter(
                'min_word_count', 'INT64', min_word_count)))
    query_job.use_legacy_sql = False

    # Start the query and wait for the job to complete.
    query_job.begin()
    wait_for_job(query_job)
    print_results(query_job.results())

Running batch queries

By default, BigQuery runs interactive queries, which means that the query is executed as soon as possible. Interactive queries count towards your concurrent rate limit and your daily limit. All of the examples in Running queries are interactive queries.

BigQuery also offers batch queries. BigQuery queues each batch query on your behalf, and starts the query as soon as idle resources are available, usually within a few minutes. If BigQuery hasn't started the query within 24 hours, BigQuery changes the job priority to interactive. Batch queries don't count towards your concurrent rate limit, which can make it easier to start many queries at once.

From the command line, use the --batch flag to run a batch query. When using the API, include the configuration.query.priority property, with the value set to BATCH.
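
For example, a minimal sketch of a batch query from the command line:

bq query --batch "SELECT name FROM [bigquery-public-data:usa_names.usa_1910_current] LIMIT 10"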

Using cached query results

BigQuery writes all query results to a table. The table is either a destination table identified by the user or a temporary, cached results table. Temporary, cached results tables are maintained per-user, per-project.

When you run a duplicate query, BigQuery attempts to reuse cached results. When query results are retrieved from a cached results table, you are not charged for the query. However, the queries are still subject to the BigQuery quota policies. In addition to reducing costs, queries that use cached results are significantly faster because BigQuery does not need to compute the result set.

All query results, including both interactive and batch queries, are cached in temporary tables for approximately 24 hours with some exceptions.

Query results are not cached:

  • When a destination table is specified

  • If any of the referenced tables or logical views have changed since the results were previously cached

  • When any of the tables referenced by the query have recently received streaming inserts (a streaming buffer is attached to the table) even if no new rows have arrived

  • If the query uses non-deterministic functions; for example, date and time functions such as CURRENT_TIMESTAMP() and NOW(), and other functions such as CURRENT_USER() return different values depending on when a query is executed

  • If you are querying multiple tables using a wildcard

  • If the cached results have expired; typical cache lifetime is 24 hours, but the cached results are best-effort and may be invalidated sooner

  • If the result set is larger than the maximum response size

If you use the jobs.insert() method to run a query, you can force a query job to fail unless cached results can be used by setting the createDisposition property of the job configuration to CREATE_NEVER. If the query result does not exist in the cache, a NOT_FOUND error is returned.
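
For example, a sketch of the relevant portion of the job configuration (the query itself is a placeholder):

{
  "configuration": {
    "query": {
      "query": "SELECT 17",
      "createDisposition": "CREATE_NEVER"
    }
  }
}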

Returning large result sets

Normally, queries have a maximum response size of 128 MB uncompressed. If you expect larger results, you can set allowLargeResults to true and you must specify a destination table. You incur storage charges for the destination table.

Queries with large results are subject to these limitations:

  • You must specify a destination table.
  • You cannot specify a top-level ORDER BY, TOP, or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
  • Window functions can return large query results only if used in conjunction with a PARTITION BY clause.

Paging through list results

All collection.list methods return paginated results under certain circumstances. The number of results per page is controlled by the maxResults property.

+---------------------------+--------------------------------------------------------------+--------------------+--------------------+
| Method                    | Pagination criteria                                          | Default maxResults | Maximum maxResults |
+---------------------------+--------------------------------------------------------------+--------------------+--------------------+
| Tabledata.list            | Returns paginated results if the response size is more than | 100,000            | 100,000            |
|                           | 10 MB of serialized JSON or more than maxResults rows.      |                    |                    |
+---------------------------+--------------------------------------------------------------+--------------------+--------------------+
| All other collection.list | Returns paginated results if the response is more than      | 50                 | 1,000              |
| methods                   | maxResults rows.                                             |                    |                    |
+---------------------------+--------------------------------------------------------------+--------------------+--------------------+

If you set maxResults to a value greater than the maximum value listed above, the results are paginated based on the maximum value.
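
For example, a sketch of paging through Tabledata.list (the path identifiers are placeholders). Each response includes a pageToken while more rows remain; pass it to the next request and stop when no token is returned:

GET https://www.googleapis.com/bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/data?maxResults=1000
GET https://www.googleapis.com/bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/data?maxResults=1000&pageToken=TOKEN_FROM_PREVIOUS_RESPONSE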

Working with arrays

In BigQuery, an array is an ordered list consisting of zero or more values of the same data type. You can construct arrays of simple data types, such as INT64, and complex data types, such as STRUCTs. The current exception to this is the ARRAY data type: arrays of arrays are not supported.

With BigQuery, you can construct array literals, build arrays from subqueries using the ARRAY() operator, and aggregate values into an array using the ARRAY_AGG() function.

You can combine arrays using functions like ARRAY_CONCAT(), and convert arrays to strings using ARRAY_TO_STRING().

Flattening arrays

A common objective when working with arrays is to flatten the arrays into multiple rows. A query that flattens an array returns a row for each element in the array.

In BigQuery, you flatten arrays using a CROSS JOIN. For example, the following statement uses a CROSS JOIN in conjunction with the UNNEST operator to flatten an array.

WITH races AS (
  SELECT "800M" AS race,
    [STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as splits),
     STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as splits),
     STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as splits),
     STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as splits),
     STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as splits),
     STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as splits),
     STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as splits),
     STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as splits)]
       AS participants)
SELECT
  race,
  participant
FROM races r
CROSS JOIN UNNEST(r.participants) as participant;

+------+---------------------------------------+
| race | participant                           |
+------+---------------------------------------+
| 800M | {Rudisha, [23.4, 26.3, 26.4, 26.1]}   |
| 800M | {Makhloufi, [24.5, 25.4, 26.6, 26.1]} |
| 800M | {Murphy, [23.9, 26, 27, 26]}          |
| 800M | {Bosse, [23.6, 26.2, 26.5, 27.1]}     |
| 800M | {Rotich, [24.7, 25.6, 26.9, 26.4]}    |
| 800M | {Lewandowski, [25, 25.7, 26.3, 27.2]} |
| 800M | {Kipketer, [23.2, 26.1, 27.3, 29.4]}  |
| 800M | {Berian, [23.7, 26.1, 27, 29.3]}      |
+------+---------------------------------------+

You can find specific information from repeated fields. For example, the following query returns the fastest racer in an 800M race.

This example does not involve flattening an array, but does represent a common way to get information from a repeated field.

WITH races AS (
  SELECT "800M" AS race,
    [STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as splits),
     STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as splits),
     STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as splits),
     STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as splits),
     STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as splits),
     STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as splits),
     STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as splits),
     STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as splits)]
       AS participants)
SELECT
  race,
  (SELECT name
   FROM UNNEST(participants)
   ORDER BY (
     SELECT SUM(duration)
     FROM UNNEST(splits) AS duration) ASC
   LIMIT 1) AS fastest_racer
FROM races;

+------+---------------+
| race | fastest_racer |
+------+---------------+
| 800M | Rudisha       |
+------+---------------+

Creating arrays from subqueries

A common task when working with arrays is turning a subquery result into an array. In BigQuery, you can accomplish this using the ARRAY() operator.

For example, consider the following operation on the sequences table:

WITH sequences AS
  (SELECT [0, 1, 1, 2, 3, 5] AS some_numbers
  UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers
  UNION ALL SELECT [5, 10] AS some_numbers)
SELECT some_numbers,
  ARRAY(SELECT x * 2
        FROM UNNEST(some_numbers) AS x) AS doubled
FROM sequences;

+--------------------+---------------------+
| some_numbers       | doubled             |
+--------------------+---------------------+
| [0, 1, 1, 2, 3, 5] | [0, 2, 2, 4, 6, 10] |
| [2, 4, 8, 16, 32]  | [4, 8, 16, 32, 64]  |
| [5, 10]            | [10, 20]            |
+--------------------+---------------------+

Arrays and aggregation

With BigQuery, you can aggregate values into an array using ARRAY_AGG().

WITH fruits AS
  (SELECT "apple" AS fruit
   UNION ALL SELECT "pear" AS fruit
   UNION ALL SELECT "banana" AS fruit)
SELECT ARRAY_AGG(fruit) AS fruit_basket
FROM fruits;

+-----------------------+
| fruit_basket          |
+-----------------------+
| [apple, pear, banana] |
+-----------------------+

The array returned by ARRAY_AGG() is in an arbitrary order, since the order in which the function concatenates values is not guaranteed. To order the array elements, use ORDER BY. For example:

WITH fruits AS
  (SELECT "apple" AS fruit
   UNION ALL SELECT "pear" AS fruit
   UNION ALL SELECT "banana" AS fruit)
SELECT ARRAY_AGG(fruit ORDER BY fruit) AS fruit_basket
FROM fruits;

+-----------------------+
| fruit_basket          |
+-----------------------+
| [apple, banana, pear] |
+-----------------------+

You can also apply aggregate functions such as SUM() to the elements in an array. For example, the following query returns the sum of array elements for each row of the sequences table.

WITH sequences AS
  (SELECT [0, 1, 1, 2, 3, 5] AS some_numbers
   UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers
   UNION ALL SELECT [5, 10] AS some_numbers)
SELECT some_numbers,
  (SELECT SUM(x)
   FROM UNNEST(s.some_numbers) x) AS sums
FROM sequences s;

+--------------------+------+
| some_numbers       | sums |
+--------------------+------+
| [0, 1, 1, 2, 3, 5] | 12   |
| [2, 4, 8, 16, 32]  | 62   |
| [5, 10]            | 15   |
+--------------------+------+

BigQuery also supports an aggregate function, ARRAY_CONCAT_AGG(), which concatenates the elements of an array column across rows.

WITH aggregate_example AS
  (SELECT [1,2] AS numbers
   UNION ALL SELECT [3,4] AS numbers
   UNION ALL SELECT [5, 6] AS numbers)
SELECT ARRAY_CONCAT_AGG(numbers) AS count_to_six_agg
FROM aggregate_example;

+--------------------------------------------------+
| count_to_six_agg                                 |
+--------------------------------------------------+
| [1, 2, 3, 4, 5, 6]                               |
+--------------------------------------------------+

Note: The array returned by ARRAY_CONCAT_AGG() is non-deterministic, since the order in which the function concatenates values is not guaranteed.

Converting arrays to strings

The ARRAY_TO_STRING() function allows you to convert an array of strings to a single string.

WITH greetings AS
  (SELECT ["Hello", "World"] AS greeting)
SELECT ARRAY_TO_STRING(greeting, " ") AS greetings
FROM greetings;

+-------------+
| greetings   |
+-------------+
| Hello World |
+-------------+

Combining arrays

In some cases, you might want to combine multiple arrays into a single array. You can accomplish this using the ARRAY_CONCAT() function.

SELECT ARRAY_CONCAT([1, 2], [3, 4], [5, 6]) as count_to_six;

+--------------------------------------------------+
| count_to_six                                     |
+--------------------------------------------------+
| [1, 2, 3, 4, 5, 6]                               |
+--------------------------------------------------+

Building arrays of arrays

BigQuery does not support building arrays of arrays directly. Instead, you must create an array of structs, with each struct containing a field of type ARRAY. To illustrate this, consider the following points table:
+----------+
| point    |
+----------+
| [1, 5]   |
| [2, 8]   |
| [3, 7]   |
| [4, 1]   |
| [5, 7]   |
+----------+

Now, let's say you wanted to create an array consisting of each point in the points table. To accomplish this, wrap the array returned from each row in a STRUCT, as shown below.

WITH points AS
  (SELECT [1, 5] as point
   UNION ALL SELECT [2, 8] as point
   UNION ALL SELECT [3, 7] as point
   UNION ALL SELECT [4, 1] as point
   UNION ALL SELECT [5, 7] as point)
SELECT ARRAY(
  SELECT STRUCT(point)
  FROM points)
  AS coordinates;

+----------------------------------------------------+
| coordinates                                        |
+----------------------------------------------------+
| [{[1, 5]}, {[2, 8]}, {[3, 7]}, {[4, 1]}, {[5, 7]}] |
+----------------------------------------------------+

Querying data using wildcard tables

Wildcard tables enable you to query several tables concisely. For example, the GSOD weather dataset contains annual tables that all share the common prefix gsod followed by the four-digit year. The tables are named gsod1929, gsod1930, gsod1931, and so on.

To query a group of tables that share a common prefix, use the table wildcard symbol (*) after the table prefix in your FROM statement. For example, the following query finds the maximum temperature reported during the 1940s:

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod194*`
WHERE
  max != 9999.9 # code for missing data
ORDER BY
  max DESC

Filtering selected tables using _TABLE_SUFFIX

To restrict the query so that it scans an arbitrary set of tables, use the _TABLE_SUFFIX pseudo column in the WHERE clause. The _TABLE_SUFFIX pseudo column contains the values matched by the table wildcard. For example, the previous sample query that scans all tables from the 1940s uses a table wildcard to represent the last digit of the year:

FROM
  `bigquery-public-data.noaa_gsod.gsod194*`

The corresponding _TABLE_SUFFIX pseudo column contains values in the range 0 through 9, representing the tables gsod1940 through gsod1949. These _TABLE_SUFFIX values can be used in the WHERE clause to filter for specific tables.

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod194*`
WHERE
  max != 9999.9 # code for missing data
  AND ( _TABLE_SUFFIX = '0'
    OR _TABLE_SUFFIX = '4' )
ORDER BY
  max DESC

Using _TABLE_SUFFIX can greatly reduce the number of bytes scanned, which reduces the cost of running your queries.

Filters on _TABLE_SUFFIX that include subqueries cannot be used to limit the number of tables scanned for a wildcard table.
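
For example, the following illustrative sketch uses a subquery in the filter, so it does not limit the tables scanned; every table matching gsod19* is still read:

#standardSQL
SELECT
  max
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE
  # The subquery prevents table pruning; all matching tables are scanned.
  _TABLE_SUFFIX = (SELECT '35')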

Scanning a range of tables using _TABLE_SUFFIX

To scan a range of tables, use the _TABLE_SUFFIX pseudo column along with the BETWEEN clause. For example, to find the maximum temperature reported in the years between 1929 and 1935 inclusive, use the table wildcard to represent the last two digits of the year:

#standardSQL
SELECT
  max,
  ROUND((max-32)*5/9,1) celsius,
  mo,
  da,
  year
FROM
  `bigquery-public-data.noaa_gsod.gsod19*`
WHERE
  max != 9999.9 # code for missing data
  AND _TABLE_SUFFIX BETWEEN '29' AND '35'
ORDER BY
  max DESC

Querying all tables in a dataset

To scan all tables in a dataset, you can use an empty prefix and the table wildcard, which means that the _TABLE_SUFFIX pseudo column contains full table names.

FROM
  `bigquery-public-data.noaa_gsod.*`

Limitations of wildcard tables

Wildcard table queries are subject to the following limitations.

  • Wildcard tables do not support views. The query returns an error if the wildcard table matches any views in the dataset.
  • Wildcard tables do not support cached results. If you run the same wildcard query multiple times, you are billed for each query.

Saved and shared queries

This section describes how to save and share queries. Queries that are saved or shared contain only the query itself, which means that:

  • A shared query provides the query text only. You still need to have appropriate access to query the data.

  • The only query option retained by a saved query is the "SQL Version", which is either legacy SQL or standard SQL.

To share a query, set its visibility to Project or Public when you save the query.

  • Project visibility makes the query available to other project members.

    • Project members with OWNER/WRITER permissions have read-write access.
    • Project members with READ permissions have read access.
  • Public visibility makes the query viewable, but not editable, by anyone with the link.

Using Views

A view is a virtual table defined by a SQL query. You can query views in BigQuery using the web UI, the command-line tool, or the API. You can also use a view as a data source for a visualization tool such as Google Data Studio 360.

BigQuery's views are logical views, not materialized views. Because views are not materialized, the query that defines the view is run each time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query.

Limitations of views

BigQuery views are subject to the following limitations:

  • You cannot run a BigQuery job that exports data from a view.
  • You cannot use the TableDataList JSON API method to retrieve data from a view. For more information, see Tabledata: list.
  • You cannot mix standard SQL and legacy SQL queries when using views. A standard SQL query cannot reference a view defined using legacy SQL syntax.
  • The schemas of the underlying tables are stored with the view when the view is created. If columns are added, deleted, and so on after the view is created, the reported schema will be inaccurate until the view is updated. Even though the reported schema may be inaccurate, all submitted queries produce accurate results.
  • You cannot update a legacy SQL view to standard SQL in the BigQuery web UI. You can change the SQL language using the command-line tool bq update --view command or by using the update or patch API methods.
  • You cannot include a user-defined function in the SQL query that defines a view.
  • BigQuery supports up to four levels of nested views in legacy SQL. If there are more than four levels, an INVALID_INPUT error is returned. In standard SQL, you are limited to 100 levels of nested views.
  • You are limited to 1,000 authorized views per dataset.

Assigning access controls to views

You cannot assign access controls directly to views. When you create a view in a dataset, the dataset's access controls determine the users and groups allowed to access the view.

For example, if you assign the bigquery.dataOwner role to a user at the project level, that user can create, update, get, and delete views in all of the project's datasets. If you assign the bigquery.dataOwner role at the dataset level, the user can create, update, get, and delete views only in that dataset.

You can apply access controls to a dataset after the dataset is created using the command-line tool and the web UI. You can apply access controls during or after dataset creation using the API. Call datasets.insert or datasets.update and use the access[] property to apply your access controls.
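
For example, a sketch of an access[] entry in the dataset resource that grants a user read access (the email address is a placeholder):

"access": [
  {
    "role": "READER",
    "userByEmail": "user@example.com"
  }
]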

Creating authorized views

An authorized view allows you to share query results with particular users and groups without giving them read access to the underlying tables. Authorized views can only be created in a dataset that does not contain the tables queried by the view.

When you create an authorized view, you use the view's SQL query to restrict access to only the rows and columns you want the users to see.

Using a view to restrict access to rows and columns is also referred to as assigning row-level permissions to the data.

When you create an authorized view, you:

  • Create a separate dataset to store the view
  • Create the view in the new dataset
  • Assign access controls to the project
  • Assign READER access to the dataset containing the view
  • Authorize the view to access the source dataset, as sketched after this list
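
Authorizing the view means adding a view entry to the access[] list of the source dataset. The following is a sketch (the project, dataset, and table IDs are hypothetical):

"access": [
  {
    "view": {
      "projectId": "myproject",
      "datasetId": "shared_views",
      "tableId": "authorized_view"
    }
  }
]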

Query quotas

The following limits apply to jobs.query and query-type jobs.insert function calls.

  • Concurrent rate limit for interactive queries under on-demand pricing: 50 concurrent queries. Queries that return cached results, or queries configured using the dryRun property, do not count against this limit.
  • Concurrent rate limit for queries that contain user-defined functions (UDFs): 6 concurrent queries, including both interactive and batch queries. Interactive queries that contain UDFs count toward the concurrent rate limit for interactive queries.
  • Daily query size limit: unlimited by default, but you may specify limits using custom quotas.
  • Daily update limit: 1,000 updates per table per day; applies only to the destination table in a query.
  • Query execution time limit: 6 hours
  • Maximum concurrent slots per BigQuery account for on-demand pricing: 2,000
  • Maximum number of tables referenced per query: 1,000
  • Maximum number of authorized views per dataset: 1,000
  • Maximum query length: 256 KB
  • Maximum response size: 128 MB compressed (unlimited when returning large query results)
