SQL in BigQuery 1

Wed, 14 Apr 2021 16:17:51 +0800

1. How to handle duplicate column names on left join

A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result set. All matching column names are omitted from the output.

SELECT 
	t1.* EXCEPT (name),
	t2.* EXCEPT (name,account_id),
	t1.name AS deployment_name,
	t2.name AS account_name
FROM
	table1 t1
LEFT JOIN
	table2 t2
ON t1.name = t2.name

2. Extract date from timestamp

Returns a value that corresponds to the specified part from a supplied timestamp_expression. This function supports an optional timezone parameter. See Time zone definitions for information on how to specify a time zone.

EXTRACT(part FROM timestamp_expression [AT TIME ZONE timezone])

SELECT
	EXTRACT(DATE FROM started) AS deployment_start_date
FROM
	table

3. How to calculate duration between two timestamp

Returns the number of whole specified date_part intervals between two TIMESTAMP objects (timestamp_expression_a - timestamp_expression_b). If the first TIMESTAMP is earlier than the second one, the output is negative. Throws an error if the computation overflows the result type, such as if the difference in microseconds between the two TIMESTAMP objects would overflow an INT64 value.

TIMESTAMP_DIFF(timestamp_expression_a, timestamp_expression_b, date_part)

SELECT
	TIMESTAMP_DIFF( t1.stopped, t1.started, SECOND ) AS deployment_time
FROM
	table t1

4. Regular expression and multi-condition in BQ

Short-circuiting in conditional expressions can be exploited for error handling or performance tuning.

Note: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax.

SELECT
	CASE
  	WHEN REGEXP_CONTAINS(t1.app_image,r"dkr.ecr.") THEN "central ECR"
    WHEN REGEXP_CONTAINS(t1.app_image,r"registry.cowbell.realestate.com.au") THEN "cowbell"
    ELSE "others"
	END
	AS registry
FROM
	table t1

5. How to get data from last one year dynamically

Truncates a timestamp to the granularity of date_part.

TIMESTAMP_TRUNC(timestamp_expression, date_part[, timezone])

SELECT
	t1.*
FROM
	table t1
WHERE
	s1.started
  	BETWEEN
			TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 365 * 24 HOUR),DAY)
		AND
			TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY)

Reference

bigquery-how-to-automatically-handle-duplicate-column-names-on-left-join

timestamp_function_extract

timestamp_trunc