SQL Queries Skill Write correct, performant, readable SQL across all major data warehouse dialects. Dialect-Specific Reference PostgreSQL (including Aurora, RDS, Supabase, Neon) Date/time: -- Current date/time CURRENT_DATE , CURRENT_TIMESTAMP , NOW ( ) -- Date arithmetic date_column + INTERVAL '7 days' date_column - INTERVAL '1 month' -- Truncate to period DATE_TRUNC ( 'month' , created_at ) -- Extract parts EXTRACT ( YEAR FROM created_at ) EXTRACT ( DOW FROM created_at ) -- 0=Sunday -- Format TO_CHAR ( created_at , 'YYYY-MM-DD' ) String functions: -- Concatenation first_name || ' ' || last_name CONCAT ( first_name , ' ' , last_name ) -- Pattern matching column ILIKE '%pattern%' -- case-insensitive column ~ '^regex_pattern$' -- regex -- String manipulation LEFT ( str , n ) , RIGHT ( str , n ) SPLIT_PART ( str , delimiter , position ) REGEXP_REPLACE ( str , pattern , replacement ) Arrays and JSON: -- JSON access data -
'key' -- text data -
'nested'
'key' -- json data
>>'{path,to,key}' -- nested text
- -- Array operations
- ARRAY_AGG
- (
- column
- )
- ANY
- (
- array_column
- )
- array_column @
- >
- ARRAY
- [
- 'value'
- ]
- Performance tips:
- Use
- EXPLAIN ANALYZE
- to profile queries
- Create indexes on frequently filtered/joined columns
- Use
- EXISTS
- over
- IN
- for correlated subqueries
- Partial indexes for common filter conditions
- Use connection pooling for concurrent access
- Snowflake
- Date/time:
- -- Current date/time
- CURRENT_DATE
- (
- )
- ,
- CURRENT_TIMESTAMP
- (
- )
- ,
- SYSDATE
- (
- )
- -- Date arithmetic
- DATEADD
- (
- day
- ,
- 7
- ,
- date_column
- )
- DATEDIFF
- (
- day
- ,
- start_date
- ,
- end_date
- )
- -- Truncate to period
- DATE_TRUNC
- (
- 'month'
- ,
- created_at
- )
- -- Extract parts
- YEAR
- (
- created_at
- )
- ,
- MONTH
- (
- created_at
- )
- ,
- DAY
- (
- created_at
- )
- DAYOFWEEK
- (
- created_at
- )
- -- Format
- TO_CHAR
- (
- created_at
- ,
- 'YYYY-MM-DD'
- )
- String functions:
- -- Case-insensitive by default (depends on collation)
- column
- ILIKE
- '%pattern%'
- REGEXP_LIKE
- (
- column
- ,
- 'pattern'
- )
- -- Parse JSON
- column
- :
- key
- ::string
- -- dot notation for VARIANT
- PARSE_JSON
- (
- '{"key": "value"}'
- )
- GET_PATH
- (
- variant_col
- ,
- 'path.to.key'
- )
- -- Flatten arrays/objects
- SELECT
- f
- .
- value
- FROM
- table
- ,
- LATERAL FLATTEN
- (
- input
- =
- >
- array_col
- )
- f
- Semi-structured data:
- -- VARIANT type access
- data
- :customer:name::STRING
- data
- :items
- [
- 0
- ]
- :price::NUMBER
- -- Flatten nested structures
- SELECT
- t
- .
- id
- ,
- item
- .
- value
- :name::STRING
- as
- item_name
- ,
- item
- .
- value
- :qty::NUMBER
- as
- quantity
- FROM
- my_table t
- ,
- LATERAL FLATTEN
- (
- input
- =
- >
- t
- .
- data
- :items
- )
- item
- Performance tips:
- Use clustering keys on large tables (not traditional indexes)
- Filter on clustering key columns for partition pruning
- Set appropriate warehouse size for query complexity
- Use
- RESULT_SCAN(LAST_QUERY_ID())
- to avoid re-running expensive queries
- Use transient tables for staging/temp data
- BigQuery (Google Cloud)
- Date/time:
- -- Current date/time
- CURRENT_DATE
- (
- )
- ,
- CURRENT_TIMESTAMP
- (
- )
- -- Date arithmetic
- DATE_ADD
- (
- date_column
- ,
- INTERVAL
- 7
- DAY
- )
- DATE_SUB
- (
- date_column
- ,
- INTERVAL
- 1
- MONTH
- )
- DATE_DIFF
- (
- end_date
- ,
- start_date
- ,
- DAY
- )
- TIMESTAMP_DIFF
- (
- end_ts
- ,
- start_ts
- ,
- HOUR
- )
- -- Truncate to period
- DATE_TRUNC
- (
- created_at
- ,
- MONTH
- )
- TIMESTAMP_TRUNC
- (
- created_at
- ,
- HOUR
- )
- -- Extract parts
- EXTRACT
- (
- YEAR
- FROM
- created_at
- )
- EXTRACT
- (
- DAYOFWEEK
- FROM
- created_at
- )
- -- 1=Sunday
- -- Format
- FORMAT_DATE
- (
- '%Y-%m-%d'
- ,
- date_column
- )
- FORMAT_TIMESTAMP
- (
- '%Y-%m-%d %H:%M:%S'
- ,
- ts_column
- )
- String functions:
- -- No ILIKE, use LOWER()
- LOWER
- (
- column
- )
- LIKE
- '%pattern%'
- REGEXP_CONTAINS
- (
- column
- ,
- r
- 'pattern'
- )
- REGEXP_EXTRACT
- (
- column
- ,
- r
- 'pattern'
- )
- -- String manipulation
- SPLIT
- (
- str
- ,
- delimiter
- )
- -- returns ARRAY
- ARRAY_TO_STRING
- (
- array
- ,
- delimiter
- )
- Arrays and structs:
- -- Array operations
- ARRAY_AGG
- (
- column
- )
- UNNEST
- (
- array_column
- )
- ARRAY_LENGTH
- (
- array_column
- )
- value
- IN
- UNNEST
- (
- array_column
- )
- -- Struct access
- struct_column
- .
- field_name
- Performance tips:
- Always filter on partition columns (usually date) to reduce bytes scanned
- Use clustering for frequently filtered columns within partitions
- Use
- APPROX_COUNT_DISTINCT()
- for large-scale cardinality estimates
- Avoid
- SELECT *
- -- billing is per-byte scanned
- Use
- DECLARE
- and
- SET
- for parameterized scripts
- Preview query cost with dry run before executing large queries
- Redshift (Amazon)
- Date/time:
- -- Current date/time
- CURRENT_DATE
- ,
- GETDATE
- (
- )
- ,
- SYSDATE
- -- Date arithmetic
- DATEADD
- (
- day
- ,
- 7
- ,
- date_column
- )
- DATEDIFF
- (
- day
- ,
- start_date
- ,
- end_date
- )
- -- Truncate to period
- DATE_TRUNC
- (
- 'month'
- ,
- created_at
- )
- -- Extract parts
- EXTRACT
- (
- YEAR
- FROM
- created_at
- )
- DATE_PART
- (
- 'dow'
- ,
- created_at
- )
- String functions:
- -- Case-insensitive
- column
- ILIKE
- '%pattern%'
- REGEXP_INSTR
- (
- column
- ,
- 'pattern'
- )
- >
- 0
- -- String manipulation
- SPLIT_PART
- (
- str
- ,
- delimiter
- ,
- position
- )
- LISTAGG
- (
- column
- ,
- ', '
- )
- WITHIN
- GROUP
- (
- ORDER
- BY
- column
- )
- Performance tips:
- Design distribution keys for collocated joins (DISTKEY)
- Use sort keys for frequently filtered columns (SORTKEY)
- Use
- EXPLAIN
- to check query plan
- Avoid cross-node data movement (watch for DS_BCAST and DS_DIST)
- ANALYZE
- and
- VACUUM
- regularly
- Use late-binding views for schema flexibility
- Databricks SQL
- Date/time:
- -- Current date/time
- CURRENT_DATE
- (
- )
- ,
- CURRENT_TIMESTAMP
- (
- )
- -- Date arithmetic
- DATE_ADD
- (
- date_column
- ,
- 7
- )
- DATEDIFF
- (
- end_date
- ,
- start_date
- )
- ADD_MONTHS
- (
- date_column
- ,
- 1
- )
- -- Truncate to period
- DATE_TRUNC
- (
- 'MONTH'
- ,
- created_at
- )
- TRUNC
- (
- date_column
- ,
- 'MM'
- )
- -- Extract parts
- YEAR
- (
- created_at
- )
- ,
- MONTH
- (
- created_at
- )
- DAYOFWEEK
- (
- created_at
- )
- Delta Lake features:
- -- Time travel
- SELECT
- *
- FROM
- my_table
- TIMESTAMP
- AS
- OF
- '2024-01-15'
- SELECT
- *
- FROM
- my_table VERSION
- AS
- OF
- 42
- -- Describe history
- DESCRIBE
- HISTORY my_table
- -- Merge (upsert)
- MERGE
- INTO
- target
- USING
- source
- ON
- target
- .
- id
- =
- source
- .
- id
- WHEN
- MATCHED
- THEN
- UPDATE
- SET
- *
- WHEN
- NOT
- MATCHED
- THEN
- INSERT
- *
- Performance tips:
- Use Delta Lake's
- OPTIMIZE
- and
- ZORDER
- for query performance
- Leverage Photon engine for compute-intensive queries
- Use
- CACHE TABLE
- for frequently accessed datasets
- Partition by low-cardinality date columns
- Common SQL Patterns
- Window Functions
- -- Ranking
- ROW_NUMBER
- (
- )
- OVER
- (
- PARTITION
- BY
- user_id
- ORDER
- BY
- created_at
- DESC
- )
- RANK
- (
- )
- OVER
- (
- PARTITION
- BY
- category
- ORDER
- BY
- revenue
- DESC
- )
- DENSE_RANK
- (
- )
- OVER
- (
- ORDER
- BY
- score
- DESC
- )
- -- Running totals / moving averages
- SUM
- (
- revenue
- )
- OVER
- (
- ORDER
- BY
- date_col
- ROWS
- BETWEEN
- UNBOUNDED
- PRECEDING
- AND
- CURRENT
- ROW
- )
- as
- running_total
- AVG
- (
- revenue
- )
- OVER
- (
- ORDER
- BY
- date_col
- ROWS
- BETWEEN
- 6
- PRECEDING
- AND
- CURRENT
- ROW
- )
- as
- moving_avg_7d
- -- Lag / Lead
- LAG
- (
- value
- ,
- 1
- )
- OVER
- (
- PARTITION
- BY
- entity
- ORDER
- BY
- date_col
- )
- as
- prev_value
- LEAD
- (
- value
- ,
- 1
- )
- OVER
- (
- PARTITION
- BY
- entity
- ORDER
- BY
- date_col
- )
- as
- next_value
- -- First / Last value
- FIRST_VALUE
- (
- status
- )
- OVER
- (
- PARTITION
- BY
- user_id
- ORDER
- BY
- created_at
- ROWS
- BETWEEN
- UNBOUNDED
- PRECEDING
- AND
- UNBOUNDED
- FOLLOWING
- )
- LAST_VALUE
- (
- status
- )
- OVER
- (
- PARTITION
- BY
- user_id
- ORDER
- BY
- created_at
- ROWS
- BETWEEN
- UNBOUNDED
- PRECEDING
- AND
- UNBOUNDED
- FOLLOWING
- )
- -- Percent of total
- revenue
- /
- SUM
- (
- revenue
- )
- OVER
- (
- )
- as
- pct_of_total
- revenue
- /
- SUM
- (
- revenue
- )
- OVER
- (
- PARTITION
- BY
- category
- )
- as
- pct_of_category
- CTEs for Readability
- WITH
- -- Step 1: Define the base population
- base_users
- AS
- (
- SELECT
- user_id
- ,
- created_at
- ,
- plan_type
- FROM
- users
- WHERE
- created_at
- >=
- DATE
- '2024-01-01'
- AND
- status
- =
- 'active'
- )
- ,
- -- Step 2: Calculate user-level metrics
- user_metrics
- AS
- (
- SELECT
- u
- .
- user_id
- ,
- u
- .
- plan_type
- ,
- COUNT
- (
- DISTINCT
- e
- .
- session_id
- )
- as
- session_count
- ,
- SUM
- (
- e
- .
- revenue
- )
- as
- total_revenue
- FROM
- base_users u
- LEFT
- JOIN
- events e
- ON
- u
- .
- user_id
- =
- e
- .
- user_id
- GROUP
- BY
- u
- .
- user_id
- ,
- u
- .
- plan_type
- )
- ,
- -- Step 3: Aggregate to summary level
- summary
- AS
- (
- SELECT
- plan_type
- ,
- COUNT
- (
- *
- )
- as
- user_count
- ,
- AVG
- (
- session_count
- )
- as
- avg_sessions
- ,
- SUM
- (
- total_revenue
- )
- as
- total_revenue
- FROM
- user_metrics
- GROUP
- BY
- plan_type
- )
- SELECT
- *
- FROM
- summary
- ORDER
- BY
- total_revenue
- DESC
- ;
- Cohort Retention
- WITH
- cohorts
- AS
- (
- SELECT
- user_id
- ,
- DATE_TRUNC
- (
- 'month'
- ,
- first_activity_date
- )
- as
- cohort_month
- FROM
- users
- )
- ,
- activity
- AS
- (
- SELECT
- user_id
- ,
- DATE_TRUNC
- (
- 'month'
- ,
- activity_date
- )
- as
- activity_month
- FROM
- user_activity
- )
- SELECT
- c
- .
- cohort_month
- ,
- COUNT
- (
- DISTINCT
- c
- .
- user_id
- )
- as
- cohort_size
- ,
- COUNT
- (
- DISTINCT
- CASE
- WHEN
- a
- .
- activity_month
- =
- c
- .
- cohort_month
- THEN
- a
- .
- user_id
- END
- )
- as
- month_0
- ,
- COUNT
- (
- DISTINCT
- CASE
- WHEN
- a
- .
- activity_month
- =
- c
- .
- cohort_month
- +
- INTERVAL
- '1 month'
- THEN
- a
- .
- user_id
- END
- )
- as
- month_1
- ,
- COUNT
- (
- DISTINCT
- CASE
- WHEN
- a
- .
- activity_month
- =
- c
- .
- cohort_month
- +
- INTERVAL
- '3 months'
- THEN
- a
- .
- user_id
- END
- )
- as
- month_3
- FROM
- cohorts c
- LEFT
- JOIN
- activity a
- ON
- c
- .
- user_id
- =
- a
- .
- user_id
- GROUP
- BY
- c
- .
- cohort_month
- ORDER
- BY
- c
- .
- cohort_month
- ;
- Funnel Analysis
- WITH
- funnel
- AS
- (
- SELECT
- user_id
- ,
- MAX
- (
- CASE
- WHEN
- event
- =
- 'page_view'
- THEN
- 1
- ELSE
- 0
- END
- )
- as
- step_1_view
- ,
- MAX
- (
- CASE
- WHEN
- event
- =
- 'signup_start'
- THEN
- 1
- ELSE
- 0
- END
- )
- as
- step_2_start
- ,
- MAX
- (
- CASE
- WHEN
- event
- =
- 'signup_complete'
- THEN
- 1
- ELSE
- 0
- END
- )
- as
- step_3_complete
- ,
- MAX
- (
- CASE
- WHEN
- event
- =
- 'first_purchase'
- THEN
- 1
- ELSE
- 0
- END
- )
- as
- step_4_purchase
- FROM
- events
- WHERE
- event_date
- >=
- CURRENT_DATE
- -
- INTERVAL
- '30 days'
- GROUP
- BY
- user_id
- )
- SELECT
- COUNT
- (
- *
- )
- as
- total_users
- ,
- SUM
- (
- step_1_view
- )
- as
- viewed
- ,
- SUM
- (
- step_2_start
- )
- as
- started_signup
- ,
- SUM
- (
- step_3_complete
- )
- as
- completed_signup
- ,
- SUM
- (
- step_4_purchase
- )
- as
- purchased
- ,
- ROUND
- (
- 100.0
- *
- SUM
- (
- step_2_start
- )
- /
- NULLIF
- (
- SUM
- (
- step_1_view
- )
- ,
- 0
- )
- ,
- 1
- )
- as
- view_to_start_pct
- ,
- ROUND
- (
- 100.0
- *
- SUM
- (
- step_3_complete
- )
- /
- NULLIF
- (
- SUM
- (
- step_2_start
- )
- ,
- 0
- )
- ,
- 1
- )
- as
- start_to_complete_pct
- ,
- ROUND
- (
- 100.0
- *
- SUM
- (
- step_4_purchase
- )
- /
- NULLIF
- (
- SUM
- (
- step_3_complete
- )
- ,
- 0
- )
- ,
- 1
- )
- as
- complete_to_purchase_pct
- FROM
- funnel
- ;
- Deduplication
- -- Keep the most recent record per key
- WITH
- ranked
- AS
- (
- SELECT
- *
- ,
- ROW_NUMBER
- (
- )
- OVER
- (
- PARTITION
- BY
- entity_id
- ORDER
- BY
- updated_at
- DESC
- )
- as
- rn
- FROM
- source_table
- )
- SELECT
- *
- FROM
- ranked
- WHERE
- rn
- =
- 1
- ;
- Error Handling and Debugging
- When a query fails:
- Syntax errors
-
- Check for dialect-specific syntax (e.g.,
- ILIKE
- not available in BigQuery,
- SAFE_DIVIDE
- only in BigQuery)
- Column not found
-
- Verify column names against schema -- check for typos, case sensitivity (PostgreSQL is case-sensitive for quoted identifiers)
- Type mismatches
-
- Cast explicitly when comparing different types (
- CAST(col AS DATE)
- ,
- col::DATE
- )
- Division by zero
-
- Use
- NULLIF(denominator, 0)
- or dialect-specific safe division
- Ambiguous columns
-
- Always qualify column names with table alias in JOINs
- Group by errors
- All non-aggregated columns must be in GROUP BY (except in BigQuery which allows grouping by alias)