BigQuery Export 모델링
GA4 BigQuery Export의 dataset, table, event_params를 분석 모델로 풀어내는 방법
GA4 UI가 운영용 표면이라면 BigQuery Export는 원시 이벤트를 소유하고 재가공하는 분석 기반입니다.
Google 공식 문서는 각 GA4 property가 BigQuery에 연결되면 analytics_<property_id> dataset과 일별 events_YYYYMMDD table을 만든다고 설명합니다.
Export 구조
| 항목 | 구조 |
|---|---|
| Dataset | analytics_<property_id> |
| Daily table | events_YYYYMMDD |
| Intraday table | events_intraday_YYYYMMDD |
| Event row | 각 row가 하나의 event |
| Repeated fields | event_params, user_properties, items 등 |
Daily export는 전날 데이터를 한 번 내보내고, streaming export는 당일 데이터를 계속 적재합니다. 공식 schema 문서는 late arrival 때문에 daily table이 이벤트 발생일 이후 며칠간 업데이트될 수 있다고 설명합니다.
Standard property export 주의
| 항목 | 기준 |
|---|---|
| Daily export limit | 표준 GA4 property는 daily batch export에 1일 100만 events 제한 |
| Streaming export | 이벤트 수 제한은 없지만 BigQuery streaming 비용 고려 |
| Backfill | Analytics에서 BigQuery로 한 번 export한 데이터를 다시 export하지 않음 |
| Filtering | data stream과 event exclusion으로 export volume 제어 가능 |
Event params flatten 예시
SELECT
event_date,
TIMESTAMP_MICROS(event_timestamp) AS event_ts,
user_pseudo_id,
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'plan') AS plan,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'activation_type') AS activation_type,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'trial_day') AS trial_day
FROM `my_project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260510'
AND event_name = 'activation_complete';SaaS activation cohort SQL
WITH signups AS (
SELECT
user_pseudo_id,
DATE(TIMESTAMP_MICROS(MIN(event_timestamp))) AS signup_date
FROM `my_project.analytics_123456789.events_*`
WHERE event_name = 'sign_up'
GROUP BY 1
),
activation AS (
SELECT
user_pseudo_id,
DATE(TIMESTAMP_MICROS(MIN(event_timestamp))) AS activation_date
FROM `my_project.analytics_123456789.events_*`
WHERE event_name = 'activation_complete'
GROUP BY 1
)
SELECT
s.signup_date,
COUNT(*) AS signups,
COUNTIF(a.activation_date <= DATE_ADD(s.signup_date, INTERVAL 7 DAY)) AS activated_d7,
SAFE_DIVIDE(
COUNTIF(a.activation_date <= DATE_ADD(s.signup_date, INTERVAL 7 DAY)),
COUNT(*)
) AS activation_d7_rate
FROM signups s
LEFT JOIN activation a USING (user_pseudo_id)
GROUP BY 1
ORDER BY 1;모델링 레이어
| Layer | 목적 | 예시 |
|---|---|---|
| Raw | GA4 export 원본 | events_YYYYMMDD |
| Staging | params flatten, type casting | stg_ga4_events |
| Intermediate | session/user/account 모델 | int_ga4_user_journey |
| Mart | KPI 보고용 | mart_activation_cohort, mart_revenue_funnel |
dbt 모델 예시
models/staging/stg_ga4_events.sql
WITH source AS (
SELECT *
FROM {{ source('ga4_raw', 'events') }}
),
flattened AS (
SELECT
event_date,
TIMESTAMP_MICROS(event_timestamp) AS event_ts,
user_pseudo_id,
user_id,
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source_surface') AS source_surface,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'plan') AS plan,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'activation_type') AS activation_type,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'cta_location') AS cta_location,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'cta_intent') AS cta_intent,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id') AS transaction_id,
COALESCE(
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value'),
CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'value') AS FLOAT64)
) AS event_value,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'currency') AS currency
FROM source
)
SELECT *
FROM flattenedmodels/intermediate/int_ga4_funnel_events.sql
SELECT
user_pseudo_id,
user_id,
DATE(event_ts) AS event_day,
MIN(IF(event_name = 'sign_up', event_ts, NULL)) AS signed_up_at,
MIN(IF(event_name = 'activation_complete', event_ts, NULL)) AS activated_at,
MIN(IF(event_name = 'begin_checkout', event_ts, NULL)) AS checkout_started_at,
MIN(IF(event_name = 'purchase', event_ts, NULL)) AS purchased_at
FROM {{ ref('stg_ga4_events') }}
WHERE event_name IN ('sign_up', 'activation_complete', 'begin_checkout', 'purchase')
GROUP BY 1, 2, 3models/marts/mart_activation_cohort.sql
WITH first_signup AS (
SELECT
user_pseudo_id,
MIN(signed_up_at) AS signed_up_at
FROM {{ ref('int_ga4_funnel_events') }}
WHERE signed_up_at IS NOT NULL
GROUP BY 1
),
first_activation AS (
SELECT
user_pseudo_id,
MIN(activated_at) AS activated_at
FROM {{ ref('int_ga4_funnel_events') }}
WHERE activated_at IS NOT NULL
GROUP BY 1
)
SELECT
DATE(s.signed_up_at) AS signup_date,
COUNT(*) AS signups,
COUNTIF(a.activated_at <= TIMESTAMP_ADD(s.signed_up_at, INTERVAL 7 DAY)) AS activated_d7,
SAFE_DIVIDE(
COUNTIF(a.activated_at <= TIMESTAMP_ADD(s.signed_up_at, INTERVAL 7 DAY)),
COUNT(*)
) AS activation_d7_rate
FROM first_signup s
LEFT JOIN first_activation a USING (user_pseudo_id)
GROUP BY 1models/schema.yml
version: 2
models:
- name: stg_ga4_events
columns:
- name: user_pseudo_id
tests: [not_null]
- name: event_name
tests: [not_null]
- name: event_ts
tests: [not_null]
- name: mart_activation_cohort
columns:
- name: signup_date
tests: [not_null, unique]
- name: activation_d7_rate
tests:
- dbt_utils.accepted_range:
min_value: 0
max_value: 1dbt source는 stable daily table 기준
운영 mart는 events_intraday_YYYYMMDD가 아니라 stable daily export를 기준으로 만드는 편이 안전합니다.
streaming export는 near realtime 확인에 쓰고, user attribution과 완성 데이터가 필요한 KPI는 daily table을 기준으로
재계산하세요.
UI와 BigQuery 차이를 기록하기
| 차이 | 처리 |
|---|---|
| Attribution | GA4 UI의 attribution model과 BigQuery raw event 기준을 구분 |
| Export delay | daily table 완성 시점과 late arrival 반영 |
| Consent mode | cookieless pings와 모델링 데이터 해석 주의 |
| Identity | user_pseudo_id, user_id, account join key 차이 |
| Sampling/thresholding | UI 탐색 제한과 raw export 집계 차이 |