Skip to main content

GA4 BigQuery analysis

For exact attribution across concurrent experiments — or any case where GA4’s Custom Dimension and Audience limits get in your way — pipe GA4 to BigQuery and join experience_impression events to your conversion events on user_pseudo_id. This is the same join GA4 does internally, but in raw SQL with no caps, no (other) rollup, and no event-scope limitations. This page assumes you’ve already enabled the GA4 export. The experience_impression event with exp_variant_string is what we’ll join on.

Enable BigQuery export from GA4

1

Have (or create) a Google Cloud project linked to billing

BigQuery requires a billing-enabled Google Cloud project. The free tier is generous (10 GB storage, 1 TiB queries / month — plenty for most sites’ GA4 raw data), but a billing account must be attached.
2

Link GA4 to BigQuery

In GA4: Admin → Product Links → BigQuery Links → Link. Choose your GCP project, the dataset location, Daily export (free tier) or Streaming (paid, ~$0.05/GB), and which events to include (default: all). Confirm.
3

Wait for the first export

Daily exports populate the previous day’s data in a table named events_YYYYMMDD under your_project.analytics_<propertyID>. The first export typically arrives within 24 hours.
Forward-only. BigQuery export does not backfill GA4 history — you only get raw data from the link date onward. Enable BigQuery export before you need historical raw analysis, or you’ll lose that window.

Cost reality check

  • GA4 Standard (free) supports the daily batch export at no charge up to approximately 1 million events / day. Above that, the daily export is disabled for Standard properties — you’d need streaming (paid GCP billing, ~$0.05/GB) or GA4 360.
  • BigQuery query costs are $5 per TiB scanned after the free 1 TiB / month. Our parameterized template uses _TABLE_SUFFIX partitioning to keep scans small; a single experiment over a 30-day window typically scans under 1 GB.
  • For most ABTestly customers the entire pipeline costs $0/month. If you’re a large site doing >1M events/day, factor in streaming cost.

The parameterized SQL template

Download: ab-test-by-variant.sql The query joins experience_impression events (the variant exposures) to your conversion events on user_pseudo_id, then aggregates per (experiment, variant): unique users, impressions, conversions, revenue, and conversion rate.
-- ab-test-by-variant.sql
-- Per-variant conversion analysis for one ABTestly experiment.
-- Joins `experience_impression` exposures to a conversion event on user_pseudo_id.
--
-- Parameters (replace before running, or set as @-bound parameters):
--   @project        e.g. "my-gcp-project"
--   @dataset        e.g. "analytics_123456789"
--   @experiment_id  the ABTestly per-org display number, e.g. "7"
--   @conversion     the GA4 event name to attribute, e.g. "purchase" or "sign_up"
--   @start_date     inclusive, YYYYMMDD, e.g. "20260501"
--   @end_date       inclusive, YYYYMMDD, e.g. "20260531"

WITH events AS (
  SELECT
    user_pseudo_id,
    event_name,
    event_timestamp,
    (
      SELECT value.string_value
      FROM UNNEST(event_params)
      WHERE key = 'exp_variant_string'
    ) AS exp_variant_string,
    (
      SELECT value.string_value
      FROM UNNEST(event_params)
      WHERE key = 'experiment_id'
    ) AS experiment_id,
    (
      SELECT value.string_value
      FROM UNNEST(event_params)
      WHERE key = 'variant_id'
    ) AS variant_id,
    -- Revenue: prefer `value` parameter (purchase events) else event_value_in_usd
    COALESCE(
      (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value'),
      event_value_in_usd,
      0
    ) AS revenue
  FROM `@project.@dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '@start_date' AND '@end_date'
),

-- Users exposed to each variant of THIS experiment.
exposed AS (
  SELECT
    variant_id,
    exp_variant_string,
    user_pseudo_id,
    COUNT(*) AS impressions
  FROM events
  WHERE event_name = 'experience_impression'
    AND experiment_id = '@experiment_id'
  GROUP BY variant_id, exp_variant_string, user_pseudo_id
),

-- Conversions BY exposed users — joined on user, only counted if it
-- happened on or after the user's first exposure.
first_exposure AS (
  SELECT
    user_pseudo_id,
    variant_id,
    MIN(event_timestamp) AS first_exposure_ts
  FROM events
  WHERE event_name = 'experience_impression'
    AND experiment_id = '@experiment_id'
  GROUP BY user_pseudo_id, variant_id
),

conversions AS (
  SELECT
    fe.variant_id,
    e.user_pseudo_id,
    COUNT(*) AS conversion_events,
    SUM(e.revenue) AS revenue
  FROM events e
  JOIN first_exposure fe USING (user_pseudo_id)
  WHERE e.event_name = '@conversion'
    AND e.event_timestamp >= fe.first_exposure_ts
  GROUP BY fe.variant_id, e.user_pseudo_id
)

SELECT
  e.variant_id,
  ANY_VALUE(e.exp_variant_string) AS exp_variant_string,
  COUNT(DISTINCT e.user_pseudo_id) AS users,
  SUM(e.impressions) AS impressions,
  COUNT(DISTINCT c.user_pseudo_id) AS converters,
  COALESCE(SUM(c.conversion_events), 0) AS conversion_events,
  COALESCE(SUM(c.revenue), 0) AS revenue,
  SAFE_DIVIDE(COUNT(DISTINCT c.user_pseudo_id), COUNT(DISTINCT e.user_pseudo_id))
    AS conversion_rate
FROM exposed e
LEFT JOIN conversions c
  ON c.variant_id = e.variant_id
  AND c.user_pseudo_id = e.user_pseudo_id
GROUP BY e.variant_id
ORDER BY e.variant_id;

What it does

  1. UNNEST(event_params) to pull exp_variant_string, experiment_id, variant_id, and revenue out of GA4’s nested schema.
  2. Builds the exposed user set per variant for the chosen experiment.
  3. For each user, finds their first exposure timestamp — conversions before that don’t count (you can’t be influenced by a variant you haven’t seen yet).
  4. Joins conversions on user_pseudo_id (or use user_id if you have User-ID configured) and aggregates per variant.

Running it

In the BigQuery console (or bq query), do a find-and-replace of the @... parameters and run it, or use parameterized queries with bq query --use_legacy_sql=false --parameter=experiment_id:STRING:7 ….

Caveats

  • user_pseudo_id is per-browser-cookie — cross-device users count twice unless you have User-ID configured (swap user_pseudo_id for user_id in the query).
  • Sample-ratio mismatch (SRM) checks should be done in BigQuery too — compare the exposed-user counts per variant against the configured weights. Significant skew means the bucketing isn’t representative.
  • This query attributes first-touch at the user level. Other attribution rules (last-touch, time-decay) require different JOIN semantics — adapt the first_exposure CTE accordingly.

Looker Studio dashboard

For non-technical teammates, the SQL above is wrapped in a Looker Studio template you can connect once and re-use per experiment.
1

Connect Looker Studio to your BigQuery dataset

In Looker Studio: Create → Data source → BigQuery → CUSTOM QUERY. Pick your billing project, paste the parameterized SQL above (with the @experiment_id, @conversion, @start_date, @end_date placeholders promoted to parameters in the query editor — gear icon → Parameters).
2

Build the report

Drop variant_id as a dimension, users, converters, revenue, and conversion_rate as metrics. Add controls for each parameter so viewers can pick the experiment, conversion event, and date range.
3

Share to teammates

Looker Studio reports share via link with view-only access — your teammates get per-variant conversion tables without touching SQL.
We’re working on a publicly shareable Looker Studio template you can copy in one click. In the meantime the recipe above takes about 15 minutes to wire up the first time.