TL;DR

EMCM treats a website's per-visit emissions as a probability distribution, not a single number. It measures device energy from real browser CPU-time (not a bytes proxy), weights emissions by the live grid at the time and place of each real visit, calibrates thin-data sites toward a corpus class prior using empirical-Bayes shrinkage, and reports a median with a 90% credible interval — every figure labelled measured, modelled, or population-prior. Status: v3.0, published and live; not yet externally peer-reviewed.

Two version axes (do not conflate). The published model version is 3.0.0. The per-site methodology_version field (v2 / legacy) marks the stored measurement-pipeline era; EMCM v3 calibrates v2-pipeline measurements at read time, so stored rows remain v2 while the model is v3.

Abstract

The de-facto standard for website carbon estimation — the Sustainable Web Design Model (SWDM v3/v4) as implemented in the Green Web Foundation's CO2.js library — computes a single deterministic point estimate of grams CO₂e per page view, driven almost entirely by data transfer (page weight in GB) as a proxy for resource use, multiplied by per-segment energy intensities and an annual-average grid carbon factor. This has three structural limitations no choice of coefficient can repair: (1) bytes cannot distinguish a 1 MB image page from a 1 MB JavaScript-thrashing page, so real device-side CPU energy — the dominant operational segment — is invisible; (2) the grid signal is an annual average, ignoring when and where real users loaded the page; and (3) the output carries no uncertainty, despite resting on top-down coefficients with very wide error bounds.

The EcoPigs Measured-Carbon Model (EMCM) treats per-visit emissions as a random variable. Its contributions are: (i) instrumented device energy from Chrome DevTools Protocol (CDP) CPU-time, replacing the byte proxy for the dominant segment; (ii) a real-visit time-weighted grid that integrates carbon intensity over the empirical joint distribution of (when, where) users loaded the page; (iii) a distributional output — a posterior median with a 90% credible interval, decomposed by segment and tagged by epistemic tier; and, as the core contribution, (iv) an empirical-Bayes normal-normal shrinkage estimator on log-emissions that pulls data-poor sites toward a corpus-derived class prior and honestly widens their intervals.

1. Motivation: Limits of SWD / CO2.js

The incumbent standard is the Sustainable Web Design Model, implemented as the default model in CO2.js (SWDM v4 since CO2.js v0.18) and surfaced publicly through the Website Carbon Calculator. EMCM does not, and cannot, claim novelty over SWDM itself — SWDM is fully documented and open-source. EMCM earns novelty only by departing from it.

SWDM v4 computes per page view as a deterministic top-down extrapolation, where each term is (energy intensity kWh/GB) × (bytes in GB) × (grid carbon intensity). The v4 operational constants (kWh/GB) are data centres 0.055, networks 0.059, user devices 0.080; embodied 0.012 / 0.013 / 0.081. These derive from a top-down division of roughly 1021 TWh total internet energy by 5.29 ZB of data transfer. The grid factor defaults to 494 gCO₂e/kWh, a global annual average. There is no marginal, hourly, or time-of-use grid signal.

The three structural limitations EMCM must beat: bytes are the sole driver (real device CPU/GPU energy is invisible); the grid factor is an annual average (no temporal or real hosting-region granularity); and the output is a deterministic point estimate (no interval, despite large documented coefficient uncertainty). The defensible novelty axes are therefore uncertainty-aware distributional output and non-byte-proxy energy modelling with a real-visit temporal grid — which SWDM v4 categorically lacks. "Better coefficients" or "a nicer report" would be incremental, not novel.

2. The EMCM Model

Per-visit emissions for a site are a random variable, summed over four segments, each energy(kWh) × grid-intensity(gCO₂e/kWh): device, network, server, embodied. Each is weighted over the real-visit distribution (visit weight, location and time).

Device segment (the novel axis). Instead of bytes × 0.080 kWh/GB, the device operational term is built from instrumented main-thread work: script + layout + recalc-style durations × the device CPU power profile. CPU is the measured term — the mechanism that distinguishes a JS-thrash page from an image page, the thing bytes cannot do. GPU is a disclosed conservative ceiling (a small fraction of layout/recalc time × a deliberately high GPU-power ceiling; a sub-5% term, so even a 2× error moves total device energy under 3%). Screen is calculated from page-load duration × screen power. An honesty constraint: headless-CDP scans yield the measured value; in-page RUM JavaScript can collect CPU-time but not joules, so RUM device energy is a labelled proxy (σ=0.40) versus the measured CDP value (σ=0.15).

Network remains a bytes-proxy (0.059 kWh/GB × a multi-grid global average); it is the most contested coefficient in the field (σ=0.40). Server energy = server power (2.1 W/vCPU, Cloud Carbon Footprint AWS average at 50% utilisation) × TTFB, at the hosting-country grid; it is a labelled estimate at roughly 0.1% of the total, so it does not affect the grade. Embodied uses the standard embodied intensities at the global grid (see §6).

The output is a posterior distribution summarised as median + 90% credible interval, decomposed by segment, every figure tagged by epistemic tier.

3. Empirical-Bayes Calibration (the core contribution)

This is the defensible scientific core of EMCM and the mechanism SWD/CO2.js categorically lack: empirical-Bayes normal-normal shrinkage on log-emissions.

Why log space. Emissions are right-skewed. The EcoPigs corpus shows Pearson r = 0.84 between page weight and emissions, with median 0.143 g versus mean 0.391 g — consistent with log-normality.

The class prior. The prior for a site is its class — the corpus posterior for sites sharing (sector, stack/CMS, hosting-region) — fitted by method-of-moments over the era-2-only measured corpus (the 218 measured sites; legacy rows are era-contaminated and excluded). A site's own measurement is the mean of log per-visit emissions over its RUM visits (or the single scan if no RUM).

The load-bearing honesty fix. A site's measurement variance has two terms. The sampling variance of the visit mean shrinks with visit count — rich RUM means the weight moves toward trusting the site's own measurement. But the conversion-uncertainty floor — turning measured CPU-time and bytes into grams through energy coefficients and grid intensity — is systematic and does not shrink with visit count. Critically it is share-weighted: because embodied is around 77% of a typical total and is a modelled bytes-proxy, the floor is dominated by embodied's uncertainty. So a site with thousands of real visits still carries a roughly ±30–40% interval, because three-quarters of the number is a modelled proxy — and the interval says so. The measured operational figure is never allowed to lend its tightness to the proxied embodied majority.

Thin-corpus handling. The corpus is thin (19 users, 218 measured sites), so class precision is pooled up a hierarchy — (sector × stack × region) → (stack × region) → (region) → global — using the most specific level with enough sites. Because 96% of sites share one account, between-site variance would be falsely tight, so the class variance is computed on user-demeaned residuals to absorb a user random effect. The net effect: rich-RUM sites get a tight interval around their own measurement; thin or no-data sites get an honestly wide interval centred on the class prior. A deterministic point estimator cannot know when it is guessing. EMCM does.

4. Real-Visit Time-Weighted Grid

EMCM reports an attributional grid term as the headline and a marginal term as a separate, optional consequential lens. For the headline, device-segment grid intensity is sampled at the actual location and timestamp of each real visit captured by the RUM pixel — the hourly average carbon intensity at the visitor's grid zone and hour (Electricity Maps / national TSO half-hourly, e.g. the UK Carbon Intensity API). This is a true integration of the grid term over the empirical joint distribution of (when, where) real users loaded the page, not an annual constant.

A strict fallback order applies: per-visit hourly average (measured tier) → the site's real-visit-weighted hourly profile → hosting-country annual average (estimated tier) → the global average (473, Ember 2024). The marginal lens (MOER at visit time) answers "would shifting or cutting this load change emissions?" and is reported as a separate consequential figure, explicitly not the attributional footprint, because GHG Protocol Scope 2 favours average / location-based accounting for attributional reporting.

5. Distributional Output & Epistemic Tiering

EMCM converts the implementation's ±1σ range into a genuine posterior via Monte Carlo: each segment is a product of log-normals with its uncertainty scale; K samples are drawn with the device grid factor drawn from the real-visit hourly distribution so that temporal grid spread enters the interval, not just coefficient spread; the empirical-Bayes shrinkage then acts on the sample mean in log space. We report the median (P50) and the 90% credible interval [P5, P95], plus the per-segment decomposition.

Every figure carries a tier — measured (CDP CPU / live grid / real visits), modelled (per-GB coefficients, TTFB-proxy server), or population-prior (shrunk toward the corpus when site data is thin) — a confidence level from the documented uncertainty, and a one-line source. The interval also explicitly flags structural / model uncertainty (SWD moved roughly two-thirds from v3 to v4 — a model-form change) as a non-quantified caveat alongside the quantified parameter interval, so the interval is never represented as falsely complete.

6. Embodied Amortisation

What v3.0 does. Embodied (manufacturing / lifecycle) energy — data centre 0.012, network 0.013, device 0.081 kWh/GB (Malmodin 2023) — is allocated on the byte proxy and converted at the global grid intensity, consistent with SWDM v4. Embodied is modelled, not measured: reported as its own decomposition row, tier-tagged modelled-proxy (σ = 0.30), and because it is around 77% of a typical total, its uncertainty dominates the headline credible interval (the band is share-weighted so the rigorous operational figure never lends its tightness to the proxied embodied majority).

Scheduled — measured embodied. Device embodied energy will be apportioned to a visit by its share of the device's amortised active lifetime — the Green Software Foundation SCI form (ISO/IEC 21031:2024), with active time taken from real RUM session active-time rather than an assumed dwell. This is the fourth instance of EMCM's core move (bytes→CPU, annual→live grid, assumed-cache→measured visit-mix, assumed-dwell→measured active-time): on badge sites even the embodied footprint becomes measured, not assumed. It is deferred deliberately — the amortisation formula is the standard SCI/ISO one (to be cited, not claimed) and its denominator (device lifetime hours) is not yet a cited constant.

Shipped in v3.1 — the Estimated System Footprint (interim disclosure). Ahead of measured active-time, v3.1 surfaces a separate, bounded Estimated System Footprint alongside (and beneath) the score: the fuller boundary — operational + transfer + server + device-embodied, with terms under 1% excluded per SCI / ISO 21031 — including a time-share allocation of the visitor's device-manufacturing carbon. It is reported with its 90% band and a low-confidence tier (modelled-assumed-active-time) because session active-time is currently assumed, not measured; it runs roughly 10× the score. It is held strictly separate from the grade: a grade must respond to developer action, whereas manufacturing-by-session-time cannot be moved by the developer (and grading on it would perversely reward bounce rate) — the same logic that separates Scope 3 from Scope 1. It is disclosed as an allocation of hardware that exists regardless of the visit, explicitly not additive to a corporate inventory: fuller-boundary context, not a "truer" number than the score. When real active-session measurement lands, its tier flips to measured-active-time and the band tightens.

7. Uncertainty Propagation

The implemented propagation combines, per segment, the coefficient uncertainty and the grid uncertainty in quadrature (root-sum-square, assuming independent errors), then RSS-combines segments. Grid uncertainty is 0.05 if the grid value is live, else 0.15; device is 0.15 if measured via CDP, else 0.40. The documented relative uncertainties reflect literature spread: network 0.40 (capacity- vs volume-based allocation), data centre 0.35 (IEA vs Koomey/Masanet critiques), embodied 0.30. EMCM promotes this ±1σ range to the Monte-Carlo posterior of §5; the RSS interval remains the closed-form sanity check and the fallback when no RUM samples exist.

Independence caveat (honest). The RSS assumes independent errors. Grid errors are in fact correlated across segments that share a grid zone; the Monte-Carlo formulation can draw a shared grid factor per zone to capture this, which the closed-form RSS cannot. This correlation correction is a refinement target.

8. Novelty vs Prior Art

Stated as an honest novelty matrix, the defensible departures from SWDM v4 / CO2.js are: (1) distributional / uncertainty-aware output (the clearest novelty — SWDM has no uncertainty quantification); (2) empirical-Bayes shrinkage to a corpus class prior (the core — no prior, no shrinkage, no "knows when it's guessing" mechanism exists in SWDM); (3) a real-visit hourly time-weighted grid; (4) measured (non-byte-proxy) device energy — novel in application, incremental in concept; and (5) per-figure epistemic tiering.

Everything else is shared with, or incremental over, the prior art, and is labelled as such: better coefficients are incremental, not novel; the marginal grid is offered as decision-support, not claimed as novelty; embodied amortisation (v3.1) is the standard SCI formula, cited not claimed; and the grade thresholds deliberately match the Website Carbon Calculator for cross-tool comparability — we differentiate on measurement accuracy, not grading.

9. Limitations & Open Questions

Thin corpus. 19 users, 218 measured sites, 96% under one account. Cluster-robust user-demeaning mitigates but does not eliminate single-tenant bias; class priors above the global level are presently fragile.
Era contamination. Legacy emissions are excluded; only the 218 era-2 measured sites calibrate the prior. Correct, but small.
RUM ≠ headless measurement. RUM gives CPU-time proxies, not joules; headless CDP gives the measured value. The device tier is honest about which it had.
GPU term is a disclosed conservative ceiling (a sub-5% term).
Embodied amortisation denominator (device lifetime hours) is not yet a cited constant — precisely why measured embodied is scoped to v3.1.
Some device-power values mix peer-reviewed sources with reproducible community measurements; mobile GPU is essentially an unmeasured inference (the weakest value).
Structural uncertainty is unquantified. A v3-to-v4-style model-form change is flagged but cannot be bounded by the parameter interval.
Marginal vs attributional must be kept separate to avoid double-counting.

10. Reproducibility: the /trace Endpoint

Citability requires that any published figure be reconstructable from inputs. EMCM ships a /trace endpoint returning, for a given site or scan, the complete provenance: the inputs (bytes, CDP durations, resolved device type, hosting country, the RUM visit set or single-scan flag); the exact coefficients used, each with its source string and citation flag; which grid fallback tier fired per segment, with timestamps; the calibration state (class cell, backoff level, prior parameters, the site mean and variance, the resulting weight, and the posterior); and the output (median, [P5, P95], per-segment decomposition with tiers, plus the closed-form RSS range as a cross-check).

A figure is publishable only if its /trace reproduces it deterministically. This makes every EMCM number independently auditable — the property SWDM point estimates technically have but never expose at this granularity. The machine-readable methodology and per-score trace are live at api.ecopigs.co.uk/api/v4/methodology.

For a plain-language overview, see Measuring the Real Carbon of the Web. EMCM v3.0 is published and live (2026-06-15); not yet externally peer-reviewed.