Data Pipeline¶
OpenG2G simulations consume pre-processed GPU benchmark data. This page describes how raw benchmark measurements flow through the mlenergy-data toolkit into simulation-ready artifacts.
Overview¶
Two Python packages work together:
┌──────────────────────────┐       ┌──────────────────────────────────┐
│      mlenergy-data       │       │             OpenG2G              │
│                          │       │                                  │
│  Benchmark data toolkit  │──────>│  Grid-datacenter co-simulation   │
│                          │       │                                  │
│  "How do LLMs behave     │       │  "What happens to the            │
│   at different batch     │       │   distribution feeder when       │
│   sizes on real GPUs?"   │       │   you run these workloads?"      │
└──────────────────────────┘       └──────────────────────────────────┘
      Data supply side                  Simulation & control side
- mlenergy-data: Loads, filters, and fits models to real GPU benchmark data (power, latency, throughput vs. batch size) from the ML.ENERGY Benchmark (v3 dataset).
- OpenG2G: Multi-rate time-domain simulation of an LLM workload datacenter connected to an IEEE 13-bus distribution feeder, with OFO batch-size control.
The mlenergy-data Toolkit¶
Three capabilities used by OpenG2G:
Typed data loading¶
runs = LLMRuns.from_hf() # load all runs
runs = runs.task("lm-arena-chat").gpu("H100").batch(min=8) # fluent filtering
Each LLMRun is a typed record with 40 fields: power, latency, throughput, model metadata, GPU config, etc.
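The fluent filtering pattern shown above can be sketched in miniature. This is an illustrative stand-in, not the real mlenergy-data implementation: the Run/Runs classes below are hypothetical and carry only four of the ~40 fields, but they show how each filter returns a new collection so calls chain.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Run:
    # Illustrative subset of the ~40 typed fields on an LLMRun
    task: str
    gpu: str
    batch: int
    avg_power_w: float

class Runs:
    """Hypothetical mini-version of the fluent LLMRuns interface."""
    def __init__(self, runs):
        self._runs = list(runs)

    def task(self, name):
        return Runs(r for r in self._runs if r.task == name)

    def gpu(self, name):
        return Runs(r for r in self._runs if r.gpu == name)

    def batch(self, min=0):
        return Runs(r for r in self._runs if r.batch >= min)

    def __len__(self):
        return len(self._runs)

runs = Runs([
    Run("lm-arena-chat", "H100", 16, 540.0),
    Run("lm-arena-chat", "H100", 4, 310.0),
    Run("gsm8k", "A100", 32, 400.0),
])
filtered = runs.task("lm-arena-chat").gpu("H100").batch(min=8)
# Only the batch-16 H100 chat run survives all three filters.
```

Because each filter returns a fresh immutable collection, partial filter chains can be reused without side effects.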
Logistic curve fitting¶
Four-parameter logistic: y = b0 + L * sigmoid(k * (x - x0)), where x = log2(batch).
These curves model how power, latency, and throughput vary with batch size (Section II-C of the G2G paper, Eqs. 1-3). See Concepts: Batch Size as a Grid-Aware Control for the characteristic S-curve shape.
- LogisticModel.fit(x, y): Grid search + least squares
- LogisticModel.eval(batch): Evaluate at any batch size
- LogisticModel.deriv_wrt_x(x): Gradient for OFO controller (G2G paper Eq. 18)
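The eval and derivative operations follow directly from the formula above. A minimal stdlib sketch (function names and the example parameter values are illustrative, not the library's actual code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_eval(batch, L, x0, k, b0):
    """y = b0 + L * sigmoid(k * (x - x0)) with x = log2(batch)."""
    return b0 + L * sigmoid(k * (math.log2(batch) - x0))

def logistic_deriv_wrt_x(batch, L, x0, k, b0):
    """dy/dx at x = log2(batch), using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(k * (math.log2(batch) - x0))
    return L * k * s * (1.0 - s)

# At batch = 2**x0 the curve sits at its midpoint b0 + L/2, and the
# slope with respect to x is maximal there, at L*k/4.
y_mid = logistic_eval(8, L=400.0, x0=3.0, k=1.2, b0=150.0)          # 150 + 400/2 = 350.0
g_mid = logistic_deriv_wrt_x(8, L=400.0, x0=3.0, k=1.2, b0=150.0)   # 400*1.2/4 = 120.0
```

Working in x = log2(batch) makes the S-curve roughly symmetric over the power-of-two batch sizes used in the benchmark.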
Inter-token latency (ITL) mixture model¶
Two-component lognormal mixture captures bimodal ITL distributions (steady decode vs. scheduling stall):
Probability
 |
 |    **
 |   ****
 |   *****         *
 |  *******       ***
 |  ********     *****
 |*********************
 └───────────────────────── ITL (ms)
     "steady"      "stall"
     (decode)   (scheduling)
- ITLMixtureModel.fit(samples): EM algorithm
- ITLMixtureModel.sample_avg(n_replicas, rng): Draw average latency across replicas
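Sampling from such a mixture is straightforward: pick a component by its weight, then draw from that component's lognormal. The sketch below is a hypothetical stand-in for sample_avg (the parameter values are invented for illustration, not fitted values from the dataset):

```python
import random

def sample_itl(w_steady, mu_steady, sig_steady, mu_stall, sig_stall, rng):
    """One ITL draw (ms): choose a component, then sample its lognormal."""
    if rng.random() < w_steady:
        return rng.lognormvariate(mu_steady, sig_steady)
    return rng.lognormvariate(mu_stall, sig_stall)

def sample_avg(n_replicas, params, rng):
    """Average ITL across replicas, in the spirit of ITLMixtureModel.sample_avg."""
    return sum(sample_itl(*params, rng) for _ in range(n_replicas)) / n_replicas

rng = random.Random(0)
# Hypothetical fit: steady decode centered near e^3 ≈ 20 ms,
# stalls near e^4.6 ≈ 100 ms, with 10% stall weight.
params = (0.9, 3.0, 0.25, 4.6, 0.5)
avg = sample_avg(64, params, rng)
```

Averaging over replicas is what makes the draw representative of a whole datacenter step rather than a single request.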
Build-Time Pipeline¶
Raw GPU benchmarks are processed into simulation-ready CSV artifacts by data/offline/build_mlenergy_data.py:
ML.ENERGY Benchmark DB            mlenergy-data              OpenG2G simulation
(HF Hub or local disk)               toolkit                      inputs

┌────────────────────┐
│ results.json ×1000s│       LLMRuns.from_directory()
│ (power, latency,   │────────────────────────────>┐
│  throughput, ITL   │     Load, filter, validate  │
│  per model × batch │                             │
└────────────────────┘                             │
                                                   v
┌────────────────────┐        ┌───────────────────────────────────┐
│ models.json        │        │ build_mlenergy_data.py            │
│                    │───────>│                                   │
│ 5 models:          │        │ For each model × batch size:      │
│  8B, 70B, 405B,    │        │  1. Extract power timelines       │
│  30B-A3B, 235B-A22B│        │  2. Resample to median duration   │
│                    │        │  3. Fit LogisticModel (power,     │
└────────────────────┘        │     latency, throughput vs batch) │
                              │  4. Fit ITLMixtureModel (latency  │
                              │     distribution per batch)       │
                              └──────────┬────────────────────────┘
                                         │
                                         v
                    ┌──────────────────────────────┐
                    │ data/generated/              │
                    │                              │
                    │ traces/*.csv                 │ <── per-GPU power time series
                    │ logistic_fits.csv            │ <── 4-param curves (L, x0, k, b0)
                    │ latency_fits.csv             │ <── 2-component lognormal mixture
                    │ synthetic_training.csv       │ <── synthetic training overlay
                    └──────────────────────────────┘
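Step 2 of the build (resampling each power timeline to a common median duration) amounts to stretching or compressing a uniformly sampled trace via interpolation. A stdlib sketch, illustrative only and not the project's actual resampling code:

```python
def resample_to_duration(power_w, dt, target_duration):
    """Stretch/compress a uniformly sampled power trace (one sample
    every dt seconds) onto target_duration, via linear interpolation."""
    n_out = int(round(target_duration / dt)) + 1
    span = (len(power_w) - 1) * dt            # original duration in seconds
    out = []
    for i in range(n_out):
        # position of output sample i, mapped back onto the source trace
        t = (i / (n_out - 1)) * span
        j = min(int(t / dt), len(power_w) - 2)
        frac = t / dt - j
        out.append(power_w[j] * (1.0 - frac) + power_w[j + 1] * frac)
    return out

trace = [100.0, 300.0, 300.0, 100.0]          # 4 samples at dt = 1 s
resampled = resample_to_duration(trace, dt=1.0, target_duration=6.0)
# 7 samples tracing the same shape over 6 s; endpoints are preserved.
```

Resampling every run of a given model/batch pair to one duration lets the build average them into a single periodic power template.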
Running the build¶
python data/offline/build_mlenergy_data.py \
--config data/offline/models.json \
--out-dir data/generated
python data/offline/generate_training_trace.py \
--out-csv data/generated/synthetic_training_trace.csv --seed 2
The mlenergy-data toolkit automatically downloads benchmark data from the ML.ENERGY Benchmark v3 dataset on first run. This is a gated dataset: you must request access on Hugging Face before running the build. To use a local copy instead, pass --mlenergy-data-dir /path/to/compiled/data.
The config file (data/offline/models.json) maps benchmark model IDs to simulation labels.
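As an illustration only, such a mapping might look like the following. The field names and model IDs here are hypothetical; consult the actual data/offline/models.json in the repository for the real schema.

```json
{
  "models": [
    { "benchmark_id": "org/llm-8b-instruct",  "label": "8B"  },
    { "benchmark_id": "org/llm-70b-instruct", "label": "70B" }
  ]
}
```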
Runtime Integration¶
At simulation time, the generated CSV artifacts are consumed at two points:
┌─────────── RUN TIME (every simulation) ─────────────────────────────┐
│                                                                     │
│  OfflineDatacenter reads:                                           │
│    traces/*.csv      ──>  PowerTraceStore (periodic power templates)│
│    latency_fits.csv  ──>  ITLMixtureModel.sample_avg() per step     │
│                                                                     │
│  OFO Controller reads:                                              │
│    logistic_fits.csv ──>  LogisticModel.eval() / .deriv_wrt_x()     │
│                           called every control step for gradients   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
- OfflineDatacenter: Loads power traces via PowerTraceStore.load(manifest), which reads a manifest CSV and builds periodic per-GPU templates. At each step, the datacenter indexes into these templates and delegates to a PowerAugmenter to produce three-phase power. Latency fits are loaded as ITLMixtureModel instances and sampled at each control interval.
- OFO Controller: Loads logistic fits as LogisticModel instances (one per metric per model). At each control step, it calls eval() and deriv_wrt_x() to compute the gradient of the Lagrangian (G2G paper Eq. 18).
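The controller's use of the logistic derivative can be sketched as a projected gradient step on batch size. This is illustrative only: the real update follows Eq. 18 of the G2G paper, whereas here the gradient is collapsed to a single power term with a hypothetical dual weight.

```python
import math

def logistic_deriv(batch, L, x0, k):
    """d/dx of L * sigmoid(k * (x - x0)) evaluated at x = log2(batch)."""
    s = 1.0 / (1.0 + math.exp(-k * (math.log2(batch) - x0)))
    return L * k * s * (1.0 - s)

def ofo_step(batch, power_params, dual_weight, step_size, b_min=1.0, b_max=256.0):
    """One illustrative control step: descend log2(batch) along the
    power-gradient term, then clip to the feasible batch range."""
    x = math.log2(batch)
    x_new = x - step_size * dual_weight * logistic_deriv(batch, *power_params)
    return min(max(2.0 ** x_new, b_min), b_max)

# Hypothetical power fit: L = 400 W swing, midpoint x0 = 3, slope k = 1.2.
b = 32.0
for _ in range(5):
    b = ofo_step(b, power_params=(400.0, 3.0, 1.2), dual_weight=1.0, step_size=0.01)
# With a positive dual weight the controller backs the batch size off.
```

Working in log2-batch space keeps each multiplicative step small and the projection to [b_min, b_max] trivial.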
Passing data to simulations¶
python examples/offline/run_baseline.py --mode no-tap \
--data-dir data/generated \
--training-trace data/generated/synthetic_training_trace.csv
python examples/offline/run_ofo.py \
--data-dir data/generated \
--training-trace data/generated/synthetic_training_trace.csv
--data-dir and --training-trace are required for all simulation drivers.