Data Pipeline¶
OpenG2G ships with built-in support for trace-replay simulations based on real GPU benchmark data. This page describes how raw benchmark measurements are compiled into artifacts that plug into simulation, and how those artifacts are consumed at runtime.
LLM workloads
The data pipeline is focused on LLM workloads (inference from the ML.ENERGY Benchmark results, and training via synthetic generation), which is an important motivating workload for AI datacenter-grid interactions. We hope to improve and expand the data pipeline to support more workloads.
Overview¶
Data generation is integrated into the library classes that consume the data. Each class has generate(), save(), load(), and ensure() methods:
| Class | Generates | Consumed by |
|---|---|---|
InferenceData |
Power traces + ITL distribution fits | OfflineDatacenter |
LogisticModelStore |
Logistic curve fits (power, latency, throughput) | OFOBatchSizeController |
TrainingTrace |
Synthetic training power trace | OfflineDatacenter |
Each class provides an ensure() classmethod that generates data if it doesn't exist and loads it:
inference_data = InferenceData.ensure(data_dir, models, dt_s=0.1)
training_trace = TrainingTrace.ensure(Path("data/training_trace.csv"))
logistic_models = LogisticModelStore.ensure(data_dir, models)
Online Simulation
Online simulation with live GPUs does not use the power and ITL distributions, as they are supplied directly by running servers. However, the OFO controller can still use logistic fits for gradient estimation.
Spec-Driven Configuration¶
Benchmark selection now lives directly on InferenceModelSpec. A spec includes the benchmark query fields (model_id, gpu_model, task, gpus_per_replica, batch_sizes) together with serving-time knobs (itl_deadline_s, feasible_batch_sizes, fit_exclude_batch_sizes).
spec = InferenceModelSpec(
model_label="Llama-3.1-8B",
model_id="meta-llama/Llama-3.1-8B-Instruct",
gpu_model="H100",
task="lm-arena-chat",
gpus_per_replica=1,
tensor_parallel=1,
itl_deadline_s=0.08,
batch_sizes=(8, 16, 32, 64, 96, 128, 192, 256, 384, 512, 768),
feasible_batch_sizes=(8, 16, 32, 64, 128, 256, 512),
)
taskselects which benchmark prompt dataset to use (e.g.lm-arena-chat,gpqa).gpu_modelandgpus_per_replicaselect the measured hardware configuration.batch_sizescontrols which ML.ENERGY runs are requested.fit_exclude_batch_sizeslets you drop pathological measurements from logistic fitting without dropping them from trace extraction.- First run downloads benchmark data from the HuggingFace Hub and caches it per spec under
data/specs/<spec-hash>/.
Model specifications (InferenceModelSpec) are defined as Python constants inline in each example script. All other configuration (datacenter sizing, controller tuning, workload scenarios, grid setup) is also defined programmatically per-script. See Building Simulators for details.
Lazy Generation and Caching¶
Each data class provides an ensure() classmethod that combines generate-if-missing and load into a single call:
# First run: generates data to data_dir, then loads it.
# Subsequent runs: loads directly from cache.
inference_data = InferenceData.ensure(
data_dir, models,
dt_s=0.1,
)
training_trace = TrainingTrace.ensure(
Path("data/training_trace.csv"),
)
logistic_models = LogisticModelStore.ensure(
data_dir,
models,
)
Under the hood, ensure() checks whether the output file or directory exists. If not, it calls generate().save() to create the artifacts. Then it calls load() to return the ready-to-use object.
Default data path¶
Examples use a shared root such as data/specs/. Each InferenceModelSpec computes its own content-addressed cache key via cache_hash(), so changing one spec only regenerates that spec's artifacts.
Inference Data Generation¶
InferenceData.generate() uses the mlenergy-data toolkit to download and process GPU benchmark data from the ML.ENERGY Benchmark v3 dataset.
For each model and batch size, it:
- Extracts power timelines from benchmark runs
- Resamples to a median-duration grid
- Fits
ITLMixtureModeldistributions per batch size
ML.ENERGY Benchmark Dataset mlenergy-data
(Hugging Face hub) toolkit
┌─────────────────────┐
│ results.json │ LLMRuns.from_hf()
│ (power, latency, │────────────────────────────>┐
│ throughput, ITL) │ Load, filter, validate │
│ per model × batch │ │
└─────────────────────┘ │
v
┌───────────────────────────────────┐
Model spec │ InferenceData.generate() │
┌─────────────────────┐ │ │
│ InferenceModelSpec │ │ For each model x batch size: │
│ │────────>│ 1. Extract power timelines │
│ batch_sizes, task, │ │ 2. Resample to median-duration │
│ gpu_model, etc. │ │ 3. Fit ITLMixtureModel │
│ │ │ │
└─────────────────────┘ └───────────┬───────────────────────┘
│
v
┌────────────────────────────────┐
│ data/specs/<spec-hash>/ │
│ │
│ trace.csv │ <── GPU power timeseries
│ itl_fit.json │ <── ITL distribution params
│ logistic_fit.json │ <── OFO curve-fit params
│ _manifest.json │ <── Version stamp + spec
└────────────────────────────────┘
Logistic Curve Fitting¶
LogisticModelStore.generate() fits four-parameter logistic curves to power, latency, and throughput versus batch size:
where \(P_{\max}\) is the saturation magnitude, \(k_p\) controls transition sharpness, \(x_{0,p}\) is the characteristic batch size threshold, and \(p_0\) is an offset term. Latency and throughput use the same functional form with their own parameters.
OpenG2G uses LogisticModel from mlenergy-data at both stages:
- Generation:
LogisticModel.fit(x, y)fits the curve to benchmark data - Runtime:
LogisticModel.eval(batch)evaluates the curve, andLogisticModel.deriv_wrt_x(x)computes gradients for the OFO controller
ITL Mixture Model¶
Historical ITL measurements exhibit heavy-tailed behavior. The generation step captures this using a weighted mixture of two lognormal distributions per batch size.
OpenG2G uses ITLMixtureModel from mlenergy-data at both stages:
- Generation:
ITLMixtureModel.fit(samples)fits the mixture to raw ITL samples - Runtime:
ITLMixtureModel.sample_avg(n_replicas, rng)draws average latency across replicas
Training Trace Generation¶
TrainingTrace.generate() synthesizes a training power trace with configurable high/low plateaus, noise, brief dips, and a warm-up ramp. Generation is based on characteristics derived from real large model training measurements.
Parameters are controlled via TrainingTraceParams. The empty dict {} in the config uses all defaults.
Dataset Access¶
The mlenergy-data toolkit automatically downloads benchmark data from the ML.ENERGY Benchmark v3 dataset on first run.
To use the dataset:
- Request access on Hugging Face
- Create a Hugging Face access token
- Set the
HF_TOKENenvironment variable to your token before running.
Runtime Integration¶
At simulation time, the generated artifacts are consumed by two components:
OfflineDatacenter: UsesInferenceDatato replay periodic per-GPU power templates. Latency fits (ITLMixtureModel) are sampled at each control interval.OFOBatchSizeController: UsesLogisticModelStorefor logistic curve evaluation. Callseval()andderiv_wrt_x()at each control step to compute gradients.