Data Pipeline¶
OpenG2G ships with built-in support for trace-replay simulations based on real GPU benchmark data. This page describes how raw benchmark measurements are compiled into artifacts that plug into simulation, and how those artifacts are consumed at runtime.
LLM workloads
The data pipeline currently focuses on LLM workloads (inference from ML.ENERGY Benchmark results, training via synthetic generation), an important motivating workload class for AI datacenter-grid interactions. We hope to expand the data pipeline to support more workloads.
Overview¶
Data generation is integrated into the library classes that consume the data. Each class has generate(), save(), load(), and ensure() methods:
| Class | Generates | Consumed by |
|---|---|---|
| `InferenceData` | Power traces + ITL distribution fits | `OfflineDatacenter` |
| `LogisticModelStore` | Logistic curve fits (power, latency, throughput) | `OFOBatchSizeController` |
| `TrainingTrace` | Synthetic training power trace | `OfflineDatacenter` |
Each class provides an ensure() classmethod that generates data if it doesn't exist and loads it:
```python
inference_data = InferenceData.ensure(data_dir, models, data_sources, dt_s=0.1)
training_trace = TrainingTrace.ensure(data_dir / "training_trace.csv", training_params)
logistic_models = LogisticModelStore.ensure(data_dir / "logistic_fits.csv", models, data_sources)
```
Online Simulation
Online simulation with live GPUs does not use the power and ITL distributions, as they are supplied directly by running servers. However, the OFO controller can still use logistic fits for gradient estimation.
Config File¶
A shared config.json (examples/offline/config.json) stores model specifications and benchmark data sources:
```json
{
  "model_specs": [
    {
      "model_label": "Llama-3.1-8B",
      "model_id": "meta-llama/Llama-3.1-8B-Instruct",
      "gpus_per_replica": 1,
      "itl_deadline_s": 0.08,
      "feasible_batch_sizes": [8, 16, 32, 64, 128, 256, 512]
    }
  ],
  "data_sources": [
    {
      "model_label": "Llama-3.1-8B",
      "task": "lm-arena-chat",
      "gpu": "H100",
      "batch_sizes": [8, 16, 32, 64, 96, 128, 192, 256, 384, 512, 768, 1024]
    }
  ],
  "training_trace_params": {}
}
```
- `model_specs[]` entries are parsed as `InferenceModelSpec`. These describe model identity (GPU requirements, feasible batch sizes, latency deadlines) but not deployment-specific parameters such as replica counts or initial batch sizes; those are defined per-experiment in each script.
- `data_sources[]` entries are parsed as `MLEnergySource`, linked to models by `model_label`.
- `training_trace_params` is parsed as `TrainingTraceParams`. An empty `{}` uses all defaults.
- The first run downloads benchmark data from the Hugging Face Hub and caches it in `data/offline/{hash}/`.
All other configuration — datacenter sizing, controller tuning, workload scenarios, grid setup — is defined programmatically in each example script. See Building Simulators and examples/offline/systems.py for details. For running simulations, see Quickstart and the Examples documentation.
Lazy Generation and Caching¶
Each data class provides an ensure() classmethod that combines generate-if-missing and load into a single call:
```python
# First run: generates data to data_dir, then loads it.
# Subsequent runs: loads directly from cache.
inference_data = InferenceData.ensure(
    data_dir, models, data_sources,
    dt_s=0.1,
)
training_trace = TrainingTrace.ensure(
    data_dir / "training_trace.csv",
    training_trace_params,
)
logistic_models = LogisticModelStore.ensure(
    data_dir / "logistic_fits.csv",
    models, data_sources,
)
```
Under the hood, ensure() checks whether the output file or directory exists. If not, it calls generate().save() to create the artifacts. Then it calls load() to return the ready-to-use object.
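The generate-if-missing pattern can be sketched as follows. `CachedArtifact` and its JSON payload are hypothetical stand-ins used only to illustrate the control flow, not actual OpenG2G classes:

```python
import json
from pathlib import Path


class CachedArtifact:
    """Illustrative generate()/save()/load()/ensure() shape.

    The real OpenG2G data classes follow this pattern with their own
    storage formats; this toy version persists a JSON payload.
    """

    def __init__(self, payload):
        self.payload = payload

    @classmethod
    def generate(cls, n):
        # Stand-in for expensive data generation (downloads, fits, ...).
        return cls({"values": list(range(n))})

    def save(self, path):
        Path(path).write_text(json.dumps(self.payload))

    @classmethod
    def load(cls, path):
        return cls(json.loads(Path(path).read_text()))

    @classmethod
    def ensure(cls, path, n):
        path = Path(path)
        if not path.exists():
            cls.generate(n).save(path)  # first run: generate and cache
        return cls.load(path)           # every run: load the cached artifact
```

Note that on a cache hit the generation arguments are ignored entirely, which is why changing data-relevant config must also change the cache path (see below).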
Default data path¶
The helper load_data_sources() in examples/offline/systems.py computes a hash-based cache path from the data-relevant config keys (data sources and training trace parameters). Different configs automatically get different cache directories, so you can switch configs without manually clearing the cache.
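A minimal sketch of such hash-based cache keying, assuming JSON-serializable config values. `cache_path` is a hypothetical helper, not the actual `load_data_sources()` implementation:

```python
import hashlib
import json
from pathlib import Path


def cache_path(config: dict, root: str = "data/offline") -> Path:
    """Derive a cache directory from only the data-relevant config keys.

    Hashing just these keys means unrelated config edits (e.g. model
    deployment parameters) do not invalidate the cache, while any change
    to data sources or training-trace parameters yields a fresh directory.
    """
    relevant = {
        "data_sources": config.get("data_sources", []),
        "training_trace_params": config.get("training_trace_params", {}),
    }
    # sort_keys makes the serialization (and thus the hash) deterministic.
    digest = hashlib.sha256(
        json.dumps(relevant, sort_keys=True).encode()
    ).hexdigest()[:12]
    return Path(root) / digest
```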
Inference Data Generation¶
InferenceData.generate() uses the mlenergy-data toolkit to download and process GPU benchmark data from the ML.ENERGY Benchmark v3 dataset.
For each model and batch size, it:
- Extracts power timelines from benchmark runs
- Resamples them to a median-duration grid
- Fits `ITLMixtureModel` distributions per batch size
```
ML.ENERGY Benchmark Dataset                 mlenergy-data
(Hugging Face hub)                          toolkit
┌─────────────────────┐
│ results.json        │   LLMRuns.from_hf()
│ (power, latency,    │──────────────────────────>┐
│ throughput, ITL)    │   Load, filter, validate  │
│ per model × batch   │                           │
└─────────────────────┘                           │
                                                  v
                              ┌───────────────────────────────────┐
Config file                   │ InferenceData.generate()          │
┌─────────────────────┐       │                                   │
│ config.json         │       │ For each model x batch size:      │
│                     │──────>│ 1. Extract power timelines        │
│ model_specs[] +     │       │ 2. Resample to median-duration    │
│ data_sources[]      │       │ 3. Fit ITLMixtureModel            │
└─────────────────────┘       └───────────┬───────────────────────┘
                                          │
                                          v
                              ┌────────────────────────────────┐
                              │ data/offline/{hash}/           │
                              │                                │
                              │ traces/*.csv                   │ <── GPU power timeseries
                              │ traces_summary.csv             │ <── Trace manifest
                              │ latency_fits.csv               │ <── ITL distribution params
                              │ _manifest.json                 │ <── Version stamp
                              └────────────────────────────────┘
```
Logistic Curve Fitting¶
LogisticModelStore.generate() fits four-parameter logistic curves to power, latency, and throughput versus batch size. For power:

\[
P(x) = \frac{P_{\max}}{1 + e^{-k_p (x - x_{0,p})}} + p_0
\]

where \(P_{\max}\) is the saturation magnitude, \(k_p\) controls transition sharpness, \(x_{0,p}\) is the characteristic batch size threshold, and \(p_0\) is an offset term. Latency and throughput use the same functional form with their own parameters.
OpenG2G uses LogisticModel from mlenergy-data at both stages:
- Generation: `LogisticModel.fit(x, y)` fits the curve to benchmark data
- Runtime: `LogisticModel.eval(batch)` evaluates the curve, and `LogisticModel.deriv_wrt_x(x)` computes gradients for the OFO controller
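To make the functional form concrete, here is a minimal numpy sketch of a four-parameter logistic and its analytic derivative. The function names and signatures are illustrative, not the mlenergy-data API:

```python
import numpy as np


def logistic(x, p_max, k, x0, p0):
    """Four-parameter logistic: an offset plus a saturating sigmoid.

    At x = x0 the curve sits exactly halfway up the transition,
    at p_max / 2 + p0.
    """
    return p_max / (1.0 + np.exp(-k * (x - x0))) + p0


def logistic_deriv(x, p_max, k, x0, p0):
    """Analytic derivative w.r.t. x -- the role deriv_wrt_x() plays
    for the OFO controller's gradient estimates.

    Peaks at x = x0 with value p_max * k / 4.
    """
    s = 1.0 / (1.0 + np.exp(-k * (x - x0)))
    return p_max * k * s * (1.0 - s)
```

Having a closed-form derivative is the point of fitting a parametric curve here: the controller gets smooth gradients without finite-differencing noisy measurements.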
ITL Mixture Model¶
Measured inter-token latencies (ITL) exhibit heavy-tailed behavior. The generation step captures this using a weighted mixture of two lognormal distributions per batch size.
OpenG2G uses ITLMixtureModel from mlenergy-data at both stages:
- Generation: `ITLMixtureModel.fit(samples)` fits the mixture to raw ITL samples
- Runtime: `ITLMixtureModel.sample_avg(n_replicas, rng)` draws average latency across replicas
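A numpy sketch of the sampling side of such a two-lognormal mixture. Parameter names and function signatures here are illustrative, not the `ITLMixtureModel` API:

```python
import numpy as np


def sample_itl_mixture(n, w, mu1, sigma1, mu2, sigma2, rng):
    """Draw n ITL samples from a two-component lognormal mixture.

    With probability w a sample comes from component 1 (typically the
    bulk of latencies), otherwise from component 2 (the heavy tail).
    """
    comp1 = rng.random(n) < w                    # per-sample component choice
    mu = np.where(comp1, mu1, mu2)
    sigma = np.where(comp1, sigma1, sigma2)
    return rng.lognormal(mean=mu, sigma=sigma)   # strictly positive, heavy-tailed


def sample_avg(n_replicas, params, rng):
    """Average ITL across replicas, mirroring the role of sample_avg()."""
    return sample_itl_mixture(n_replicas, *params, rng).mean()
```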
Training Trace Generation¶
TrainingTrace.generate() synthesizes a training power trace with configurable high/low plateaus, noise, brief dips, and a warm-up ramp. Generation is based on characteristics derived from real large model training measurements.
Parameters are controlled via TrainingTraceParams. The empty dict {} in the config uses all defaults.
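A rough sketch of how such a trace could be synthesized. Every parameter name below is hypothetical; see `TrainingTraceParams` for the real knobs:

```python
import numpy as np


def synth_training_trace(n_steps, dt_s=1.0, p_high=650.0, p_low=450.0,
                         period_s=120.0, warmup_s=60.0, dip_prob=0.01,
                         noise_w=5.0, rng=None):
    """Toy synthetic training power trace (all parameters hypothetical).

    Combines a square wave between high/low plateaus, a linear warm-up
    ramp, Gaussian measurement noise, and occasional brief power dips.
    Returns (time_s, power_w) arrays.
    """
    rng = rng or np.random.default_rng()
    t = np.arange(n_steps) * dt_s
    high = (t % period_s) < (period_s / 2)        # alternate plateaus
    power = np.where(high, p_high, p_low).astype(float)
    power *= np.clip(t / warmup_s, 0.0, 1.0)      # warm-up ramp from zero
    power += rng.normal(0.0, noise_w, n_steps)    # measurement noise
    dips = rng.random(n_steps) < dip_prob         # rare brief dips
    power[dips] *= 0.5
    return t, power
```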
Dataset Access¶
The mlenergy-data toolkit automatically downloads benchmark data from the ML.ENERGY Benchmark v3 dataset on first run.
To use the dataset:
- Request access on Hugging Face
- Create a Hugging Face access token
- Set the `HF_TOKEN` environment variable to your token before running
Runtime Integration¶
At simulation time, the generated artifacts are consumed by two components:
- `OfflineDatacenter`: Uses `InferenceData` to replay periodic per-GPU power templates. Latency fits (`ITLMixtureModel`) are sampled at each control interval.
- `OFOBatchSizeController`: Uses `LogisticModelStore` for logistic curve evaluation. Calls `eval()` and `deriv_wrt_x()` at each control step to compute gradients.
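To make the controller's use of the fits concrete, here is a hypothetical single OFO-style gradient step driven by a four-parameter logistic power fit. The real `OFOBatchSizeController` adds constraints, feasible batch sizes, and other structure; this sketch only shows how a fit's value and derivative combine into a batch-size update:

```python
import numpy as np


def ofo_step(batch, target_power, fit, lr=0.001):
    """One toy gradient step toward a power target (illustrative only).

    fit = (p_max, k, x0, p0): four-parameter logistic power curve.
    Descends the squared power error (P(batch) - target)^2 w.r.t.
    batch size, using the fit's analytic value and derivative in the
    roles of LogisticModel.eval() and LogisticModel.deriv_wrt_x().
    """
    p_max, k, x0, p0 = fit
    s = 1.0 / (1.0 + np.exp(-k * (batch - x0)))
    power = p_max * s + p0               # predicted power at this batch size
    grad = p_max * k * s * (1.0 - s)     # dP/dbatch
    return batch - lr * 2.0 * (power - target_power) * grad
```

Iterating this step drives the batch size toward the point where the fitted power curve meets the target.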