# openg2g.models

## openg2g.models.spec
Model specification and workload dataclasses.
### LLMInferenceModelSpec (dataclass)
Specification for one LLM model served in the datacenter.
Attributes:
| Name | Type | Description |
|---|---|---|
| model_label | str | Human-readable model identifier. |
| num_replicas | int | Total number of replicas of this model across the datacenter. |
| gpus_per_replica | int | GPUs allocated to each replica (determines model parallelism and per-replica power draw). |
| initial_batch_size | int | Initial batch size for this model. |
| feasible_batch_sizes | tuple[int, ...] | Allowed batch sizes for OFO control. Baseline mode uses only the first (or only) entry. |
| itl_deadline_s | float | Per-model inter-token latency deadline for the OFO latency dual (seconds). |
Source code in openg2g/models/spec.py
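To make the attribute table concrete, the sketch below constructs a spec. It uses a local stand-in dataclass that mirrors the documented fields (the real class lives in openg2g.models.spec); all concrete values (labels, counts, deadlines) are hypothetical.

```python
from dataclasses import dataclass

# Stand-in mirroring the documented fields of
# openg2g.models.spec.LLMInferenceModelSpec -- not the real class.
@dataclass(frozen=True)
class LLMInferenceModelSpec:
    model_label: str
    num_replicas: int
    gpus_per_replica: int
    initial_batch_size: int
    feasible_batch_sizes: tuple[int, ...]
    itl_deadline_s: float

spec = LLMInferenceModelSpec(
    model_label="model-a",             # hypothetical label
    num_replicas=4,
    gpus_per_replica=8,                # 4 replicas x 8 GPUs each
    initial_batch_size=16,
    feasible_batch_sizes=(8, 16, 32),  # baseline mode would use only 8
    itl_deadline_s=0.05,               # 50 ms inter-token latency deadline
)

# GPUs this model consumes across the datacenter: 4 * 8 = 32
print(spec.num_replicas * spec.gpus_per_replica)
```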
### LLMInferenceWorkload (dataclass)
Aggregation of model specs into a workload description.
Attributes:
| Name | Type | Description |
|---|---|---|
| models | tuple[LLMInferenceModelSpec, ...] | Tuple of model specifications served in the datacenter. |
Source code in openg2g/models/spec.py
#### model_labels (property)

Ordered list of model labels.

#### total_gpus (property)

Total GPUs consumed by all inference models.

#### initial_batch_size_by_model (property)

Per-model initial batch sizes.

#### itl_deadline_by_model (property)

Per-model ITL deadlines (seconds).

#### required_measured_gpus (property)

Per-model measured GPU count (= gpus_per_replica).

#### feasible_batch_sizes_union (property)

Sorted union of all models' feasible batch sizes.
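The workload properties above aggregate over the member specs. The sketch below re-creates a few of them (model_labels, total_gpus, feasible_batch_sizes_union) on local stand-in dataclasses to show the documented behavior; the property bodies are plausible implementations inferred from the descriptions, not the package's actual source, and all concrete values are hypothetical.

```python
from dataclasses import dataclass

# Stand-ins mirroring the documented API of openg2g.models.spec.
@dataclass(frozen=True)
class LLMInferenceModelSpec:
    model_label: str
    num_replicas: int
    gpus_per_replica: int
    initial_batch_size: int
    feasible_batch_sizes: tuple[int, ...]
    itl_deadline_s: float

@dataclass(frozen=True)
class LLMInferenceWorkload:
    models: tuple[LLMInferenceModelSpec, ...]

    @property
    def model_labels(self) -> list[str]:
        # Ordered list of model labels.
        return [m.model_label for m in self.models]

    @property
    def total_gpus(self) -> int:
        # Total GPUs consumed by all inference models.
        return sum(m.num_replicas * m.gpus_per_replica for m in self.models)

    @property
    def feasible_batch_sizes_union(self) -> tuple[int, ...]:
        # Sorted union of all models' feasible batch sizes.
        return tuple(sorted({b for m in self.models
                             for b in m.feasible_batch_sizes}))

workload = LLMInferenceWorkload(models=(
    LLMInferenceModelSpec("model-a", 4, 8, 16, (8, 16, 32), 0.05),
    LLMInferenceModelSpec("model-b", 2, 4, 8, (4, 8), 0.10),
))

print(workload.total_gpus)                  # 4*8 + 2*4 = 40
print(workload.feasible_batch_sizes_union)  # (4, 8, 16, 32)
```

Keeping the specs frozen and exposing aggregates as read-only properties means the derived views can never drift out of sync with the member specs.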