# Developer Guide: sdom.parametric Implementation
This page describes the internal design of the `sdom.parametric` sub-package
for contributors who need to maintain, extend, or debug it.
For end-user usage, see *Parametric & Sensitivity Analysis*.
## Sub-package layout

```text
src/sdom/parametric/
├── __init__.py   # Public surface — re-exports ParametricStudy + sweep types
├── sweeps.py     # Sweep descriptor dataclasses (no logic, just validated containers)
├── mutations.py  # Stateless data mutation helpers + TS_KEY_TO_COLUMN mapping
├── worker.py     # Module-level picklable worker function (_run_single_case)
└── study.py      # ParametricStudy orchestrator + _make_safe_name utility
```
**One-way dependency rule:** `parametric` imports from the rest of `sdom`
(`optimization_main`, `io_manager`, `results`), but no existing `sdom` module
imports from `parametric`. Keep this direction to avoid circular imports.
## Data flow

```text
ParametricStudy.run()
│
├── _build_case_dicts()
│   ├── itertools.product(*dimensions)   ← Cartesian product
│   ├── _make_safe_name(label)           ← filesystem-safe case name
│   └── collision detection              ← append _{index} if name collides
│
├── ProcessPoolExecutor.submit(_run_single_case, case_dict)
│   │     (one future per case)
│   └─ [worker process] ─────────────────────────────────────────────────
│       ├── copy.deepcopy(case_dict["data"])   ← lazy copy, in-worker
│       ├── _apply_scalar_mutation(...)
│       ├── _apply_storage_factor_mutation(...)
│       ├── _apply_ts_mutation(...)
│       ├── initialize_model(data, n_hours, ...)
│       └── run_solver(model, solver_config, case_name)
│           └── returns OptimizationResults
│
├── as_completed(futures)
│   ├── log progress [completed/total]
│   ├── export_results(result, case_name, output_dir/<case_name>/)
│   └── ordered_results[case_index] = result   ← index-based ordering
│
└── _write_summary_csv(case_dicts, ordered_results)
    └── parametric_summary.csv
```
## Key design decisions

### 1. Lazy deep-copy (memory efficiency)
**Problem:** pre-allocating one `copy.deepcopy(base_data)` per case in the
parent process (before dispatching to workers) causes the parent to hold
N × sizeof(base_data) in memory simultaneously, a significant spike for
large sweeps (50+ cases) or large datasets.

**Solution:** each `case_dict` carries a reference to `self._base_data`
(the shared original). The deep copy is deferred to inside `_run_single_case`,
which runs in a worker subprocess. Because `ProcessPoolExecutor` pickles
arguments on dispatch, the base data is serialised once per case on
submission. The deep copy then runs inside the worker's own memory space,
so the parent process always holds only one copy of the base data,
regardless of sweep size.
```python
# study.py — case_dict carries a reference, not a copy
case_dicts.append({
    "data": self._base_data,  # shared reference
    ...
})
```

```python
# worker.py — the copy is made inside the worker process
data: dict = copy.deepcopy(case_dict["data"])
```
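The isolation this buys can be sketched with toy data; `fake_worker` and the `demand` key below are illustrative stand-ins, not sdom API:

```python
import copy

def fake_worker(case_dict: dict) -> float:
    """Illustrative stand-in for _run_single_case: deep-copy, then mutate."""
    data = copy.deepcopy(case_dict["data"])   # the copy lives in worker memory
    data["demand"][0] *= case_dict["factor"]  # mutation touches the copy only
    return data["demand"][0]

base_data = {"demand": [100.0, 110.0]}  # shared original, never copied up front
cases = [{"data": base_data, "factor": f} for f in (0.5, 1.5)]

results = [fake_worker(cd) for cd in cases]
print(results)               # [50.0, 150.0]
print(base_data["demand"])   # [100.0, 110.0] (original untouched)
```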
### 2. Collision-safe case naming

`_make_safe_name` replaces filesystem-forbidden characters with `_`. This
can cause two distinct parameter combinations to produce the same string
(e.g. `1.0/2` and `1.0_2` both map to `1.0_2`).
**Solution:** after building all case dicts, `_build_case_dicts` counts
how many times each safe name appears. Any name that appears more than once
gets an index suffix appended: `<name>_<case_index>`.
```python
from typing import Dict

name_counts: Dict[str, int] = {}
for cd in case_dicts:
    name_counts[cd["case_name"]] = name_counts.get(cd["case_name"], 0) + 1
for cd in case_dicts:
    if name_counts[cd["case_name"]] > 1:
        cd["case_name"] = f"{cd['case_name']}_{cd['case_index']}"
```
This is deterministic: the same sweep configuration always produces the same set of case names.
### 3. Result ordering via case_index (not name lookup)

`as_completed` returns futures in completion order, which is non-deterministic.
To reconstruct results in Cartesian-product order, each case dict carries an
integer `case_index` (its position in the product). Results are stored in a
pre-allocated list: `ordered_results[cd["case_index"]] = result`.
This is collision-safe: even if case names are disambiguated, the index is always unique.
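A minimal sketch of the pattern, using `ThreadPoolExecutor` so the snippet runs without a picklable module-level worker (the real code uses `ProcessPoolExecutor`; the worker and case fields here are invented):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(case: dict) -> int:
    return case["value"] ** 2

cases = [{"case_index": i, "value": v} for i, v in enumerate([3, 1, 2])]
ordered_results = [None] * len(cases)  # pre-allocated, one slot per case

with ThreadPoolExecutor(max_workers=3) as pool:
    # Map each future back to its case dict so we know where its result goes.
    futures = {pool.submit(square, cd): cd for cd in cases}
    for fut in as_completed(futures):  # completion order is arbitrary
        cd = futures[fut]
        ordered_results[cd["case_index"]] = fut.result()

print(ordered_results)  # always [9, 1, 4], regardless of completion order
```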
### 4. Module-level worker function (pickling)

`_run_single_case` must be defined at module level in `worker.py`, not as a
nested function or lambda. `multiprocessing` uses `pickle` to send work to
subprocesses, and only module-level callables can be pickled.
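The constraint is easy to demonstrate with the stdlib alone (the function names below are made up):

```python
import pickle

def module_level_worker(x):
    """Defined at module top level, so pickle can locate it by name."""
    return x + 1

def make_nested():
    def nested(x):  # local object: pickle cannot reference it by name
        return x + 1
    return nested

for fn in (module_level_worker, make_nested()):
    try:
        pickle.dumps(fn)
        print(f"{fn.__qualname__}: picklable")
    except Exception as exc:
        print(f"{fn.__qualname__}: {type(exc).__name__}")
```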
### 5. Graceful failure

Both the worker and the orchestrator catch all exceptions:

- `_run_single_case` catches exceptions from the mutation helpers, `initialize_model`, and `run_solver`. On failure it returns an `OptimizationResults` with `termination_condition="exception"` and `total_cost=float("nan")`.
- `study.run()` catches exceptions from `future.result()` (e.g. pickling failures or worker crashes) and constructs the same failure result object.
`total_cost=NaN` (rather than `0.0`) ensures failed cases are distinguishable
from valid zero-cost results in the summary CSV.
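A small sketch of why `NaN` is the right sentinel; the summary rows below are hypothetical, and note that `NaN != NaN`, so `math.isnan` is the correct test:

```python
import math

rows = [
    {"case_name": "base", "total_cost": 0.0},              # genuine zero-cost case
    {"case_name": "crashed", "total_cost": float("nan")},  # failed case
]

# NaN compares unequal to everything, including itself, so use math.isnan.
failed = [r["case_name"] for r in rows if math.isnan(r["total_cost"])]
succeeded = [r["case_name"] for r in rows if not math.isnan(r["total_cost"])]

print(failed)     # ['crashed']
print(succeeded)  # ['base']
```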
## mutations.py — TS_KEY_TO_COLUMN mapping

The `TS_KEY_TO_COLUMN` dict maps every supported data dict key to the
DataFrame column that holds the numeric time-series values. These keys must
match the actual keys set by `io_manager.load_data`, not the CSV file names
or model parameter names.
| Data dict key | DataFrame column | Set by |
|---|---|---|
| | | Always |
| | | Large hydro formulation |
| | | Budget hydro formulation |
| | | Budget hydro formulation |
| | | CapacityPriceNetLoadFormulation (imports) |
| | | CapacityPriceNetLoadFormulation (imports) |
| | | CapacityPriceNetLoadFormulation (exports) |
| | | CapacityPriceNetLoadFormulation (exports) |
| | | If nuclear data is present |
| | | If other renewables data is present |
If you add a new time-series to `io_manager.load_data`, add a
corresponding entry to `TS_KEY_TO_COLUMN` in `mutations.py`.
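The registration pattern might look like the following sketch; the key and column names are invented, and a plain dict of lists stands in for the real pandas DataFrame:

```python
# Hypothetical mapping entry; real keys/columns come from io_manager.load_data.
TS_KEY_TO_COLUMN = {
    "my_new_series": "value",  # the entry you would add in mutations.py
}

def apply_ts_factor(data: dict, ts_key: str, factor: float) -> None:
    """Scale one time-series in place, resolving the column via the mapping."""
    try:
        column = TS_KEY_TO_COLUMN[ts_key]
    except KeyError:
        raise KeyError(f"Unsupported ts_key {ts_key!r}; add it to TS_KEY_TO_COLUMN")
    table = data[ts_key]  # stands in for a DataFrame; here a dict of lists
    table[column] = [v * factor for v in table[column]]

data = {"my_new_series": {"value": [1.0, 2.0, 3.0]}}
apply_ts_factor(data, "my_new_series", 2.0)
print(data["my_new_series"]["value"])  # [2.0, 4.0, 6.0]
```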
## Case naming format

| Sweep type | Label fragment | Example |
|---|---|---|
| | | |
| | | |
| | | |
Fragments are joined with `_` and then passed through `_make_safe_name`.
Forbidden filesystem characters (`/ \ : * ? " < > |` and the space) are replaced
with `_`. Leading/trailing underscores are stripped.
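A plausible re-implementation of that rule as a single regex (not necessarily the exact code in `study.py`):

```python
import re

_FORBIDDEN = re.compile(r'[/\\:*?"<>| ]')  # characters not allowed in case names

def make_safe_name(label: str) -> str:
    """Replace forbidden filesystem characters with _ and trim underscores."""
    return _FORBIDDEN.sub("_", label).strip("_")

print(make_safe_name("demand*1.5 / wind:0.8"))  # demand_1.5___wind_0.8
print(make_safe_name("1.0/2"))                  # 1.0_2  (collides with "1.0_2")
```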
## Summary CSV columns

| Column | Source |
|---|---|
| | |
| | from … |
| | from … |
| | from … |
| | |
| | |
| | |
| | |
## How to extend

### Adding a new sweep type

1. Add a dataclass in `sweeps.py` (follow the `ScalarSweep` pattern).
2. Add a mutation helper in `mutations.py`.
3. Add an `add_<type>_sweep` method in `ParametricStudy` that appends to a new `self._<type>_sweeps` list.
4. Add a new `elif mut[0] == "<type>"` branch in `_build_case_dicts`.
5. Apply the mutation in `_run_single_case` before `initialize_model`.
6. Add unit tests in `tests/test_parametric.py`.
7. Update `docs/source/user_guide/parametric_analysis.md`: add the new sweep type to the sweep-types section and the `ts_key` table if applicable.
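Step 1 might look like the following sketch; `OutageSweep` and its fields are invented for illustration and should mirror whatever `ScalarSweep` actually validates:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class OutageSweep:
    """Hypothetical sweep over forced-outage rates for one technology."""
    tech: str
    rates: List[float] = field(default_factory=list)

    def __post_init__(self):
        # Validate eagerly so bad sweeps fail at construction, not mid-run.
        if not self.rates:
            raise ValueError("OutageSweep.rates must be non-empty")
        for r in self.rates:
            if not 0.0 <= r <= 1.0:
                raise ValueError(f"Outage rate {r} outside [0, 1]")

sweep = OutageSweep(tech="wind", rates=[0.0, 0.05, 0.10])
print(len(sweep.rates))  # 3
```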
### Adding a new time-series key

Only `TS_KEY_TO_COLUMN` in `mutations.py` needs updating; no other files
require changes to support a new key.
## Testing strategy

Tests live in `tests/test_parametric.py` and are split into:
| Group | What is tested |
|---|---|
| Sweep dataclass validation | Empty … |
| Mutation helpers | Each helper: correct value, no side-effects on other rows/columns, clear errors |
| `_make_safe_name` | Forbidden chars replaced, leading/trailing underscores stripped |
| Core-count capping | |
| Cartesian product | Correct count for 1/2/3-dimension sweeps |
| Case name uniqueness | All generated names are unique |
| Deep-copy isolation | Worker mutation does not affect the parent's base data |
| Summary CSV | Shape, required columns, written to correct path |
| Integration (…) | Full 4-case run on the smallest dataset (…) |
Run only the unit tests (fast, no solver needed):

```shell
pytest -m "not integration"
```

Run everything, including the integration test:

```shell
pytest
```
## Known limitations & deferred items

| Item | Notes |
|---|---|
| Per-tech storage override | Currently only the row-level factor is supported (scales all techs uniformly). A per-tech value override is possible but not yet implemented. |
| Memory for very large datasets | Even with lazy copy, … |
| Case name collisions (edge case) | Collision detection appends … |