Developer Guide: sdom.parametric Implementation#

This page describes the internal design of the sdom.parametric sub-package for contributors who need to maintain, extend, or debug it.

For end-user usage see Parametric & Sensitivity Analysis.


Sub-package layout#

```text
src/sdom/parametric/
├── __init__.py      # Public surface — re-exports ParametricStudy + sweep types
├── sweeps.py        # Sweep descriptor dataclasses (no logic, just validated containers)
├── mutations.py     # Stateless data mutation helpers + TS_KEY_TO_COLUMN mapping
├── worker.py        # Module-level picklable worker function (_run_single_case)
└── study.py         # ParametricStudy orchestrator + _make_safe_name utility
```

One-way dependency rule: parametric imports from the rest of sdom (optimization_main, io_manager, results), but no existing sdom module imports from parametric. Keep this direction to avoid circular imports.


Data flow#

```text
ParametricStudy.run()
  │
  ├── _build_case_dicts()
  │     ├── itertools.product(*dimensions)   ← Cartesian product
  │     ├── _make_safe_name(label)           ← filesystem-safe case name
  │     └── collision detection              ← append _{index} if name collides
  │
  ├── ProcessPoolExecutor.submit(_run_single_case, case_dict)
  │     │                  (one future per case)
  │     └─ [worker process] ─────────────────────────────────────────────────
  │           ├── copy.deepcopy(case_dict["data"])    ← lazy copy, in-worker
  │           ├── _apply_scalar_mutation(...)
  │           ├── _apply_storage_factor_mutation(...)
  │           ├── _apply_ts_mutation(...)
  │           ├── initialize_model(data, n_hours, ...)
  │           └── run_solver(model, solver_config, case_name)
  │                    └── returns OptimizationResults
  │
  ├── as_completed(futures)
  │     ├── log progress [completed/total]
  │     ├── export_results(result, case_name, output_dir/<case_name>/)
  │     └── ordered_results[case_index] = result      ← index-based ordering
  │
  └── _write_summary_csv(case_dicts, ordered_results)
        └── parametric_summary.csv
```

Key design decisions#

1. Lazy deep-copy (memory efficiency)#

Problem: Pre-allocating one copy.deepcopy(base_data) per case in the parent process (before dispatching to workers) causes the parent to hold N × sizeof(base_data) in memory simultaneously — a significant spike for large sweeps (50+ cases) or large datasets.

Solution: Each case_dict carries a reference to self._base_data (the shared original). The deep copy is deferred to inside _run_single_case, which runs in a worker subprocess. Because ProcessPoolExecutor pickles arguments on dispatch, the base_data is serialised once per case on submission. The deep copy then runs inside the worker’s own memory space, so the parent process always holds only one copy of the base data, regardless of sweep size.

```python
# study.py — case_dict carries a reference, not a copy
case_dicts.append({
    "data": self._base_data,   # shared reference
    ...
})
```

```python
# worker.py — the copy is made inside the worker process
data: dict = copy.deepcopy(case_dict["data"])
```

2. Collision-safe case naming#

_make_safe_name replaces filesystem-forbidden characters with `_`. This can cause two distinct parameter combinations to produce the same string (e.g. `1.0/2` and `1.0_2` both → `1.0_2`).

Solution: After building all case dicts, _build_case_dicts counts how many times each safe name appears. Any name that appears more than once gets an index suffix appended: <name>_<case_index>.

```python
name_counts: Dict[str, int] = {}
for cd in case_dicts:
    name_counts[cd["case_name"]] = name_counts.get(cd["case_name"], 0) + 1
for cd in case_dicts:
    if name_counts[cd["case_name"]] > 1:
        cd["case_name"] = f"{cd['case_name']}_{cd['case_index']}"
```

This is deterministic: the same sweep configuration always produces the same set of case names.

3. Result ordering via case_index (not name lookup)#

as_completed returns futures in completion order (non-deterministic). To reconstruct results in Cartesian-product order, each case dict carries an integer case_index (its position in the product). Results are stored in a pre-allocated list: ordered_results[cd["case_index"]] = result.

This is collision-safe: even if case names are disambiguated, the index is always unique.
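The pattern can be illustrated with a self-contained sketch. Note the deliberate simplifications: it uses threads rather than processes so the snippet runs anywhere without pickling concerns, and `fake_solve` is a stand-in for `_run_single_case`; the real code uses `ProcessPoolExecutor` as described above.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_solve(case):
    """Stand-in for _run_single_case: finishes after a random delay."""
    time.sleep(random.uniform(0, 0.05))
    return case["value"] ** 2

case_dicts = [{"case_index": i, "value": v} for i, v in enumerate([3, 1, 2])]

ordered_results = [None] * len(case_dicts)        # pre-allocated, index-addressed
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(fake_solve, cd): cd for cd in case_dicts}
    for fut in as_completed(futures):             # completion order is arbitrary
        cd = futures[fut]
        ordered_results[cd["case_index"]] = fut.result()

# ordered_results == [9, 1, 4] regardless of which case finished first
```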

4. Module-level worker function (pickling)#

_run_single_case must be defined at module level in worker.py, not as a nested function or lambda. multiprocessing uses pickle to send work to subprocesses, and only module-level callables can be pickled.
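The constraint is easy to demonstrate with `pickle` directly (the function names below are illustrative):

```python
import pickle

def module_level(x):
    """Picklable: pickle stores the function's qualified import path."""
    return x * 2

def make_nested():
    def nested(x):   # not picklable: lives inside another function's scope
        return x * 2
    return nested

payload = pickle.dumps(module_level)        # works

try:
    pickle.dumps(make_nested())
except (pickle.PicklingError, AttributeError):
    print("nested functions cannot be pickled")
```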

5. Graceful failure#

Both the worker and the orchestrator catch all exceptions:

  • _run_single_case catches exceptions from mutations, initialize_model, and run_solver. On failure it returns OptimizationResults with termination_condition="exception" and total_cost=float("nan").

  • study.run() catches exceptions from future.result() (e.g. pickling failures or worker crashes). It constructs the same failure result object.

total_cost=NaN (not 0.0) ensures failed cases are distinguishable from valid zero-cost results in the summary CSV.
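Because NaN compares unequal to everything (including itself), a failed case can never be confused with a genuinely zero-cost result when filtering the summary:

```python
import math

failed_cost = float("nan")   # what a failed case reports as total_cost

assert math.isnan(failed_cost)   # failures are detectable with math.isnan ...
assert failed_cost != 0.0        # ... and never equal to a real zero cost
```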


mutations.py — TS_KEY_TO_COLUMN mapping#

The TS_KEY_TO_COLUMN dict maps every supported data dict key to the DataFrame column that holds the numeric time-series values. These keys must match the actual keys set by io_manager.load_data, not the CSV file names or model parameter names.

| ts_key | DataFrame column | Set by load_data when |
|---|---|---|
| `"load_data"` | `"Load"` | Always |
| `"large_hydro_data"` | `"LargeHydro"` | Large hydro formulation |
| `"large_hydro_max"` | `"LargeHydro_max"` | Budget hydro formulation |
| `"large_hydro_min"` | `"LargeHydro_min"` | Budget hydro formulation |
| `"cap_imports"` | `"Imports"` | CapacityPriceNetLoadFormulation (imports) |
| `"price_imports"` | `"Imports_price"` | CapacityPriceNetLoadFormulation (imports) |
| `"cap_exports"` | `"Exports"` | CapacityPriceNetLoadFormulation (exports) |
| `"price_exports"` | `"Exports_price"` | CapacityPriceNetLoadFormulation (exports) |
| `"nuclear_data"` | `"Nuclear"` | If nuclear data is present |
| `"other_renewables_data"` | `"OtherRenewables"` | If other renewables data is present |

If you add a new time-series to io_manager.load_data, add a corresponding entry to TS_KEY_TO_COLUMN in mutations.py.
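With the mapping in place, a TS mutation reduces to a lookup plus an in-place scale. The sketch below illustrates the pattern; `apply_ts_factor` is a hypothetical name (not the real helper), and a plain dict of column → list stands in for the real DataFrame so the snippet is dependency-free:

```python
# Illustrative subset of the real mapping in mutations.py
TS_KEY_TO_COLUMN = {
    "load_data": "Load",
    "nuclear_data": "Nuclear",
}

def apply_ts_factor(data: dict, ts_key: str, factor: float) -> None:
    """Scale the mapped column in place; reject unknown keys with a clear error."""
    try:
        column = TS_KEY_TO_COLUMN[ts_key]
    except KeyError:
        raise KeyError(f"unsupported ts_key: {ts_key!r}") from None
    data[ts_key][column] = [v * factor for v in data[ts_key][column]]

data = {"load_data": {"Load": [100.0, 200.0]}}
apply_ts_factor(data, "load_data", 1.05)
# data["load_data"]["Load"] == [105.0, 210.0]
```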


Case naming format#

| Sweep type | Label fragment | Example |
|---|---|---|
| ScalarSweep | `{param_name}={value}` | `GenMix_Target=0.9` |
| StorageFactorSweep | `{param_name}x{factor}` | `P_Capexx0.8` |
| TsSweep | `{ts_key}x{factor}` | `load_datax1.05` |

Fragments are joined with _ and then passed through _make_safe_name. Forbidden filesystem characters (/ \ : * ? " < > | space) are replaced with _. Leading/trailing underscores are stripped.
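A minimal re-implementation of the described behaviour might look like the following (`make_safe_name` here is a hypothetical stand-in for the real `_make_safe_name`, not the actual source):

```python
import re

# Forbidden filesystem characters listed above: / \ : * ? " < > | and space
_FORBIDDEN = r'[/\\:*?"<>| ]'

def make_safe_name(label: str) -> str:
    """Replace forbidden characters with '_' and strip leading/trailing '_'."""
    return re.sub(_FORBIDDEN, "_", label).strip("_")

make_safe_name("GenMix_Target=0.9/load_datax1.05")
# -> 'GenMix_Target=0.9_load_datax1.05'
```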


Summary CSV columns#

| Column | Source |
|---|---|
| `case_name` | `cd["case_name"]` |
| `<data_key>.<param_name>` | from the scalar_mutations list |
| `storage_data.<param>_factor` | from the storage_factor_mutations list |
| `<ts_key>_factor` | from the ts_mutations list |
| `is_optimal` | `result.is_optimal` |
| `total_cost` | `result.total_cost` (NaN for failures) |
| `solver_status` | `result.solver_status` |
| `termination_condition` | `result.termination_condition` |


How to extend#

Adding a new sweep type#

  1. Add a dataclass in sweeps.py (follow the ScalarSweep pattern).

  2. Add a mutation helper in mutations.py.

  3. Add an add_<type>_sweep method in ParametricStudy that appends to a new self._<type>_sweeps list.

  4. Add a new elif mut[0] == "<type>" branch in _build_case_dicts.

  5. Apply the mutation in _run_single_case before initialize_model.

  6. Add unit tests in tests/test_parametric.py.

  7. Update docs/source/user_guide/parametric_analysis.md — add new sweep type to the sweep-types section and the ts_key table if applicable.

Adding a new time-series key#

Only TS_KEY_TO_COLUMN in mutations.py needs updating — no other files require changes to support a new key.


Testing strategy#

Tests live in tests/test_parametric.py and are split into:

| Group | What is tested |
|---|---|
| Sweep dataclass validation | Empty values/factors raise ValueError |
| Mutation helpers | Each helper: correct value, no side effects on other rows/columns, clear errors |
| _make_safe_name | Forbidden chars replaced, leading/trailing underscores stripped |
| Core-count capping | `n_cores=9999` capped to `cpu_count - 1` |
| Cartesian product | Correct count for 1/2/3-dimension sweeps |
| Case name uniqueness | All generated names are unique |
| Deep-copy isolation | Worker mutation does not affect base_data |
| Summary CSV | Shape, required columns, written to correct path |
| Integration (`@pytest.mark.integration`) | Full 4-case run on the smallest dataset (no_exchange_run_of_river), n_cores=1, 72 hours; asserts per-case dirs + optimal results |
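As an illustration, the deep-copy isolation check boils down to a few lines. This is a simplified, self-contained variant of that idea, not the actual test code:

```python
import copy

def test_worker_mutation_does_not_affect_base_data():
    """Mutating the worker's deep copy must leave the shared original intact."""
    base_data = {"scalar_params": {"GenMix_Target": 0.8}}
    case_dict = {"data": base_data}                   # shared reference, as in study.py
    worker_data = copy.deepcopy(case_dict["data"])    # in-worker copy, as in worker.py
    worker_data["scalar_params"]["GenMix_Target"] = 0.9
    assert base_data["scalar_params"]["GenMix_Target"] == 0.8
```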

Run only unit tests (fast, no solver needed):

```shell
pytest -m "not integration"
```

Run everything, including the integration test:

```shell
pytest
```

Known limitations & deferred items#

| Item | Notes |
|---|---|
| Per-tech storage override | Currently only a row-level factor is supported (scales all techs uniformly). A per-tech value override is possible but not yet implemented. |
| Memory for very large datasets | Even with lazy copy, ProcessPoolExecutor pickles base_data once per submitted future. For very large data (>1 GB), consider using initializer/initargs to share data via a global in worker processes. |
| Case name collisions (edge case) | Collision detection appends `_{index}`. For readability, a hash suffix could be used for very long names; deferred. |