# Developer Guide: `sdom.parametric` Implementation This page describes the internal design of the `sdom.parametric` sub-package for contributors who need to maintain, extend, or debug it. For end-user usage see [Parametric & Sensitivity Analysis](user_guide/parametric_analysis.md). --- ## Sub-package layout ``` src/sdom/parametric/ ├── __init__.py # Public surface — re-exports ParametricStudy + sweep types ├── sweeps.py # Sweep descriptor dataclasses (no logic, just validated containers) ├── mutations.py # Stateless data mutation helpers + TS_KEY_TO_COLUMN mapping ├── worker.py # Module-level picklable worker function (_run_single_case) └── study.py # ParametricStudy orchestrator + _make_safe_name utility ``` **One-way dependency rule:** `parametric` imports from the rest of `sdom` (`optimization_main`, `io_manager`, `results`), but no existing `sdom` module imports from `parametric`. Keep this direction to avoid circular imports. --- ## Data flow ``` ParametricStudy.run() │ ├── _build_case_dicts() │ ├── itertools.product(*dimensions) ← Cartesian product │ ├── _make_safe_name(label) ← filesystem-safe case name │ └── collision detection ← append _{index} if name collides │ ├── ProcessPoolExecutor.submit(_run_single_case, case_dict) │ │ (one future per case) │ └─ [worker process] ───────────────────────────────────────────────── │ ├── copy.deepcopy(case_dict["data"]) ← lazy copy, in-worker │ ├── _apply_scalar_mutation(...) │ ├── _apply_storage_factor_mutation(...) │ ├── _apply_ts_mutation(...) │ ├── initialize_model(data, n_hours, ...) │ └── run_solver(model, solver_config, case_name) │ └── returns OptimizationResults │ ├── as_completed(futures) │ ├── log progress [completed/total] │ ├── export_results(result, case_name, output_dir//) │ └── ordered_results[case_index] = result ← index-based ordering │ └── _write_summary_csv(case_dicts, ordered_results) └── parametric_summary.csv ``` --- ## Key design decisions ### 1. Lazy deep-copy (memory efficiency) **Problem:** Pre-allocating one `copy.deepcopy(base_data)` per case in the parent process (before dispatching to workers) causes the parent to hold `N × sizeof(base_data)` in memory simultaneously — a significant spike for large sweeps (50+ cases) or large datasets. **Solution:** Each `case_dict` carries a reference to `self._base_data` (the shared original). The deep copy is deferred to inside `_run_single_case`, which runs in a worker subprocess. Because `ProcessPoolExecutor` pickles arguments on dispatch, the `base_data` is serialised once per case on submission. The deep copy then runs inside the worker's own memory space, so the **parent process** always holds only one copy of the base data, regardless of sweep size. ```python # study.py — case_dict carries a reference, not a copy case_dicts.append({ "data": self._base_data, # shared reference ... }) # worker.py — copy is made inside the worker process data: dict = copy.deepcopy(case_dict["data"]) ``` ### 2. Collision-safe case naming `_make_safe_name` replaces filesystem-forbidden characters with `_`. This can cause two distinct parameter combinations to produce the same string (e.g. `1.0/2` and `1.0_2` both → `1.0_2`). **Solution:** After building all case dicts, `_build_case_dicts` counts how many times each safe name appears. Any name that appears more than once gets an index suffix appended: `_`. ```python name_counts: Dict[str, int] = {} for cd in case_dicts: name_counts[cd["case_name"]] = name_counts.get(cd["case_name"], 0) + 1 for cd in case_dicts: if name_counts[cd["case_name"]] > 1: cd["case_name"] = f"{cd['case_name']}_{cd['case_index']}" ``` This is deterministic: the same sweep configuration always produces the same set of case names. ### 3. Result ordering via `case_index` (not name lookup) `as_completed` returns futures in completion order (non-deterministic). To reconstruct results in Cartesian-product order, each case dict carries an integer `case_index` (its position in the product). Results are stored in a pre-allocated list: `ordered_results[cd["case_index"]] = result`. This is collision-safe: even if case names are disambiguated, the index is always unique. ### 4. Module-level worker function (pickling) `_run_single_case` must be defined at module level in `worker.py`, not as a nested function or lambda. `multiprocessing` uses `pickle` to send work to subprocesses, and only module-level callables can be pickled. ### 5. Graceful failure Both the worker and the orchestrator catch all exceptions: - `_run_single_case` catches exceptions from mutations, `initialize_model`, and `run_solver`. On failure it returns `OptimizationResults` with `termination_condition="exception"` and `total_cost=float("nan")`. - `study.run()` catches exceptions from `future.result()` (e.g. pickling failures or worker crashes). It constructs the same failure result object. `total_cost=NaN` (not `0.0`) ensures failed cases are distinguishable from valid zero-cost results in the summary CSV. --- ## `mutations.py` — TS_KEY_TO_COLUMN mapping The `TS_KEY_TO_COLUMN` dict maps every supported `data` dict key to the DataFrame column that holds the numeric time-series values. **These keys must match the actual keys set by `io_manager.load_data`**, not the CSV file names or model parameter names. | `ts_key` | DataFrame column | Set by `load_data` when | |---|---|---| | `"load_data"` | `"Load"` | Always | | `"large_hydro_data"` | `"LargeHydro"` | Large hydro formulation | | `"large_hydro_max"` | `"LargeHydro_max"` | Budget hydro formulation | | `"large_hydro_min"` | `"LargeHydro_min"` | Budget hydro formulation | | `"cap_imports"` | `"Imports"` | CapacityPriceNetLoadFormulation (imports) | | `"price_imports"` | `"Imports_price"` | CapacityPriceNetLoadFormulation (imports) | | `"cap_exports"` | `"Exports"` | CapacityPriceNetLoadFormulation (exports) | | `"price_exports"` | `"Exports_price"` | CapacityPriceNetLoadFormulation (exports) | | `"nuclear_data"` | `"Nuclear"` | If nuclear data is present | | `"other_renewables_data"` | `"OtherRenewables"` | If other renewables data is present | **If you add a new time-series to `io_manager.load_data`**, add a corresponding entry to `TS_KEY_TO_COLUMN` in `mutations.py`. --- ## Case naming format | Sweep type | Label fragment | Example | |---|---|---| | `ScalarSweep` | `{param_name}={value}` | `GenMix_Target=0.9` | | `StorageFactorSweep` | `{param_name}x{factor}` | `P_Capexx0.8` | | `TsSweep` | `{ts_key}x{factor}` | `load_datax1.05` | Fragments are joined with `_` and then passed through `_make_safe_name`. Forbidden filesystem characters (`/ \ : * ? " < > | space`) are replaced with `_`. Leading/trailing underscores are stripped. --- ## Summary CSV columns | Column | Source | |---|---| | `case_name` | `cd["case_name"]` | | `.` | from `scalar_mutations` list | | `storage_data._factor` | from `storage_factor_mutations` list | | `_factor` | from `ts_mutations` list | | `is_optimal` | `result.is_optimal` | | `total_cost` | `result.total_cost` (`NaN` for failures) | | `solver_status` | `result.solver_status` | | `termination_condition` | `result.termination_condition` | --- ## How to extend ### Adding a new sweep type 1. Add a dataclass in `sweeps.py` (follow the `ScalarSweep` pattern). 2. Add a mutation helper in `mutations.py`. 3. Add a `add__sweep` method in `ParametricStudy` that appends to a new `self.__sweeps` list. 4. Add a new `elif mut[0] == ""` branch in `_build_case_dicts`. 5. Apply the mutation in `_run_single_case` before `initialize_model`. 6. Add unit tests in `tests/test_parametric.py`. 7. Update `docs/source/user_guide/parametric_analysis.md` — add new sweep type to the sweep-types section and the ts_key table if applicable. ### Adding a new time-series key Only `TS_KEY_TO_COLUMN` in `mutations.py` needs updating — no other files require changes to support a new key. --- ## Testing strategy Tests live in `tests/test_parametric.py` and are split into: | Group | What is tested | |---|---| | Sweep dataclass validation | Empty `values`/`factors` raises `ValueError` | | Mutation helpers | Each helper: correct value, no side-effects on other rows/columns, clear errors | | `_make_safe_name` | Forbidden chars replaced, leading/trailing underscores stripped | | Core-count capping | `n_cores=9999` capped to `cpu_count - 1` | | Cartesian product | Correct count for 1/2/3-dimension sweeps | | Case name uniqueness | All generated names are unique | | Deep-copy isolation | Worker mutation does not affect `base_data` | | Summary CSV | Shape, required columns, written to correct path | | Integration (`@pytest.mark.integration`) | Full 4-case run on smallest dataset (`no_exchange_run_of_river`), `n_cores=1`, 72 hours; asserts per-case dirs + optimal results | Run only unit tests (fast, no solver needed): ```bash pytest -m "not integration" ``` Run everything including the integration test: ```bash pytest ``` --- ## Known limitations & deferred items | Item | Notes | |---|---| | Per-tech storage override | Currently only row-level factor is supported (scales all techs uniformly). Per-tech value override is possible but not yet implemented. | | Memory for very large datasets | Even with lazy copy, `ProcessPoolExecutor` pickles `base_data` once per submitted future. For very large data (>1 GB), consider using `initializer`/`initargs` to share data via a global in worker processes. | | Case name collisions (edge case) | Collision detection appends `_{index}`. For readability, consider using a hash suffix for very long names; deferred. |