Duration-Based Scoring for Temperature Excursions

Binary threshold alarms, which pass or fail on a single instantaneous reading, are increasingly indefensible for both regulatory compliance and product preservation in pharmaceutical cold chain operations. A momentary +8.1 °C blip on a 2–8 °C biologic and a six-hour drift at +11 °C are scored identically by a tripwire, yet they carry entirely different stability consequences. Duration-based scoring replaces the tripwire with a continuous risk quantity that integrates both the magnitude and the temporal persistence of a thermal deviation, turning raw telemetry into a proportional, auditable disposition signal. The primary regulatory anchor for this work is 21 CFR 11.10(e): every score, the inputs that produced it, and the parameters that weighted it must survive as a secure, time-stamped, tamper-evident record.

Problem Statement: Why Tripwires Fail Compliance and Inventory

Three concrete problems push teams away from instantaneous thresholds:

Proportionality is impossible with a single bit. ICH Q9 quality risk management requires corrective actions to scale with quantified risk. A pass/fail flag carries no magnitude and no duration, so it cannot drive proportional CAPA — every breach looks the same and every breach triggers the heaviest response.
Sensor noise becomes product condemnation. A lone faulty thermocouple spike trips the wire and quarantines a shipment that was never thermally stressed, inflating handling risk and waste.
Static limits ignore degradation kinetics. Arrhenius-driven potency loss depends on cumulative thermal exposure, not on whether a line was crossed for one sample. WHO TRS 961 Annex 9 expects storage control to track manufacturer stability data, which is inherently exposure-based.

Duration-based scoring functions as the deterministic decision layer inside the broader Temperature Excursion Detection & Automated Rule Engines framework: it consumes contextualised telemetry and routes each event to a Monitor, Investigate, or Quarantine pathway without firing an unnecessary quarantine hold. Before a deviation ever reaches the scorer, tolerance bands are resolved per SKU by Dynamic Threshold Mapping for Multi-Product Pallets, and spurious single-probe spikes are filtered out by Multi-Sensor Correlation to Reduce False Positives so they never inflate the accumulated score.

Concept & Specification: The Magnitude–Duration Integral

The score is a time integral of a magnitude-weighting function $f$ applied to the deviation from the validated envelope over the active excursion interval $[t_0, t_1]$:

$$\text{Score} = \int_{t_0}^{t_1} f\bigl(\Delta T(t)\bigr)\,dt$$

where $\Delta T(t)$ is the instantaneous, non-negative distance outside the nominal range (zero while the temperature stays in band). Choosing $f(\Delta T) = \Delta T^{2}$ penalises larger excursions disproportionately, while integrating over $dt$ accumulates persistence. The raw integral is then normalised to a 0–100 scale with configurable boundaries for the three operating states. A decay term with a configurable half-life ages stale exposure out of a sliding window so that a long-resolved deviation does not permanently pin the risk metric high.
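
As a rough illustration of the formula, the sketch below evaluates the raw integral with the trapezoidal rule for the two scenarios from the introduction: a brief blip to 8.1 °C and a six-hour drift at 11 °C on a 2–8 °C product. The helper names, the five-minute sampling interval, and the omission of decay and normalisation are assumptions made for this example only.

```python
from decimal import Decimal

SPEC_LOW, SPEC_HIGH = Decimal("2.0"), Decimal("8.0")  # example 2-8 C envelope


def deviation(temp_c: Decimal) -> Decimal:
    """Distance outside the band in C; zero while the reading is in band."""
    if temp_c > SPEC_HIGH:
        return temp_c - SPEC_HIGH
    if temp_c < SPEC_LOW:
        return SPEC_LOW - temp_c
    return Decimal("0")


def raw_integral(temps_c, step_minutes=Decimal("5")) -> Decimal:
    """Trapezoidal integral of dT^2 over evenly spaced samples (C^2 * min)."""
    devs = [deviation(Decimal(str(t))) for t in temps_c]
    total = Decimal("0")
    for d0, d1 in zip(devs, devs[1:]):
        total += (d0 * d0 + d1 * d1) / Decimal("2") * step_minutes
    return total


blip = [5.0, 8.1, 5.0]   # a single sample 0.1 C above the upper limit
drift = [11.0] * 73      # 72 intervals of 5 min = 6 h at 3 C above the limit

blip_area = raw_integral(blip)     # 0.050
drift_area = raw_integral(drift)   # 3240.00
```

Under these assumptions the drift accumulates 3240 °C²·min against 0.05 °C²·min for the blip, a ratio of 64 800 to 1, whereas a tripwire reports both as one breach.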

Two properties make the integral audit-defensible rather than a black box. First, the exposure integral is accumulated in decimal arithmetic and the final score is quantised with explicit half-up rounding, so a replayed calculation yields the same stored value on any host. This supports the system-validation expectation of 21 CFR 11.10(a) and the ability to generate “accurate and complete copies of records” under 11.10(b). The decay weight is derived from a floating-point power, which is why the audit value is the quantised score rather than the intermediate sum. Second, the scoring configuration (specification limits, normalisation divisor, decay half-life, state boundaries) is fingerprinted and bound to every record, so a reviewer can prove which parameter set produced a given disposition; this is the change-control evidence expected under 11.10(k).

The persisted scoring record carries the following fields. The Regulatory anchor column states why each field must exist for the record to be inspection-ready.

Field	Type	Constraint	Regulatory anchor
`excursion_id`	string (UUID)	Immutable, unique per event	11.10(e) traceable, attributable record
`peak_deviation_c`	decimal(6,4)	≥ 0; unsigned magnitude outside band	ICH Q9 risk magnitude evidence
`cumulative_minutes`	decimal(10,4)	≥ 0; in-excursion time only	Exposure basis for Arrhenius/MKT stability
`score`	decimal(7,4)	0–100, quantised half-up	11.10(b) accurate, reproducible value
`state`	enum	`MONITOR` / `INVESTIGATE` / `QUARANTINE`	ICH Q9 proportional CAPA routing
`config_version`	string (sha256)	Fingerprint of scoring parameters	11.10(k) change control over decision logic
`computed_at`	string (ISO 8601, UTC)	Timezone-aware, UTC	11.10(e) contemporaneous time-stamp
`prev_hash`	string (sha256)	Links to prior record	11.10(e) tamper-evident audit chain
`record_hash`	string (sha256)	SHA-256 of canonical record	ALCOA+ enduring, tamper-evident
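
One way to check the `prev_hash` / `record_hash` linkage is to recompute every hash from the stored fields and walk the chain in order. The sketch below assumes records are dictionaries, that `record_hash` is the SHA-256 of the canonical JSON of all other fields (sorted keys, compact separators), and that the first record links to a genesis value of 64 zeros. `seal` and `verify_chain` are hypothetical helper names, and the sample records carry only a subset of the fields in the table.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash of the first record in a chain


def _digest(body: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(fields: dict, prev_hash: str) -> dict:
    """Return a copy of `fields` linked to its predecessor and hashed."""
    record = {**fields, "prev_hash": prev_hash}
    record["record_hash"] = _digest(record)
    return record


def verify_chain(records: list) -> bool:
    """True only if every hash recomputes and every link matches."""
    expected_prev = GENESIS
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "record_hash"}
        if rec["prev_hash"] != expected_prev or rec["record_hash"] != _digest(body):
            return False
        expected_prev = rec["record_hash"]
    return True


first = seal({"excursion_id": "EXC-1", "score": "12.5000", "state": "MONITOR"}, GENESIS)
second = seal({"excursion_id": "EXC-2", "score": "80.0000", "state": "QUARANTINE"},
              first["record_hash"])
intact = verify_chain([first, second])

# Editing a stored field after the fact invalidates the chain.
tampered = [dict(first, score="1.0000"), second]
broken = verify_chain(tampered)
```

A verifier of this kind only detects edits; it does not prevent them, so it would still need an append-only store underneath and its own validation evidence.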

Architecture Diagram: Detection Pipeline & State Transitions

Production scoring engines run as a stateful, event-driven pipeline. Sensor payloads are ingested and normalised to UTC; static bounds are contextualised per SKU and transit phase; the magnitude–duration integral is accumulated over the active window; and a state machine emits a structured disposition payload when a score crosses a boundary. Timestamp normalisation upstream depends on Time-Series Alignment for Multi-Zone Cold Storage, because an unaligned series produces meaningless dt intervals and a corrupt integral.
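
The last stage of that pipeline can be sketched as a small state machine that emits a payload only when the classified state changes. This is a minimal illustration, not the production design: the class name, the payload keys, and the decision to emit on de-escalation as well as escalation are assumptions, and the boundaries reuse the example values 26 and 76 from this page.

```python
from decimal import Decimal
from typing import Optional

INVESTIGATE_AT = Decimal("26")  # example boundaries, not validated values
QUARANTINE_AT = Decimal("76")


def classify(score: Decimal) -> str:
    """Map a 0-100 score onto the three operating states."""
    if score >= QUARANTINE_AT:
        return "QUARANTINE"
    if score >= INVESTIGATE_AT:
        return "INVESTIGATE"
    return "MONITOR"


class DispositionStateMachine:
    """Tracks the current state and reports transitions only."""

    def __init__(self) -> None:
        self.state = "MONITOR"

    def update(self, excursion_id: str, score: Decimal) -> Optional[dict]:
        new_state = classify(score)
        if new_state == self.state:
            return None  # no boundary crossed, nothing to emit
        payload = {
            "excursion_id": excursion_id,
            "from": self.state,
            "to": new_state,
            "score": str(score),
        }
        self.state = new_state
        return payload


machine = DispositionStateMachine()
scores = ("10", "25.9999", "26", "40", "76", "12")
events = [machine.update("EXC-1", Decimal(s)) for s in scores]
```

In this sketch a decayed score drops the state back to `MONITOR` automatically. Whether a quarantine hold may ever be released without a documented human decision is a quality-system question that the sketch does not answer.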

Production Python Implementation

The module below is a complete, runnable scoring engine. It computes the exposure integral in Decimal for reproducibility, rejects silent reporting gaps as data-integrity events rather than scoring them as zero exposure, fingerprints the configuration into every record, and links records into an append-only hash chain. Each block cites the clause it satisfies.

python

from __future__ import annotations

import hashlib
import json
import logging
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP
from enum import Enum

import pandas as pd

logger = logging.getLogger("excursion.scoring")


class ExcursionState(str, Enum):
    MONITOR = "MONITOR"
    INVESTIGATE = "INVESTIGATE"
    QUARANTINE = "QUARANTINE"


class DataGapError(ValueError):
    """Raised when a reporting gap exceeds the certified interval."""


@dataclass(frozen=True)
class ScoringConfig:
    # Validated storage envelope and weighting parameters. The whole object is
    # hash-fingerprinted into each record so the decision logic that produced a
    # score is reconstructable — change control under 21 CFR 11.10(k).
    spec_low_c: Decimal = Decimal("2.0")
    spec_high_c: Decimal = Decimal("8.0")
    half_life_minutes: float = 30.0      # ages stale exposure out of the window
    investigate_at: Decimal = Decimal("26")
    quarantine_at: Decimal = Decimal("76")
    max_gap_seconds: int = 300           # sensor's certified reporting interval
    norm_divisor: Decimal = Decimal("4.0")  # raw-integral -> 0..100 calibration
    decimals: int = 4

    def deviation(self, temp_c: Decimal) -> Decimal:
        # ΔT is the unsigned distance OUTSIDE the validated band; in-band == 0.
        if temp_c > self.spec_high_c:
            return temp_c - self.spec_high_c
        if temp_c < self.spec_low_c:
            return self.spec_low_c - temp_c
        return Decimal("0")

    def fingerprint(self) -> str:
        # Bind parameters to the record so a reviewer can prove which limits,
        # weighting and boundaries were active — 21 CFR 11.10(k) change control.
        payload = json.dumps(
            {k: str(v) for k, v in self.__dict__.items()},
            sort_keys=True, separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def _q(value: Decimal, places: int) -> Decimal:
    # Deterministic half-up rounding for audit-critical values — avoids IEEE-754
    # drift so a replay is bit-identical (21 CFR 11.10(a) validation / 11.10(b)).
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


def _classify(score: Decimal, cfg: ScoringConfig) -> ExcursionState:
    # Proportional routing — corrective action scales with quantified risk per ICH Q9.
    if score >= cfg.quarantine_at:
        return ExcursionState.QUARANTINE
    if score >= cfg.investigate_at:
        return ExcursionState.INVESTIGATE
    return ExcursionState.MONITOR


def score_excursion(
    readings: pd.DataFrame,
    cfg: ScoringConfig,
    excursion_id: str,
    prev_hash: str = "0" * 64,
) -> dict:
    """Score one excursion from a UTC-ordered DataFrame [ts, temp_c].

    Returns an append-only audit record. `ts` must be timezone-aware UTC and
    `temp_c` numeric (Celsius).
    """
    if readings.empty:
        raise ValueError(f"no readings supplied for {excursion_id}")

    df = readings.sort_values("ts").reset_index(drop=True)

    # A gap beyond the certified reporting interval is a DATA-INTEGRITY event,
    # not zero exposure. Flagging it keeps the record "complete" under ALCOA+
    # and EU GMP Annex 11 (computerised-system data integrity).
    gaps = df["ts"].diff().dt.total_seconds().fillna(0)
    if (gaps > cfg.max_gap_seconds).any():
        raise DataGapError(f"reporting gap > {cfg.max_gap_seconds}s in {excursion_id}")

    t_end = df["ts"].iloc[-1]
    score = Decimal("0")
    cumulative = Decimal("0")
    peak = Decimal("0")

    for i in range(1, len(df)):
        t0, t1 = df["ts"].iloc[i - 1], df["ts"].iloc[i]
        dt_min = Decimal(str((t1 - t0).total_seconds() / 60.0))
        d0 = cfg.deviation(Decimal(str(df["temp_c"].iloc[i - 1])))
        d1 = cfg.deviation(Decimal(str(df["temp_c"].iloc[i])))
        peak = max(peak, d0, d1)
        if d0 > 0 or d1 > 0:
            cumulative += dt_min
        # Trapezoidal area of f(ΔT)=ΔT² over the interval — quadratic weighting
        # penalises larger breaches disproportionately (ICH Q9 risk magnitude).
        area = (d0 * d0 + d1 * d1) / Decimal("2") * dt_min
        # Half-life decay by interval-midpoint age; documented smoothing only,
        # the audit value is the quantised final score below.
        mid_age = ((t_end - t0).total_seconds() + (t_end - t1).total_seconds()) / 120.0
        weight = Decimal(str(0.5 ** (mid_age / cfg.half_life_minutes)))
        score += area * weight

    score = _q(min(score / cfg.norm_divisor, Decimal("100")), cfg.decimals)
    state = _classify(score, cfg)

    record = {
        "excursion_id": excursion_id,
        "peak_deviation_c": str(_q(peak, cfg.decimals)),
        "cumulative_minutes": str(_q(cumulative, cfg.decimals)),
        "score": str(score),
        "state": state.value,
        "config_version": cfg.fingerprint(),
        # Contemporaneous, timezone-aware UTC stamp — 21 CFR 11.10(e).
        "computed_at": datetime.now(timezone.utc).isoformat(),
        "prev_hash": prev_hash,
    }
    # Append-only hash chain: each record commits to its predecessor, so any
    # retrospective edit breaks the chain — tamper-evidence for the secure,
    # time-stamped audit trail required by 21 CFR 11.10(e).
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["record_hash"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    logger.info("scored %s -> %s (%s)", excursion_id, record["score"], state.value)
    return record

For the memory-optimised deque variant, the vectorised rolling-window form, and the time-weighted MKT calculation per USP <1079>, see the step-by-step guide on implementing sliding window algorithms for excursion detection.

Configuration & Deployment Parameters

Scoring behaviour is driven entirely by configuration, which keeps the decision logic version-controlled and re-validatable. Treat the parameter set as a controlled document under your ICH Q10 pharmaceutical quality system: any change to limits, weighting, decay, or boundaries is a change-management event that may require re-validation and, where the engine informs batch release, regulatory notification.

Variable	Example	Purpose	Regulatory anchor
`SCORE_SPEC_LOW_C` / `SCORE_SPEC_HIGH_C`	`2.0` / `8.0`	Validated storage envelope per SKU	WHO TRS 961 Annex 9 storage limits
`SCORE_HALF_LIFE_MIN`	`30`	Sliding-window decay half-life	ICH Q9 risk-decay justification
`SCORE_INVESTIGATE_AT` / `SCORE_QUARANTINE_AT`	`26` / `76`	State boundaries	ICH Q9 proportional CAPA
`SCORE_MAX_GAP_SECONDS`	`300`	Certified reporting interval	EU GMP Annex 11 data completeness
`SCORE_NORM_DIVISOR`	`4.0`	Raw-integral → 0–100 calibration	11.10(a) validated, documented logic
`AUDIT_DB_URL`	append-only store DSN	Hash-chained record sink	11.10(e) secure audit trail

Decouple ingestion from scoring with a message broker (RabbitMQ or Kafka) so high-frequency sensor bursts cannot create backpressure on the scorer. Rotate the broker’s mTLS client certificates on a fixed schedule and fail closed if a certificate is expired — an unauthenticated telemetry source must never be able to write into the regulated record stream. Calibrate SCORE_NORM_DIVISOR against your product’s actual Arrhenius kinetics rather than a generic constant; a linear normalisation lets a low-magnitude, long-duration drift reach the same score as a high-magnitude, short one, which may not reflect real stability risk.
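
The `SCORE_*` variables in the table could be parsed and sanity-checked at start-up along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, the fallback defaults simply mirror the example column, and a regulated deployment may prefer to fail when a variable is missing rather than fall back.

```python
import os
from decimal import Decimal, InvalidOperation

# Names follow the table above; defaults mirror its example values (assumption).
_DEFAULTS = {
    "SCORE_SPEC_LOW_C": "2.0",
    "SCORE_SPEC_HIGH_C": "8.0",
    "SCORE_HALF_LIFE_MIN": "30",
    "SCORE_INVESTIGATE_AT": "26",
    "SCORE_QUARANTINE_AT": "76",
    "SCORE_MAX_GAP_SECONDS": "300",
    "SCORE_NORM_DIVISOR": "4.0",
}


def load_scoring_params(env=None) -> dict:
    """Parse SCORE_* settings into Decimals and reject inconsistent values."""
    env = os.environ if env is None else env
    params = {}
    for name, default in _DEFAULTS.items():
        raw = env.get(name, default)
        try:
            params[name] = Decimal(raw)
        except InvalidOperation as exc:
            raise ValueError(f"{name}={raw!r} is not a decimal number") from exc
    if params["SCORE_SPEC_LOW_C"] >= params["SCORE_SPEC_HIGH_C"]:
        raise ValueError("storage envelope is empty or inverted")
    low, high = params["SCORE_INVESTIGATE_AT"], params["SCORE_QUARANTINE_AT"]
    if not Decimal("0") < low < high <= Decimal("100"):
        raise ValueError("boundaries must satisfy 0 < investigate < quarantine <= 100")
    for name in ("SCORE_HALF_LIFE_MIN", "SCORE_MAX_GAP_SECONDS", "SCORE_NORM_DIVISOR"):
        if params[name] <= 0:
            raise ValueError(f"{name} must be positive")
    return params


params = load_scoring_params({"SCORE_NORM_DIVISOR": "6.5"})
```

On the calibration point: with $f(\Delta T) = \Delta T^{2}$ and decay ignored, 1 °C held for 540 minutes and 3 °C held for 60 minutes both integrate to 540 °C²·min, so a single divisor maps them to the same score.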

Verification & Testing

A scoring engine that informs disposition is GxP-relevant software, so its tests are part of the validation evidence package, not a developer convenience. Build the suite around the edge cases an inspector will probe:

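Reproducibility (replay) tests. Feed the same historical telemetry twice and assert a bit-identical `score`. Assert an identical `record_hash` only with the computation clock pinned, because the hash covers `computed_at` and an unpinned replay therefore differs by design. Idempotent replay is the practical demonstration of 11.10(a) validation and 11.10(b) accurate copies.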
Boundary tests. Assert exact MONITOR → INVESTIGATE (26) and INVESTIGATE → QUARANTINE (76) transitions, including the equality case, so routing under ICH Q9 is deterministic.
Data-gap tests. Inject a gap beyond SCORE_MAX_GAP_SECONDS and assert DataGapError rather than a silently low score — proving the record stays complete under EU GMP Annex 11.
Audit-chain tests. Mutate one stored field and assert the recomputed record_hash no longer matches prev_hash of the successor, demonstrating tamper-evidence for 11.10(e).
CSV protocol hooks. Expose a fixture loader that reads an OQ test-vector CSV (timestamp, temperature, expected score, expected state) so Operational Qualification can be executed and signed against documented expected outputs.
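
The CSV hook in the last item might look like the sketch below. The column names (`timestamp`, `temperature_c`, `expected_score`, `expected_state`), the `OQVector` type, and the convention that expected values may be left blank on intermediate rows are all assumptions for illustration; the sample rows are placeholders, not qualified test vectors.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class OQVector:
    ts: datetime
    temp_c: Decimal
    expected_score: Optional[Decimal]
    expected_state: Optional[str]


def load_oq_vectors(handle) -> list:
    """Read OQ test vectors from an open text handle."""
    vectors = []
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        # Map a trailing "Z" to an explicit offset for older Python versions.
        ts = datetime.fromisoformat(row["timestamp"].replace("Z", "+00:00"))
        if ts.tzinfo is None:
            raise ValueError(f"line {line_no}: timestamp must carry a UTC offset")
        vectors.append(OQVector(
            ts=ts,
            temp_c=Decimal(row["temperature_c"]),
            expected_score=Decimal(row["expected_score"]) if row["expected_score"] else None,
            expected_state=row["expected_state"] or None,
        ))
    return vectors


SAMPLE = """timestamp,temperature_c,expected_score,expected_state
2026-06-28T08:00:00Z,8.0,,
2026-06-28T08:05:00Z,11.0,,
2026-06-28T08:10:00Z,11.0,,MONITOR
"""

vectors = load_oq_vectors(io.StringIO(SAMPLE))
```

Temperatures are parsed straight from text into `Decimal`, so the expected values in the signed protocol and the values the test compares against never pass through binary floating point.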

python

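import sys
from datetime import datetime, timezone
from unittest import mock

import pandas as pd

# ScoringConfig, score_excursion and DataGapError come from the scoring module above.


def _series(rows):
    df = pd.DataFrame(rows, columns=["ts", "temp_c"])
    df["ts"] = pd.to_datetime(df["ts"], utc=True)  # ISO 8601 UTC alignment
    return df


class _FrozenClock(datetime):
    # record_hash covers computed_at, so the replay test pins the wall clock.
    @classmethod
    def now(cls, tz=None):
        return datetime(2026, 6, 28, 12, 0, tzinfo=timezone.utc)


def test_replay_is_bit_identical():
    # Readings are 30 min apart, so widen the gap limit for this vector;
    # the default 300 s limit would raise DataGapError instead of scoring.
    cfg = ScoringConfig(max_gap_seconds=1800)
    data = _series([
        ("2026-06-28T08:00:00Z", 8.0),
        ("2026-06-28T08:30:00Z", 11.0),
        ("2026-06-28T09:00:00Z", 11.0),
    ])
    scoring_module = sys.modules[score_excursion.__module__]
    with mock.patch.object(scoring_module, "datetime", _FrozenClock):
        a = score_excursion(data, cfg, "EXC-1")
        b = score_excursion(data, cfg, "EXC-1")
    # Idempotent output is the evidence for 21 CFR 11.10(a) validation.
    assert a["score"] == b["score"]
    assert a["record_hash"] == b["record_hash"]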


def test_gap_is_flagged_not_scored():
    cfg = ScoringConfig(max_gap_seconds=300)
    data = _series([
        ("2026-06-28T08:00:00Z", 11.0),
        ("2026-06-28T09:00:00Z", 11.0),  # 1h silent gap > 300s
    ])
    try:
        score_excursion(data, cfg, "EXC-2")
        assert False, "expected DataGapError for completeness (Annex 11)"
    except DataGapError:
        pass

Known Failure Modes & Mitigations

Failure mode	Symptom	Mitigation	Regulatory anchor
Clock skew between loggers	Negative or zero `dt`, distorted integral	Reject non-monotonic series; align to UTC upstream	EU GMP Annex 11 accurate time
Silent reporting gap	Artificially low score, missed exposure	Raise `DataGapError`; record as integrity event	ALCOA+ complete records
Single-probe noise spike	False-positive accumulation	Gate scoring on multi-sensor agreement	ICH Q9 risk-based filtering
Broker disconnect	Backpressure, dropped telemetry	Durable queue + replay on reconnect	11.10(e) record protection
Config drift across hosts	Divergent scores for same data	Fingerprint config into every record	11.10(k) change control
Float rounding divergence	Non-reproducible audit value	Fixed-point `Decimal` + half-up quantise	11.10(b) accurate records

When the multi-sensor agreement gate cannot be satisfied because auxiliary probes have failed, fall back to single-sensor scoring with tightened limits and elevated alert priority rather than suspending monitoring. Partial coverage with documented degradation is defensible; a monitoring gap is not.
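
The clock-skew mitigation in the first table row (reject non-monotonic series) could be implemented as a guard that runs before any sorting. The sketch below is one possible form; `ClockSkewError` and `assert_strictly_increasing` are hypothetical names. Note that the scoring function on this page sorts readings by timestamp before integrating, which would hide a backwards step, so a guard like this would have to run on the series as received.

```python
from datetime import datetime, timedelta, timezone


class ClockSkewError(ValueError):
    """Hypothetical error for timestamps that do not strictly increase."""


def assert_strictly_increasing(timestamps: list) -> None:
    """Raise if any timestamp is naive, repeated, or earlier than its predecessor."""
    if any(ts.tzinfo is None for ts in timestamps):
        raise ClockSkewError("naive timestamp: cannot be compared in UTC")
    for earlier, later in zip(timestamps, timestamps[1:]):
        if later <= earlier:
            raise ClockSkewError(
                f"non-monotonic step: {earlier.isoformat()} -> {later.isoformat()}"
            )


t0 = datetime(2026, 6, 28, 8, 0, tzinfo=timezone.utc)
aligned = [t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=10)]
skewed = [t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=4)]

assert_strictly_increasing(aligned)  # passes silently

try:
    assert_strictly_increasing(skewed)
    skew_detected = False
except ClockSkewError:
    skew_detected = True
```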

Compliance FAQ

Does an append-only hash chain satisfy the 21 CFR 11.10(e) audit-trail requirement?

It satisfies the tamper-evidence and time-stamp expectations of 11.10(e) provided the chain is anchored to a write-once or append-only store and the verification routine is itself validated. The hash chain proves records were not edited after the fact, but 11.10(e) also requires the original entry to remain retrievable, so retain the raw telemetry alongside the scored records.

Can a duration-based score be used directly for batch release decisions?

Only if the scoring logic is validated (11.10(a)), the normalisation function is calibrated against the product’s documented stability data, and any change to weighting or boundaries triggers re-validation under your ICH Q10 change-management process. The score should inform a documented disposition decision, not silently replace the qualified reviewer.

Why integrate exposure instead of alarming on the threshold crossing?

ICH Q9 requires corrective action proportional to quantified risk, and Arrhenius degradation depends on cumulative thermal exposure. A threshold crossing carries neither magnitude nor duration, so it cannot support proportional CAPA; the integral does.

Implementing sliding window algorithms for excursion detection — memory-optimised deque and vectorised scoring with time-weighted MKT.
Dynamic Threshold Mapping for Multi-Product Pallets — resolves the per-SKU bands the scorer measures against.
Multi-Sensor Correlation to Reduce False Positives — filters spurious spikes before they accumulate.
Cache-Warming Strategies for Real-Time Rule Engines — keeps threshold lookups hot for sub-second scoring latency.
Establishing Temperature Excursion Thresholds by Product — the stability-data source for envelope and normalisation calibration.