- Total runs: row count of ferm_runs. Ticks up as new batches stream in.
- Contamination rate: share of contamination = 'Y'. Recent-window read.
- Fallback contamination rate: share of contamination = 'Y'. Fallback when last-100 is unavailable.
- High-yield rate: share of yield_category = 'High'.
- avg_yield: is max_fpu_ml — Filter Paper Units per mL.
- run_start_ts: drives the live ticker + risk banner.
- baseline.rate: the next-batch probability is a rolling average — deliberately simple; the work happens in Lens 2.
- Contextual baseline {rate, ci_low, ci_high, n, label, key, fallback}: drawn from the latest batch's stratum (shift × yeast_band × humidity_band × hardness_band) when n ≥ 200; otherwise fallback=true and the population baseline is used instead.
- Predicate: a boolean condition on a row (e.g. shift = NIGHT).
- lift_pp: rate − baseline.rate, in percentage points. Positive = predicate raises the outcome rate.
- Significance: p_adjusted < 0.05. Drives the row-fade in Lens 2.
- Adjusted significance: p < 0.05 boolean. A "confounded" amber flag appears when the marginal lift is significant but the adjusted OR is not — the effect was a proxy for a correlated predictor.
- Compound predicate: NIGHT ∧ yeast_lot ∈ {B44, B47} ∧ humidity ≥ 75%. Interaction term not included in the logistic model (would need a dedicated cross-term).
- Magnitude tag: strong (≥15pp lift, significant), moderate (≥5pp), weak (significant but small), weak_unconfirmed (visible trend but n.s. under FDR), negligible. No prior beliefs are tested or judged.
- mu_h1: slope of ln(OD) vs t. The fundamental kinetic parameter.
- Hot-tier state row: keyed by run_id, holds the live decision card + drift score + alerts + narrative + last_updated_at. Read by the dashboard in O(1); written by Python whenever the analysis is recomputed.
- drift_score: 1 − exp(−x/1.5). Rises smoothly before any binary detector fires.
- drift_trend: rising / steady / falling — compares last 24 h drift to prior 24 h. Flags accelerating deterioration before it crosses an alert threshold.
- Cache flag: from_cache=true when serving a row younger than the TTL (default 300 s); otherwise recomputes and persists.
- Role tiers: viewer / operator / supervisor. Stored in the qm_role cookie; surfaced via /api/me. The role pill in the dashboard header shows the current tier and (in demo mode) cycles through all three on click.
- Authorization error: code: "role_insufficient" when the cookie's role is below the required tier. Currently gates /api/label_outcome and /api/refresh_run_state.
- Automated flag: true when the hypothesis has an automated probe; UI shows a "▶ Run automated check" button when so.
- Probe verdict {result, evidence, details}: result is one of supports / refutes / weakly_refutes / inconclusive / not_automated.
- Priority: regime + drift × 50 + alerts × 10 + severity bonus + vessel_load × 20. Higher = more attention needed.
- Attention-router read: the n most recent LIVE_ rows; reads each one's cached state from FERM_RUN_STATE (recomputes only when cache is missing or older than max_age seconds). Returns the runs ranked by priority.
- Vessel window: window runs (default 50). The "recent stress" component.
- Vessel trend: rising / steady / falling within the window — compares first-half rate to second-half. Detects accelerating wear.
- Maintenance tier: low / medium / high, with the operator-facing maintenance recommendation. "Schedule preventive maintenance" when load_score ≥ 0.65.
- confirmed_cause: matches a hypothesis name from the differential, building the labeled dataset that future scoring can use as posterior priors.
- Label confidence: definite (lab confirmed) / likely / uncertain. The "how sure are you" filter applied when training on this label later.
- Hypothesis tier (most_likely / alternative / lower_probability): visual ranking only, not a calibrated probability.
- Alert severity: info / medium / high / critical. Maps to recommended action: continue / verify / hold / abort.
- Hours saved: duration_h − trigger_t_h. The killer metric — how many hours of fermentation you would have saved if you'd caught this issue at trigger time.
- Regime: nominal (continue) / watch (verify) / intervene (hold or abort). Set by alert severity + driver count.
- Recommended action: continue / verify / hold / abort. The single operator-facing call.
- Source view: ferm_full_v (the join of FERM_RUNS with FERM_ENV).
- Phase timeline: t_start_h / t_end_h from the same phase model Lens 6 uses. Status is one of pending, active, done, alert.
- Hot-tier read: served from FERM_RUN_STATE instead of being derived.
- contam_onset_h: the contamination onset hour from the phase model.
- Trace series: {t_h, value} points. Sample interval defaults to 60 minutes; clamped to [10, 240].
- Phase boundaries: exponential ends at time_to_peak_h, stationary fills the remainder. Marked as dashed rules on the chart.
- Onset window: (0.5·exp_end, exp_end + 0.6·(duration − exp_end)), so onset can fall inside late-exponential or anywhere in stationary.
- Agent interpretation: via the interpret() method (e.g. "OUR 38 mmol/L/h — within healthy aerobic range").
- Predictor type: binary for one-hot categorical, continuous for z-score-standardized.
- Watchlist score: stability × |effect|; rows with stability < 50% render dimmed.
- Direction: raises (effect > 0) or lowers (effect < 0) the contamination odds.

Plain groupby arithmetic for the point estimates, proper proportion statistics for the uncertainty. Every number is reproducible directly from SQL plus a handful of named Python functions — no black-box ML.
baseline.rate = ( count(rows where outcome_matches) / count(rows) ) × 100
# outcome_matches = contamination='Y' OR yield_category='High'
S_p = { rows where predicate p(row) is true }
rate_p = ( count(rows in S_p where outcome_matches) / |S_p| ) × 100
lift_pp_p = rate_p − baseline.rate
# Factors ranked by |lift_pp| descending; top 6 returned.
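As a concrete illustration of the arithmetic above, here is a minimal pandas sketch, assuming the row set is loaded into a DataFrame from ferm_full_v; factor_lifts and the predicate labels are illustrative names, not the production functions:

import pandas as pd

def factor_lifts(df: pd.DataFrame, predicates: dict) -> pd.DataFrame:
    """Baseline rate, per-predicate rate, and lift in percentage points.
    `predicates` maps a label to a boolean-mask function over df."""
    outcome = (df["contamination"] == "Y") | (df["yield_category"] == "High")
    baseline = 100.0 * outcome.mean()
    rows = []
    for label, pred in predicates.items():
        mask = pred(df)
        rate = 100.0 * outcome[mask].mean()
        rows.append({"factor": label, "n": int(mask.sum()),
                     "rate": rate, "lift_pp": rate - baseline})
    out = pd.DataFrame(rows).sort_values("lift_pp", key=abs, ascending=False)
    return out.head(6)   # top 6 by |lift_pp|, as in the spec above

# Usage, given df from ferm_full_v:
# lifts = factor_lifts(df, {
#     "shift = NIGHT":  lambda d: d["shift"] == "NIGHT",
#     "humidity ≥ 75%": lambda d: d["ambient_humidity_pct"] >= 75,
# })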
z = 1.96                                  # 95% two-sided
phat = succ / n
denom = 1 + z² / n
center = ( phat + z² / (2n) ) / denom
halfwdt = z × √( phat(1−phat)/n + z²/(4n²) ) / denom
CI = [ (center − halfwdt) × 100 , (center + halfwdt) × 100 ]
# Wilson (not the normal-approximation CI) — it's well-behaved at
# rates near 0% or 100% and on small n, where the normal CI can
# produce impossible values like a "−3%" lower bound.
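A direct stdlib translation of the formula above; wilson_ci is an illustrative name:

from math import sqrt

def wilson_ci(succ: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval, in percent; mirrors the formula above."""
    if n == 0:
        return (0.0, 100.0)
    phat = succ / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    halfwdt = z * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return ((center - halfwdt) * 100, (center + halfwdt) * 100)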
# 2x2 contingency for each factor:
#                outcome=Y   outcome=N
#   predicate=T      a           b
#   predicate=F      c           d
# Yates-corrected χ² statistic, df=1:
χ² = Σ (|obs − exp| − 0.5)² / exp    # over the 4 cells
p = 2 × P(|Z| > √χ²)                 # via math.erf, stdlib only
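A stdlib-only sketch of the Yates-corrected test above; yates_chi2_p is an illustrative name:

from math import erf, sqrt

def yates_chi2_p(a: int, b: int, c: int, d: int) -> float:
    """Yates-corrected χ² on a 2x2 table; two-sided p via math.erf (df=1)."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0                       # degenerate table: no evidence
    chi2 = 0.0
    for obs, exp in ((a, row1 * col1 / n), (b, row1 * col2 / n),
                     (c, row2 * col1 / n), (d, row2 * col2 / n)):
        chi2 += (abs(obs - exp) - 0.5) ** 2 / exp
    z = sqrt(chi2)
    return 1 - erf(z / sqrt(2))          # equals 2·P(|Z| > √χ²)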
We test ~10 factors and 7 findings simultaneously. Raw chi-square p-values would produce several false positives by chance. BH step-up controls the expected false discovery rate at α = 0.05 — the p_adjusted feeding the magnitude tag and the factor-significance pill is FDR-corrected, not raw.
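A stdlib-only sketch of the BH step-up adjustment, assuming the raw p-values arrive as a plain list; bh_adjust is an illustrative name:

def bh_adjust(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg step-up: returns FDR-adjusted p-values
    (monotone q-values) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):       # walk from largest p down
        i = order[rank]
        q = pvals[i] * m / (rank + 1)
        running_min = min(running_min, q)   # enforce monotonicity
        adjusted[i] = running_min
    return adjusted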
The marginal lifts above are confounded: a +16pp lift for shift = NIGHT may leak in effects from humidity, lot, and crew that happen to correlate with night shift. To separate them we fit a single logistic regression on the full row set:
logit( P(outcome=1) ) = β₀ + Σ βⱼ · xⱼ
# xⱼ ∈ {shift_NIGHT, shift_DAY, yeast_bad_lot, humidity_high,
#       humidity_low, hardness_high, temp_high, ph_low}
# neighbor_contaminated and crew=night-foxtrot are excluded — they're
# outcome proxies in the data generator (set only when contam=Y) and
# would cause quasi-separation in the fit. They still show as Lens 2
# marginal rates, just without an adjusted-OR sub-line.
adj_OR(xⱼ)    = exp( βⱼ )                       # effect holding others constant
adj_OR_CI(xⱼ) = exp( βⱼ ± 1.96 · SE(βⱼ) )       # 95% Wald CI
adj_p(xⱼ)     = 2 · P( |Z| > |βⱼ / SE(βⱼ)| )    # Wald test
Fitting uses Newton-Raphson IRLS (converges in ~10 iterations on well-behaved data). Standard errors come from the diagonal of the inverse observed Fisher information — the textbook frequentist asymptotic. Zero-variance predictors are dropped to avoid singularity.
Some fields are assigned as a consequence of contamination rather than as a cause of it. In this dataset, crew = night-foxtrot and neighbor_contaminated = 'Y' are set only when contamination = 'Y' is already decided — that's the live-injector's bookkeeping model for "a contaminated batch flags itself." These fields carry 100% correlation with the outcome by construction, which makes them outcome proxies, not predictors.
Including outcome proxies in a regression causes quasi-separation: the coefficient wants to be ±∞, the Hessian becomes singular, IRLS diverges, and the whole fit returns no results. So they're excluded from the predictor set. They still appear in Lens 2 as descriptive marginal rates (a crew with 100% contamination is meaningful information) — just without an adjusted-OR sub-line.
# Rule: a field belongs in the regression only if it's measurable
# BEFORE the outcome is known. Otherwise it's a post-hoc label,
# not a predictor.
included: shift, yeast_lot, humidity, water hardness, temp, pH
excluded: neighbor_contaminated, crew = night-foxtrot
          (both set as a function of contamination)
Real operational data frequently has predictors that are nearly collinear with the outcome on a subset (even after outcome proxies are excluded — e.g., shift=NIGHT + yeast=B44 + humidity≥75% might have a 98% contamination rate, which pushes the Hessian close to singular). To keep the fit stable, we add a small L2 penalty on the non-intercept coefficients:
β̂ = argmax  log L(β) − (λ/2) · β₋₀ᵀ · β₋₀    # ridge log-likelihood
# IRLS update with ridge:
β ← β + ( XᵀWX + λQ )⁻¹ · ( Xᵀ(y−p) − λQβ )
# Q = diag(0, 1, 1, ..., 1) — intercept not penalized
# λ = 1e-3 · n — scales with sample size
# eta clipped to [-30, 30] — keeps sigmoid numerically stable
The penalty is very light — on well-behaved data it leaves MLE point estimates essentially unchanged (peak_od adj OR shifted from 0.57 to 0.59 in the ground-truth synthetic test). Its only job is to prevent the coefficient from running to infinity when the data is pathological; then you get a finite estimate with an honestly wide confidence interval, rather than a silently missing row. The intercept is not penalized, so the baseline rate stays unbiased.
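To make the update concrete, here is a minimal numpy sketch of the ridge-IRLS loop above; fit_logistic_ridge and its argument names are illustrative, not the production function:

import numpy as np

def fit_logistic_ridge(X, y, lam_scale=1e-3, max_iter=25, tol=1e-8):
    """Ridge-penalized logistic fit via IRLS. X has no intercept column;
    one is prepended here and left unpenalized via Q."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    beta = np.zeros(p + 1)
    lam = lam_scale * n                      # λ = 1e-3 · n, as above
    Q = np.eye(p + 1)
    Q[0, 0] = 0.0                            # intercept not penalized
    for _ in range(max_iter):
        eta = np.clip(X1 @ beta, -30, 30)    # keep sigmoid stable
        mu = 1.0 / (1.0 + np.exp(-eta))
        W = mu * (1.0 - mu)                  # IRLS weights (diagonal)
        H = X1.T @ (X1 * W[:, None]) + lam * Q     # penalized info: XᵀWX + λQ
        g = X1.T @ (y - mu) - lam * (Q @ beta)     # penalized score
        step = np.linalg.solve(H, g)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    se = np.sqrt(np.diag(np.linalg.inv(H)))  # Wald SEs from (penalized) info
    return beta, se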
When a factor's marginal lift is large but its adj OR ≈ 1.0 and n.s., the effect is confounded — it was a proxy for a correlated predictor. That's the row flagged in amber. The big real story is the factors whose adjusted OR stays large after controlling for everything else.

Despite the ridge, a fit can still fail if n is too small (≤ p + 1), numpy isn't available, or the data is truly pathological. In that case _build_adjusted_ors returns empty ORs but keeps the population mean / SD / n metadata for the continuous predictors. Lens 4 then renders as a descriptive panel — three rows with mean/SD/n and a "fit did not converge" badge — instead of silently hiding. An operator always sees that biomass is being tracked; the adjusted effect is just labeled as unavailable.
Three continuous measurements from the tank enter the same logistic regression alongside the categorical factors above. Each is standardized to a z-score before fitting so the coefficient reads as "OR per +1 SD change":
# Continuous predictors, standardized in the design matrix:
peak_od          # max OD600 reached during the run
time_to_peak_h   # hours from inoculation to peak
mu_h1            # specific growth rate (1/h) in exponential phase
# For each continuous predictor xc:
zc = (xc − mean(xc)) / sd(xc)
adj_OR_per_SD = exp( β_c )             # odds multiplier for each +1 SD
adj_OR_CI     = exp( β_c ± 1.96·SE )
Interpretation in the UI translates the OR back into plain-language odds change. For example, if the adjusted OR on peak OD is 0.72 with SD ≈ 7, the line reads "each +1 SD (≈7 OD) → ~28% lower contamination odds". Because these are fit in the same model as shift/lot/humidity, the biomass effect is already adjusted for those covariates.
Rows missing peak_od, time_to_peak_h, or mu_h1 are dropped from the biomass fit. The n shown in Lens 4 is the usable count after that filter.

Lens 2 and Lens 4 cover the dominant predictors — the handful of factors a domain expert would have flagged anyway. Lens 5 takes the wider feature surface (currently-tracked + agent-simulated, ~28 columns) and runs elastic-net penalized logistic regression with bootstrap stability selection: which predictors keep getting picked across many resamples?
# Penalized log-likelihood (elastic net = L1 + L2):
β̂ = argmin  −(1/n)·logL(β)
           + λ·α·||β₋₀||₁          # L1 — drives weak features to exactly zero
           + λ·(1−α)/2·||β₋₀||²    # L2 — keeps surviving features stable under collinearity
# Solved with FISTA (Beck & Teboulle 2009): proximal-gradient with momentum,
# pure-numpy. ~50 ms per fit at n=5000, p=30.
# Stability selection (Meinshausen & Bühlmann 2010):
for b = 1..40:
    sub_b = subsample 70% of rows with replacement
    β_b = fit_elastic_net(X[sub_b], y[sub_b], α=0.5, λ=0.02)
    selected_b = { j : |β_b[j]| > 0 }
stability(j) = (1/40) · Σ_b 1[j ∈ selected_b]
score(j) = stability(j) · |mean β_j across resamples where selected|
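A compact numpy sketch of the two pieces above. soft_threshold, fista_enet_logistic, and stability_selection are illustrative names; the intercept is omitted for brevity, and the step size uses a standard Lipschitz bound for the logistic loss:

import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t·||·||₁: the L1 shrinkage step in FISTA.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_enet_logistic(X, y, lam=0.02, alpha=0.5, iters=200):
    """Elastic-net logistic regression via FISTA (proximal gradient + momentum).
    Smooth part = (1/n)·NLL + λ(1−α)/2·||β||²; L1 handled by the prox."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / (4 * n) + lam * (1 - alpha)  # Lipschitz bound
    beta, z, t = np.zeros(p), np.zeros(p), 1.0
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-(X @ z)))
        grad = X.T @ (mu - y) / n + lam * (1 - alpha) * z
        beta_new = soft_threshold(z - grad / L, lam * alpha / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = beta_new + ((t - 1) / t_new) * (beta_new - beta)     # momentum
        beta, t = beta_new, t_new
    return beta

def stability_selection(X, y, B=40, frac=0.7, seed=0):
    """Selection frequency and score(j) = stability · |mean effect|."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sel, eff = np.zeros(p), np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=int(frac * n), replace=True)
        b = fista_enet_logistic(X[idx], y[idx])
        picked = np.abs(b) > 0
        sel += picked
        eff += np.where(picked, b, 0.0)
    stability = sel / B
    mean_eff = np.divide(eff, np.maximum(sel, 1))
    return stability, stability * np.abs(mean_eff)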
Each predictor in the panel shows: stability (selection frequency, as a bar), standardized log-odds effect, direction (↑ raises contamination odds / ↓ lowers them), and whether it comes from a real DB column or a sensor agent. Rows with stability < 50% render dimmed — they're real signals but the regression isn't confident enough to act on them yet.
Showing one number to all batches misleads operators the way comparing your blood pressure to a global mean misleads patients. The right baseline for a specific batch is the rate observed in batches with the same dominant configuration. We compute that on the fly per request:
# Strata: cross-product of dominant categorical predicates
key(row) = shift                                          # DAY / SWING / NIGHT
         ⊕ (yeast_lot in {B44,B47} ? 'B44/B47' : 'OTHER')
         ⊕ humidity_band                                  # HUMID ≥75 / DRY ≤55 / NORMAL
         ⊕ (water_hardness ≥ 180 ? 'HARD' : 'NORMAL')
# 36 cells; 57K rows ⇒ ~1500 rows/cell on average
strata[key] = { n, succ, rate, ci_low, ci_high }          # Wilson CI as everywhere else
# Guardrail: cells with n < 200 fall back to population baseline.
# Reported as fallback=true so the UI can be honest about the punt.
contextual = strata[key(latest_batch)] if n ≥ 200 else POPULATION
delta_pp = contextual.rate − population.rate
For a specific NIGHT + B44/B47 + HUMID + HARD batch the population baseline of 47% may understate the real expected rate (closer to 80% in practice). The contextual block in Lens 1 shows both numbers side by side, plus the delta as an explicit "+34pp vs population" tag. Operators read the right number for the configuration they're actually running.
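A pandas sketch of the per-request computation above. The band thresholds come from the stratum spec; contextual_baseline and the column names are illustrative, and wilson_ci refers to the Lens 2 sketch earlier:

import pandas as pd

def contextual_baseline(df: pd.DataFrame, latest: pd.Series) -> dict:
    """Stratified baseline with the n ≥ 200 guardrail described above."""
    def key(r):
        return (r["shift"],
                "B44/B47" if r["yeast_lot"] in ("B44", "B47") else "OTHER",
                "HUMID" if r["ambient_humidity_pct"] >= 75 else
                "DRY" if r["ambient_humidity_pct"] <= 55 else "NORMAL",
                "HARD" if r["water_hardness_ppm"] >= 180 else "NORMAL")
    strata = df.apply(key, axis=1)
    outcome = df["contamination"] == "Y"
    pop_rate = 100.0 * outcome.mean()
    cell = outcome[strata == key(latest)]
    if len(cell) < 200:                        # guardrail: punt to population
        return {"rate": pop_rate, "n": len(df), "fallback": True}
    rate = 100.0 * cell.mean()
    lo, hi = wilson_ci(int(cell.sum()), len(cell))
    return {"rate": rate, "ci_low": lo, "ci_high": hi, "n": len(cell),
            "fallback": False, "delta_pp": rate - pop_rate}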
data/09_baseline_strata.sql creates a materialized view FERM_BASELINE_STRATA refreshed nightly, with the same schema. The Python read path can swap from in-memory groupby to ORDS lookup with a one-line change. Baselines drift slowly — adding 100 LIVE rows to a 1500-row stratum shifts its rate by < 1pp — so daily refresh is honest.

Lens 1–8 answer "how is THIS run?". The attention router answers "across all runs in flight, which one needs my eye next?" — the operator-cockpit question for facilities running multiple vessels in parallel. It's a pure read against the hot tier; no analysis is recomputed unless a run's state cache is empty or stale.
priority(run) =
100 if regime == "intervene"
+ 30 if regime == "watch"
+ drift_score × 50
+ n_active_alerts × 10
+ (50 if earliest_alert_severity == "critical" else
20 if earliest_alert_severity == "high" else 0)
+ vessel_load × 20
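A direct translation of the scoring above into a ranking helper; priority and the dict keys are illustrative names for the cached-state fields:

def priority(run: dict) -> float:
    """Scores one run for the attention router, mirroring the formula above."""
    score = {"intervene": 100, "watch": 30}.get(run["regime"], 0)
    score += run["drift_score"] * 50
    score += run["n_active_alerts"] * 10
    score += {"critical": 50, "high": 20}.get(run.get("earliest_alert_severity"), 0)
    score += run.get("vessel_load", 0.0) * 20
    return score

# runs_ranked = sorted(runs, key=priority, reverse=True)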
Click any row in the attention router card and the dashboard's per-run panels (Lens 6 trace, Lens 7 alerts, Lens 8 decision + drift) reload focused on that run. Pure UI affordance — same data, different lens.
The differential's confirm_by instructions tell the operator what to look at. Some of those checks are automatable — they're queries against data already in the system. The "▶ Run automated check" button on each differential candidate fires the matching probe and returns a verdict inline:
CONFIRM_CHECKS registry — 4 of 7 hypotheses currently automated:
    NIGHT-shift event        → does the OUR drop align with an 8h boundary (±2h)?
    Yeast lot lag failure    → did pHi cross 6.7 BEFORE viability dropped < 80%?
    Sterilization breach     → was the SIP cycle < 45 min?
    Aeration-transfer limit  → did min DO drop < 20%?
Each check returns a verdict in {supports, refutes, weakly_refutes, inconclusive} plus a one-sentence evidence string and structured details for audit.
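A minimal shape for such a registry, with one probe spelled out. The probe name, the sip_cycle_min field, and run_automated_check are illustrative, not the actual CONFIRM_CHECKS implementation:

def probe_sip_duration(run: dict) -> dict:
    # Example probe: sterilization breach, "was the SIP cycle < 45 min?"
    minutes = run.get("sip_cycle_min")
    if minutes is None:
        return {"result": "inconclusive", "evidence": "no SIP record", "details": {}}
    verdict = "supports" if minutes < 45 else "refutes"
    return {"result": verdict,
            "evidence": f"SIP cycle {minutes} min vs 45 min minimum",
            "details": {"sip_cycle_min": minutes}}

CONFIRM_CHECKS = {
    "sterilization breach (SIP/CIP)": probe_sip_duration,
    # ...the other three probes register the same way
}

def run_automated_check(name: str, run: dict) -> dict:
    probe = CONFIRM_CHECKS.get(name)
    if probe is None:
        return {"result": "not_automated", "evidence": "", "details": {}}
    return probe(run)   # → {result, evidence, details}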
Phase-summary statistics (means, mins, maxes) compress 168 h of trace into a handful of static numbers but lose dynamic information. Slopes and curvatures capture how the trajectory is changing — the difference between "OUR is at 12 mmol/L/h, holding" and "OUR is at 12, dropping at 1.5/h with positive curvature (acceleration)". The watchlist's elastic-net now sees these alongside the existing predictors:
# Late-window slopes (last 12-24h):
our_slope_late_per_h    # mmol/L/h² — first derivative of OUR
pHi_slope_late_per_h    # pH units / h
viab_slope_late_per_h   # % / h
# Late-window curvature (last 24h, fits y = a·t² + b·t + c):
our_curvature_late      # 2a — accelerating decline detector
# Cross-trace and shape derivatives:
cer_to_our_late         # late-window RQ (CER/OUR)
vcd_plateau_frac        # t at which VCD reaches 95% of peak / total duration
Each is computed once per row from the trace already generated for the watchlist's per-row simulation step, so the marginal cost is small. Stability selection then surfaces whichever ones turn out to be stable predictors across resamples.
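One way to extract a late-window slope and curvature via a quadratic least-squares fit, assuming the trace arrives as numpy arrays; late_window_shape is an illustrative name:

import numpy as np

def late_window_shape(t_h: np.ndarray, y: np.ndarray, hours: float = 24.0):
    """Slope and curvature of one trace over its last `hours`,
    in the spirit of the derivative features above."""
    mask = t_h >= t_h[-1] - hours
    a, b, _c = np.polyfit(t_h[mask], y[mask], deg=2)   # y ≈ a·t² + b·t + c
    t_mid = t_h[mask].mean()
    slope = 2 * a * t_mid + b    # first derivative at the window center
    curvature = 2 * a            # the "our_curvature_late"-style term
    return slope, curvature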
Body analogue: chronic stress accumulates measurably (cortisol exposure, AGE buildup) even when every individual test is normal. A vessel is the same — every SIP cycle, every contamination event, every high-antifoam batch deposits a little wear. Eventually a healthy-looking vessel is operating in a quietly compromised state.
# For one vessel, summarized over the last `window` runs:
runs_total        = total runs through this vessel (lifetime)
contam_pct_recent = contam rate in last `window` runs (default 50)
trend = compare first-half-of-window vs second-half:
    rising  if recent_pct > prior_pct + 5
    falling if recent_pct < prior_pct − 5
    steady  otherwise
# Composite 0..1 score, weighted blend:
age_term    = min(1.0, runs_total / 500)
contam_term = min(1.0, contam_pct_recent / 80)
trend_term  = { rising: 1.0, steady: 0.5, falling: 0.2 }
load_score = 0.40 · age_term + 0.40 · contam_term + 0.20 · trend_term
# Maintenance recommendation tier:
load ≥ 0.65 → "Schedule preventive maintenance"         (severity high)
load ≥ 0.40 → "Monitor closely; inspect within 50 runs" (medium)
otherwise   → "Within nominal envelope"                 (low)
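The blend above as a function; vessel_load and its argument names are illustrative:

def vessel_load(runs_total: int, contam_pct_recent: float,
                first_half_pct: float, second_half_pct: float) -> dict:
    """Composite 0..1 wear score, mirroring the weighted blend above."""
    trend = ("rising" if second_half_pct > first_half_pct + 5 else
             "falling" if second_half_pct < first_half_pct - 5 else "steady")
    age_term = min(1.0, runs_total / 500)
    contam_term = min(1.0, contam_pct_recent / 80)
    trend_term = {"rising": 1.0, "steady": 0.5, "falling": 0.2}[trend]
    score = 0.40 * age_term + 0.40 * contam_term + 0.20 * trend_term
    tier = "high" if score >= 0.65 else "medium" if score >= 0.40 else "low"
    return {"load_score": round(score, 3), "trend": trend, "tier": tier}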
Differential diagnosis stars are heuristic ranks today. To turn them into calibrated probabilities, the system needs ground truth: when an event resolved, what was the actual root cause? The label form on each differential candidate captures that:
POST /api/label_outcome → ferm_outcome_label (PK: run_id, MERGE upsert)
{ run_id, confirmed_cause, confidence, operator_notes, labeled_by }
┌──────────────────────────────────────────────┐
│ Future closing-the-loop pipeline:            │
│   ferm_outcome_label ← ground truth labels   │
│                     ↓                        │
│   Bayesian update of hypothesis priors:      │
│   P(cause | evidence) =                      │
│     P(evidence | cause) · P(cause)           │
│     ─────────────────────────────────        │
│     Σ_c P(evidence | c) · P(c)               │
│                     ↓                        │
│   Stars → calibrated probabilities           │
└──────────────────────────────────────────────┘
Today the labels accumulate without back-feeding the scorer. At ~50 confirmed cases per common cause, the next PR can swap heuristic ★ ratings for posterior probabilities derived from this dataset. The schema is ready; the data collection starts now.
The "↔ Compare with another batch" button on the decision card opens a modal that fetches /api/run_state for both run_ids in parallel and renders side-by-side. Differences (regime, drift, action, contextual baseline, top differential, alert count) are highlighted with a violet outline. No new endpoint — just a new UI surface over the hot-tier read path. Useful for "this batch went wrong, what was different about the last clean one?" triage workflow.
The earlier elastic-net used a fixed λ = 0.02. That worked but wasn't auditable: nothing in the response said why 0.02 vs 0.05. Tuning fixes this — a coarse log-spaced grid {0.005, 0.01, 0.02, 0.05, 0.10, 0.20}, fit each, count nonzero coefficients, pick the λ that yields ≈ 10 selected predictors. The chosen value flows back through the response in method.lambda. This is the Meinshausen-Bühlmann "select λ for desired sparsity" recipe — more transparent than holdout CV because operators can read it as "I want this many predictors surfaced and the algorithm finds the regularization strength that delivers that."
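The grid search is a few lines; tune_lambda is an illustrative name, and it reuses the fista_enet_logistic sketch from the Lens 5 section:

import numpy as np

def tune_lambda(X, y, target_k=10,
                grid=(0.005, 0.01, 0.02, 0.05, 0.10, 0.20)):
    """Pick the λ whose fit selects closest to target_k nonzero predictors."""
    best_lam, best_gap = None, None
    for lam in grid:
        beta = fista_enet_logistic(X, y, lam=lam, alpha=0.5)
        k = int(np.sum(np.abs(beta) > 0))    # count selected predictors
        gap = abs(k - target_k)
        if best_gap is None or gap < best_gap:
            best_lam, best_gap = lam, gap
    return best_lam                           # flows back as method.lambda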
Doctors don't say "you have a disease" — they propose a ranked list of candidates and tell you what test would distinguish them. Same idea here. When Lens 8 says "watch" or "intervene", the rationale used to be one sentence; now it's accompanied by 2–3 hypothesis cards the operator can triage between.
# Each hypothesis is a small, transparent scorer:
def hypothesis(row, trace_phases, alerts, drivers):
    score = 0.0; evidence = []
    if condition_1: score += w1; evidence.append("...")
    if condition_2: score += w2; evidence.append("...")
    return { score, name, evidence, confirm_by,
             distinguishes_from, if_confirmed }

# Run all hypotheses; threshold; rank; tier the top 3.
candidates = [h for h in HYPOTHESES if h.score ≥ 0.20]
candidates.sort(by score, descending)
candidates[0].tier = "most_likely"        ★★★
candidates[1].tier = "alternative"        ★★
candidates[2].tier = "lower_probability"  ★

# Library of 7 hypotheses (each ~10 lines):
NIGHT-shift contamination event    # shift + alert alignment
yeast lot lag-phase failure        # bad lot + low peak/μ + viability
sterilization breach (SIP/CIP)     # RQ shift + utility stress
oxygen-transfer-limited (kLa)      # low DO + RQ shift
substrate uncoupling               # RQ regime change without contam
equipment fatigue                  # multiple alerts in one vessel
trace deficiency (biotin/Fe)       # slow growth without viability drop
Scores are not calibrated probabilities — that would require a labeled outcome dataset (confirmed root cause per past contamination event). We don't have one. They're heuristic rankings: which candidate has the most supporting evidence right now. The operator interprets the rank with judgment; the system surfaces what to look at first.
Traditionally the data lake is the brain and queries are nerves: every read recomputes from raw rows. The body works the other way around — your liver doesn't get queried, it knows. FERM_RUN_STATE mirrors that. Per-run intelligence (regime, alerts, drift, narrative) lives in a small hot-tier table that the dashboard reads in O(1). The data lake (FERM_FULL_V, snapshots, audit) is the historian — consulted for population baselines and trend analysis, not for "how is run X right now."
# Read path (operator-facing):
# GET /api/run_state?run_id=X
#   ↓
# try: row = SELECT * FROM ferm_run_state_v WHERE run_id = :X
#   if row.age_seconds < max_age: return row      (HIT, < 50 ms)
#   else:
#       fresh = _compute_run_state(X)             (MISS, ~3 s)
#       POST /ords/.../ferm_state_upsert/  (MERGE)
#       return fresh

# Persistence layer (single MERGE-based upsert proc):
PROCEDURE ferm_upsert_run_state(p_run_id, p_regime, ..., p_drift_score, ...)
    MERGE INTO ferm_run_state ON (run_id)
    WHEN MATCHED UPDATE / NOT MATCHED INSERT
Refresh model is eventual consistency with operator-controlled staleness. Default TTL is 5 minutes — appropriate when the live injector adds rows every 30 seconds but operators don't make sub-minute decisions. The freshness is exposed in the UI ("cache hit · 90 s old" or "fresh compute · persisted") so operators always know how stale the answer they're looking at is. Optional pre-warming via a DBMS_SCHEDULER job (commented in data/08_run_state.sql) keeps the latest LIVE_ row hot even when nobody's watching.
Lens 7 fires when a rule is tripped (fever already broken). The drift score is the rising cortisol BEFORE the fever — a continuous 0–1 metric that climbs smoothly as the run departs from a healthy reference trajectory:
# For each tracked agent (OUR, CER, VCD, pHi, viability):
healthy_trace = simulate_run_trace(state.with(contamination=False))
diff[t]   = (actual_trace[t] − healthy_trace[t]) / agent_noise_scale
rms_agent = sqrt(mean(diff[t]² over the run))
drift_score = 1 − exp(−mean(rms_agent across agents) / 1.5)
#   0    = exactly on healthy trajectory
#   0.30 = mild deviation, watch
#   0.60 = significant deviation, often crossed before any rule trips
#   0.85 = severe (contamination in full effect)
drift_trend = compare RMS over last 24h vs prior 24h:
    recent / prior > 1.30 → "rising"
    recent / prior < 0.70 → "falling"
    otherwise             → "steady"
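The score itself is a short numpy function, assuming the actual and healthy traces arrive as dicts of equal-length arrays keyed by agent name; drift_score here is an illustrative standalone version:

import numpy as np

def drift_score(actual: dict, healthy: dict, noise_scale: dict) -> float:
    """Per the formula above: RMS of the noise-normalized deviation per agent,
    averaged across agents, squashed through 1 − exp(−x/1.5)."""
    rms = []
    for agent, trace in actual.items():
        diff = (np.asarray(trace) - np.asarray(healthy[agent])) / noise_scale[agent]
        rms.append(np.sqrt(np.mean(diff ** 2)))
    return 1.0 - np.exp(-np.mean(rms) / 1.5)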
The trace orchestrator (Lens 6) produces ~7 time-series per run. extract_phase_features(trace_result) collapses each into a small set of per-batch scalars — phase-specific means, late-window minimums, terminal values, max negative slopes — and the watchlist's elastic-net consumes them alongside its other predictors:
our_mean_exp          # OUR averaged across exp phase
our_mean_stat         # OUR averaged across stationary phase
our_max_drop_per_h    # sharpest negative slope of OUR (6h window)
cer_mean_stat         # CER averaged across stationary
vcd_terminal          # viable cell density in last 12 h
pHi_min_late          # minimum intracellular pH in last 24 h
pHi_mean_late         # mean intracellular pH in last 24 h
viab_terminal         # end-of-batch viability
viab_max_drop_per_h   # sharpest negative slope of viability (4h window)
rq_mean_stat          # RQ averaged across stationary
Phase features carry information that per-batch summaries lose. "Average OUR over the run" is dominated by the long stationary plateau; "OUR mean during exp phase" or "max drop per hour" are far more diagnostic of where the run went wrong. The watchlist's elastic-net surfaces whichever of these features turn out to be stable predictors.
Rule-based, transparent, fast. Each detector consumes the live trace for one run and either fires once at the earliest qualifying timepoint or stays silent. The point of "earliest qualifying" is to maximize hours-before-harvest — the operator's saved time if they'd acted at trigger.
OUR sharp drop           severity high      # fall > 25% in 4h window when above 5 mmol/L/h
intracellular pH stress  severity high      # pHi < 6.7 sustained ≥ 3h
viability collapse       severity critical  # 90%+ → < 80% in 2h window
RQ regime shift          severity medium    # RQ outside [0.85, 1.20] for ≥ 3h, post-lag only
Thresholds in this PR are demo-realistic (would come from validated SOPs in production). Severity maps to action via the decision card: critical → abort, high → hold + verify, medium → verify, info → continue.
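One detector, written out to show the "earliest qualifying timepoint" semantics; earliest_our_sharp_drop is an illustrative sketch, not the production rule engine:

import numpy as np

def earliest_our_sharp_drop(t_h, our, window_h=4.0, floor=5.0, frac=0.25):
    """Fire at the FIRST t where OUR fell > 25% over the trailing 4 h while
    above 5 mmol/L/h; earliest trigger maximizes hours-before-harvest."""
    t_h, our = np.asarray(t_h), np.asarray(our)
    for i in range(len(t_h)):
        if t_h[i] < window_h:
            continue                             # need a full trailing window
        j = np.searchsorted(t_h, t_h[i] - window_h)
        past = our[j]
        if past > floor and (past - our[i]) / past > frac:
            return {"detector": "OUR sharp drop", "severity": "high",
                    "trigger_t_h": float(t_h[i])}
    return None                                  # stays silent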
The decision card is the synthesizer — it doesn't introduce new statistics, it reads what the other lenses already produced and combines them into one operator-facing call. Implementation is rule-based and traces back to numbers visible elsewhere on the page, so any decision is auditable line-by-line.
# Regime selection — alert-driven; drivers are informational.
# Population-level drivers describe the risk landscape but don't say
# whether THIS batch is in trouble; alerts and run state do.
if   any alert.severity == "critical":                    regime = "intervene", action = "abort"
elif any alert.severity == "high":                        regime = "intervene", action = "hold"
elif any alert.severity == "medium" OR run.contam == "Y": regime = "watch",     action = "verify"
else:                                                     regime = "nominal",   action = "continue"

# Drivers selected (de-duped, capped at 6):
Lens 2  top_factors with significant FDR-adjusted lift ≥ 5pp
Lens 4  biomass adj_or_per_sd with |OR − 1| ≥ 0.20 and significant
Lens 5  watchlist with stability ≥ 0.80 and |effect| ≥ 0.30

# Confidence:
0.5 · mean(top-5 watchlist stability) + 0.5 · min(1, n_total/5000)
Per-batch summaries lose information that the actual trajectory carries — when did the deviation start, how fast did it propagate, did multiple sensors agree on the timing. Each trace-capable agent (OUR, CER, viable cell density, intracellular pH, viability, kLa) implements a produce_trace(state, rng, dt_min) method that returns a deterministic-by-run_id time-series:
phases = compute_phases(state)   # lag_end_h, exp_end_h, contam_onset_h
for t = 0, dt_min, 2·dt_min, ..., duration_h:
    f = phase_factors(t, phases)   # biomass_x, metab_q, decay, phase
    # biomass_x ramps 0.05 → 1.0 across lag→exp; slow 5% decline in stationary
    # metab_q ramps 0.30 → 1.0; 10% decline in stationary (substrate depletion)
    # decay 1.0 if no contamination yet, else exp(−(t − onset)/5h)
    value(t) = mechanistic_model(state, f) + sensor_noise
# Derived: RQ trace = CER[t] / OUR[t] pointwise (when OUR > 0.5)
Phase boundaries are the same for every agent on the same run — that's what lets the dashboard align the dashed rules across all the metric tabs. Contamination onset is sampled deterministically from SHA-1(run_id) ⊕ 0x1ABCDEF, so the same run yields the same onset on every request — important for audit, and means the chart doesn't visually "jump around" when re-rendered.
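A sketch of that deterministic sampling, assuming the onset window from the glossary (0.5·exp_end to exp_end + 0.6·(duration − exp_end)); the hash-to-uniform mapping and contam_onset_h name are illustrative, while the SHA-1 ⊕ 0x1ABCDEF seed comes from the text:

import hashlib

def contam_onset_h(run_id: str, exp_end: float, duration: float) -> float:
    """Same run_id → same onset on every request, for audit stability."""
    h = int(hashlib.sha1(run_id.encode()).hexdigest(), 16) ^ 0x1ABCDEF
    u = (h % 10**9) / 10**9                 # deterministic uniform-ish [0, 1)
    lo = 0.5 * exp_end
    hi = exp_end + 0.6 * (duration - exp_end)
    return lo + u * (hi - lo)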
When real probes come online, produce_trace() is replaced by a thin SCADA bridge that reads from the historian. The phase model and downstream rendering don't change.

The compound finding uses the same rate/lift/CI/χ² formulas with predicate = p1 ∧ p2 ∧ p3.
# Today only one compound is computed. Architecture supports more;
# the UI reads d.explanation.compound (singular) for now.
Each finding gets a neutral magnitude tag based on FDR-adjusted significance and absolute lift. No prior expectation is tested; no belief is confirmed or disproven. The tag exists so the eye lands on the bigger effects first when scanning the panel.
strong · moderate · weak · weak_unconfirmed · negligible
Any factor or assumption with n < 150 renders dimmed with a low n badge. Rows with p_adjusted > 0.05 render dimmed with an n.s. badge. Numbers still show; only the visual weight drops, so nothing is hidden.
A DBMS_SCHEDULER job (FERM_SNAPSHOT_CAPTURE) fires every 5 min and writes one row to ferm_snapshots with the current baseline, factor rates, and compound rate. The timeline chart reads the pre-aggregated series — it's not re-computed per request.
The /api/narrate endpoint sends the already-computed numbers to Gemma with an instruction to restate them in plain language. The LLM never computes new numbers. It's a translator, not a model. If narration fails or times out, the UI falls back to a structured auto-summary built from the same data.
Real fermentation has hundreds-to-thousands of parameters across timescales and subsystems — in-tank sensors, ambient conditions, raw-material lots, equipment integrity, cellular metabolism, off-gas mass balance, personnel. The dashboard tracks a handful today and the rest are on a roadmap. FERM_PARAMETER_CATALOG is the explicit registry: one row per parameter with its category, units, role in the analysis (predictor / outcome / outcome-proxy), regulatory class (CPP / CQA / CMA), expected range, source location, and tracked-vs-roadmap status.
Why this matters in practice: it's the difference between "the system knows about X" and "the system happens to read X from a column somewhere." Future versions of _build_adjusted_ors will read predictor membership directly from the catalog instead of from hardcoded Python lists. For audit, the catalog answers "what does the dashboard claim to know, and how is each value being collected?" in a single SELECT.
For roadmap parameters that aren't yet instrumented (off-gas mass-balance OUR/CER/RQ, capacitance-probe viable cell density, intracellular pH, kLa, equipment-age counters, raw-material lot attributes), we generate values from sensor agents instead of waiting on hardware. Each agent in api/sensor_agents.py is a small autonomous component:
class OURAgent(SensorAgent):
name = "OURAgent"
param_code = "OUR_MMOL_L_H"
def measure(self, state, rng):
# Physics: OUR ≈ q_O2 · X (specific O2 uptake × biomass)
x_g_l = state.peak_od_true * 0.35
q_o2 = 4.5 * (0.62 if state.contamination else 1.0)
drift = max(0, state.days_since_ph_cal - 21) * 0.04
v = q_o2 * x_g_l * (1 - drift) + rng.gauss(0, 1.5)
return {"value": round(max(0, v), 2), "flag": "ok"}
def interpret(self, value, state):
# Plain-language status the agent attaches to its own reading
if value < 8: return f"OUR {value} — stalled metabolism"
if value < 25: return f"OUR {value} — healthy aerobic range"
return f"OUR {value} — vigorous respiration"
Three properties make these "agents" rather than just functions: each owns its own state (calibration drift, fouling, span shift), each carries a mechanistic model (van't Riet for kLa, off-gas mass balance for OUR/CER, fluorescence calibration for intracellular pH), and each produces an opinion about its current reading via interpret(). When real probes come online, the agent's measure() is replaced with a thin SCADA bridge — every other layer of the pipeline (catalog, regression, UI) is unchanged. The orchestrator simulate_run(state) fires every agent in sequence, then the derived agents (RQ = CER/OUR) run on the others' outputs. The /api/simulate?n=10&seed=42 endpoint exposes this for verification — paste it in a tab to inspect the readings.
Two source tables (in-tank sensors + around-the-tank environmental context), one pre-aggregated time-series, and one audit log.
FERM_RUNS — what the in-tank sensor sees
    run_id PK · avg_temp · avg_ph · min_do_pct · inoculum_size_ml
    max_rpm · lactose_feed_ml · duration_days · media_volume_ml
    max_fpu_ml (enzyme yield)
    peak_od · time_to_peak_h · mu_h1 (biomass kinetics)
    contamination Y/N · yield_category
        │ run_id (1:1)
        ▼
FERM_ENV — the "moat" layer: what the sensor can't see
    run_id PK · run_start_ts · tank_id · shift · crew_id
    ambient_temp_c · ambient_humidity_pct · barometric_hpa
    water_hardness_ppm · chlorine_ppm · yeast_lot · media_lot
    neighbor_contaminated Y/N

FERM_SNAPSHOTS — dashboard time-series, captured every 5 min
    id PK · captured_at · total_runs
    contam_pct_all · contam_pct_window (last 200)
    night_n, night_contam_pct
    bad_yeast_n, bad_yeast_contam_pct
    humid_n, humid_contam_pct
    compound_n, compound_rate

FERM_AUDIT — viewer attribution: every page load, question, explain
    id PK · ts · viewer · endpoint · question · answer_summary

FERM_PARAMETER_CATALOG — Tier 4 registry of every parameter the system tracks or could track
    param_code PK · display_name · category · subsystem · data_type
    unit · expected_min/max · phase_relevance · sampling
    source_table · source_column · calibration_source · regulatory_class
    is_predictor · is_outcome · is_outcome_proxy · tracked Y/N
    simulated_by (SensorAgent class name; NULL = not simulated) · notes

FERM_BASELINE_STRATA — Tier 8 materialized view · contextual rates per cell (optional)
    shift · yeast_band · humidity_band · hardness_band (stratum_key)
    n_runs · n_contam · n_high_yield · contam_pct · high_yield_pct
    contam_ci_low / contam_ci_high (95% Wilson, computed in view)
    last_refreshed (refreshed nightly via DBMS_SCHEDULER)

FERM_OUTCOME_LABEL — Tier 11 · operator-recorded confirmed root causes (the loop closer)
    run_id PK · confirmed_cause · confidence (definite/likely/uncertain)
    operator_notes · labeled_by · labeled_at
        ▲
        │ POST /api/label_outcome → ferm_outcome_upsert/ ORDS module (MERGE)

FERM_RUN_STATE — hot tier · per-run cached intelligence (the inversion)
    run_id PK · last_updated_at
    regime · regime_label · recommended_action · rationale · confidence
    drift_score · drift_trend · drift_components_json
    n_active_alerts · earliest_alert_severity/t_h/detector
    alerts_json · drivers_json · narrative · n_similar_batches
        ▲
        │ MERGE-based upsert via ferm_state_upsert/ ORDS module
        │ /api/run_state recomputes + persists when stale (TTL = 5 min)
ferm_full_v = ferm_runs ⋈ ferm_env ON run_id, and a pre-aggregated summary view (05_summary_view_v2.sql) that /api/summary reads from.