Benchmark a pipeline against ground truth#

The point of a forward simulation is that you know the answer. This recipe runs your analysis pipeline on a simulated movie and scores what it recovered against the ground truth, using the recovery metrics.

1. Simulate, then run your pipeline#

from minisim import simulate

rec = simulate(spec)
movie = rec.observed          # (frame, height, width) sensor counts
gt = rec.ground_truth

# Your analysis pipeline (minian, CaImAn, suite2p, ...) returns:
#   A_est: (n_units, height, width) spatial footprints
#   C_est: (n_units, frame)         calcium traces
#   S_est: (n_units, frame)         deconvolved spikes
A_est, C_est, S_est = run_my_pipeline(movie)

2. Match estimated cells to true cells#

Recovery is only meaningful once you know which estimated cell corresponds to which true cell. hungarian_match() solves the optimal one-to-one assignment by spatial overlap (IoU). Match against A_observed (what is actually recoverable through the optics), not A_planted (the optics-free ideal).

from minisim import hungarian_match

match = hungarian_match(A_est, gt.A_observed)

match.recall()       # fraction of true cells recovered (IoU >= 0.5)
match.precision()    # fraction of estimated cells that are real
match.mean_iou       # mean spatial overlap over matched pairs

Recall should be scored against the detectable cells, not every planted cell: a cell too deep or too dim to appear in the movie is not a fair miss. Use detectable_subset() for the honest denominator:

det = gt.detectable_subset()
match = hungarian_match(A_est, det.A_observed)
print(f"recall over detectable cells: {match.recall():.2f}")

3. Score the recovered traces and spikes#

match.pairing is the list of (est_idx, true_idx) pairs; feed it straight to the temporal metrics.

import numpy as np
from minisim import trace_pearson, spike_precision_recall

r = trace_pearson(C_est, gt.C, match.pairing)   # one Pearson r per matched pair
print(f"median trace correlation: {float(np.nanmedian(r)):.2f}")

spikes = spike_precision_recall(S_est, gt.S, match.pairing, tol_frames=2)
print(f"spike precision={spikes.precision:.2f}, recall={spikes.recall:.2f}")

4. Score motion recovery (optional)#

If your pipeline estimates a rigid-motion trajectory and the spec has a BrainMotion step, compare against gt.shifts with shift_rmse():

from minisim import shift_rmse

rmse_px = shift_rmse(shifts_est, gt.shifts)   # per-frame (dy, dx), in pixels

Scaling up#

To trace a metric across a physical axis (recall vs depth, vs NA, vs density), drive this same scoring from a parameter sweep and collect the results into a DataFrame.