Tracking multiple objects in thermal imagery is important for surveillance and perception under low-visibility conditions, but thermal images pose specific difficulties: objects lack distinctive appearance features, contrast is low, and occlusions are common.
We propose a three-stage framework for pedestrian MOT on the TP-MOT benchmark, combining thermal-adapted detection, online tracking, and an offline tracklet stitching module. The core of our approach is a lightweight post-processing step that reconnects fragmented trajectories by exploiting temporal gaps, spatial proximity, motion consistency, and boundary-aware constraints. This keeps the online tracker simple (no re-identification network or global optimization is needed) while substantially improving identity preservation.
On the PBVS Thermal MOT benchmark, the proposed stitching stage consistently reduces fragmentation and improves long-term identity continuity, offering a practical real-time solution for thermal pedestrian surveillance.
Thermal image sequences → YOLOv8s pedestrian detector (1920 px, conf = 10⁻⁴, NMS = 0.75) → SORT (Kalman filter + Hungarian matching) → Online short-gap recovery + offline tracklet stitching → Identity-consistent trajectories
Figure 1. Proposed thermal pedestrian MOT pipeline. Stage 4 (highlighted) is our main contribution.
Given tracklets a (old) and b (new), we extrapolate a's endpoint via a 3-point velocity estimate and merge only when all four criteria hold:
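Boundary: both endpoints lie outside the 60-px border band
Temporal gap: 1 ≤ Δt ≤ 30 frames
Spatial proximity: $\lVert \tilde{c}_a - c_b^{\mathrm{start}} \rVert_2 \le 80$ px, where $\tilde{c}_a$ is the extrapolated endpoint of a and $c_b^{\mathrm{start}}$ is the start of b
Motion consistency: angle ≤ 45°, speed ratio ≤ 3.0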
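The four criteria above can be sketched as a single merge test. This is a minimal illustration, not the authors' implementation: the `Tracklet` class, the `velocity` and `can_stitch` helpers, and the 1920×1080 frame size are hypothetical. It also assumes that the 3-point velocity is the mean per-frame displacement over three points, that the angle is measured between a's end velocity and b's start velocity, and that the speed ratio is the faster speed divided by the slower one.

```python
import math
from dataclasses import dataclass


@dataclass
class Tracklet:
    frames: list   # frame indices, ascending
    centers: list  # (x, y) box centres in pixels, one per frame


def velocity(frames, centers):
    """Mean per-frame displacement between the first and last given point."""
    dt = frames[-1] - frames[0]
    if dt <= 0:
        return (0.0, 0.0)
    return ((centers[-1][0] - centers[0][0]) / dt,
            (centers[-1][1] - centers[0][1]) / dt)


def in_border_band(c, width, height, band):
    """True if the point lies within `band` pixels of any image edge."""
    x, y = c
    return x < band or y < band or x > width - band or y > height - band


def can_stitch(a, b, width, height, band=60.0, max_gap=30,
               max_dist=80.0, max_angle_deg=45.0, max_speed_ratio=3.0):
    end_a, start_b = a.centers[-1], b.centers[0]

    # Boundary: both endpoints must lie outside the border band.
    if in_border_band(end_a, width, height, band):
        return False
    if in_border_band(start_b, width, height, band):
        return False

    # Temporal gap: 1 <= dt <= max_gap frames.
    gap = b.frames[0] - a.frames[-1]
    if not 1 <= gap <= max_gap:
        return False

    # Spatial proximity: extrapolate a's endpoint over the gap (3-point velocity).
    va = velocity(a.frames[-3:], a.centers[-3:])
    predicted = (end_a[0] + va[0] * gap, end_a[1] + va[1] * gap)
    if math.dist(predicted, start_b) > max_dist:
        return False

    # Motion consistency: direction and speed of a's end vs. b's start.
    vb = velocity(b.frames[:3], b.centers[:3])
    sa, sb = math.hypot(*va), math.hypot(*vb)
    if sa == 0.0 or sb == 0.0:
        # Assumption: direction is undefined for a stationary tracklet,
        # so the motion test is skipped rather than failed.
        return True
    cos = (va[0] * vb[0] + va[1] * vb[1]) / (sa * sb)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    return angle <= max_angle_deg and max(sa, sb) / min(sa, sb) <= max_speed_ratio


# Hypothetical tracklets: a ends at frame 12, b starts at frame 17 on the same path.
a = Tracklet([10, 11, 12], [(500.0, 400.0), (504.0, 400.0), (508.0, 400.0)])
b = Tracklet([17, 18, 19], [(530.0, 401.0), (534.0, 401.0), (538.0, 401.0)])
print(can_stitch(a, b, width=1920, height=1080))
```

Checking the cheap boundary and gap conditions first lets most candidate pairs be rejected before any velocity is computed, which matters when every old tracklet is compared against every new one.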
| # | Participant | MOTA ↑ | MOTP ↓ | IDF1 ↑ | IDP | IDR | Recall | Precision |
|---|---|---|---|---|---|---|---|---|
| 1 | Ours (UW_IPL) | 0.99 | 0.13 | 0.86 | 0.86 | 0.86 | 1.00 | 1.00 |
| 2 | wqetwet | 0.97 | 0.12 | 0.97 | 0.98 | 0.96 | 0.98 | 1.00 |
| 3 | SKKU-AutoLab | 0.97 | 0.14 | 0.94 | 0.95 | 0.93 | 0.98 | 1.00 |
| 4 | wwww123 | 0.85 | 0.12 | 0.86 | 0.91 | 0.82 | 0.89 | 0.98 |
| 5 | spcke | 0.82 | 0.14 | 0.86 | 0.90 | 0.83 | 0.88 | 0.94 |
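As a reading aid for the identity columns, IDF1 is by definition the harmonic mean of IDP and IDR. The short check below recomputes IDF1 from the two-decimal IDP and IDR values in the table above. The `idf1` helper and `rows` dictionary are illustrative names, and because the inputs are already rounded, agreement is only expected to two decimals.

```python
def idf1(idp, idr):
    """Harmonic mean of identity precision (IDP) and identity recall (IDR)."""
    if idp + idr == 0:
        return 0.0
    return 2 * idp * idr / (idp + idr)


# (IDP, IDR, reported IDF1), copied from the leaderboard table.
rows = {
    "Ours (UW_IPL)": (0.86, 0.86, 0.86),
    "wqetwet": (0.98, 0.96, 0.97),
    "SKKU-AutoLab": (0.95, 0.93, 0.94),
    "wwww123": (0.91, 0.82, 0.86),
    "spcke": (0.90, 0.83, 0.86),
}

for name, (idp, idr, reported) in rows.items():
    print(f"{name}: computed {idf1(idp, idr):.2f}, reported {reported:.2f}")
```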
| Method | MOTA ↑ | MOTP ↓ | IDF1 ↑ |
|---|---|---|---|
| ByteTrack | 0.9173 | 0.1367 | 0.7659 |
| BoT-SORT | 0.9174 | 0.1368 | 0.7605 |
| BoostTrack | 0.8654 | 0.1555 | 0.7545 |
| DiffMOT | 0.8630 | 0.1611 | 0.7812 |
| OC-SORT | 0.9071 | 0.1236 | 0.5685 |
| SAM3 | 0.9003 | 0.2104 | 0.6073 |
| SORT (baseline) | 0.9844 | 0.1263 | 0.8130 |
| SORT + Stitching (Ours) | 0.9853 ▲ | 0.1262 | 0.8545 ▲ |
Key finding: Tracklet stitching improves IDF1 by 4.15 percentage points over vanilla SORT (0.8130 → 0.8545). Our SORT-based pipeline outperforms all tested complex trackers on MOTA and IDF1, and on MOTP it is second only to OC-SORT (0.1262 vs. 0.1236).
Detection → Tracking → Repair, each swappable. Easy to benchmark different tracker variants.
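One way to picture the swappable design is to treat each stage as a plain callable with a fixed signature. The sketch below is hypothetical: the type aliases, `run_pipeline`, and the three stub stages are illustrative and do not come from the authors' code.

```python
from typing import Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2
Track = Tuple[int, int, Box]             # frame index, track id, box

Detector = Callable[[object], List[Box]]
Tracker = Callable[[int, List[Box]], List[Track]]
Repair = Callable[[List[Track]], List[Track]]


def run_pipeline(frames: Iterable[object], detect: Detector,
                 track: Tracker, repair: Repair) -> List[Track]:
    """Run detection and online tracking per frame, then offline repair once."""
    online: List[Track] = []
    for idx, frame in enumerate(frames):
        online.extend(track(idx, detect(frame)))
    return repair(online)


# Stub stages for illustration only.
def detect_stub(frame: object) -> List[Box]:
    return [(0.0, 0.0, 10.0, 20.0)]


def track_stub(idx: int, boxes: List[Box]) -> List[Track]:
    # Assigns a fresh id every frame, i.e. a maximally fragmented tracker.
    return [(idx, idx, box) for box in boxes]


def repair_stub(tracks: List[Track]) -> List[Track]:
    # Collapses every fragment into identity 0.
    return [(frame, 0, box) for frame, _, box in tracks]


result = run_pipeline(range(3), detect_stub, track_stub, repair_stub)
print(sorted({track_id for _, track_id, _ in result}))
```

Benchmarking a different tracker variant then amounts to passing a different `track` callable while the detector and repair stages stay fixed.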
Online short-gap recovery and offline stitching together raise IDF1 by 4.15 percentage points.
No Re-ID module. SORT occupies 0.25 MB, 96,000× smaller than SAM3.