PBVS 2026 TP-MOT Challenge — Top-3 Finalist

Robust Thermal Pedestrian Multi-Object Tracking
through Three-Stage Trajectory Refinement

Department of Electrical & Computer Engineering, University of Washington
CVPR 2026 · Perception Beyond the Visible Spectrum (PBVS) Workshop
Advisor: Prof. Jenq-Neng Hwang
TL;DR: YOLOv8 + tuned SORT + tracklet stitching → MOTA 0.99 & IDF1 0.86, no Re-ID needed

Abstract


Tracking multiple objects in thermal imagery is important for surveillance and perception under low-visibility conditions, but thermal images pose specific difficulties: objects lack distinctive appearance features, contrast is low, and occlusions are common.

We propose a three-stage framework for pedestrian MOT on the TP-MOT benchmark, combining thermal-adapted detection, online tracking, and an offline tracklet stitching module. The core of our approach is a lightweight post-processing step that reconnects fragmented trajectories by exploiting temporal gaps, spatial proximity, motion consistency, and boundary-aware constraints. This keeps the online tracker simple—no re-identification network or global optimization is needed—while substantially improving identity preservation.

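On the PBVS TP-MOT benchmark, the proposed stitching stage raises IDF1 from 0.8130 to 0.8545 over baseline SORT while leaving MOTA essentially unchanged (0.9844 vs. 0.9853), which reflects reduced fragmentation and better long-term identity continuity. The result is a practical real-time solution for thermal pedestrian surveillance.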

Challenges & Our Approach


 Thermal MOT Challenges

  • Heat-only sensing—no color or texture
  • Low contrast, frequent occlusion
  • Abrupt motion & scale changes

 Our Key Insight

  • Tune a simple pipeline, don't stack modules
  • SORT tracker + offline tracklet stitching
  • Short-gap repair + motion-aware merging
  • No Re-ID—96,000× less memory than SAM3

Framework Overview


  1. Input: thermal image sequences
  2. Detection: YOLOv8s pedestrian detector (1920 px, conf = 10⁻⁴, NMS = 0.75)
  3. Online Tracking: SORT (Kalman filter + Hungarian matching)
  4. Identity Repair: online short-gap recovery + offline tracklet stitching
  5. Output: identity-consistent trajectories

Figure 1. Proposed thermal pedestrian MOT pipeline. Stage 4 (highlighted) is our main contribution.
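
The association step of Stage 3 can be illustrated with a small, dependency-free sketch. The poster does not list the tuned SORT parameters, so the `iou_min = 0.3` gate and the function names `iou` and `associate` are assumptions made for illustration. The brute-force search below is a stand-in that returns the same optimal assignment as the Hungarian algorithm on small inputs. It is not the authors' implementation, and the Kalman prediction step is omitted.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(predicted, detections, iou_min=0.3):
    """Match predicted track boxes to detections by maximum total IoU.

    Returns (matches, unmatched_tracks, unmatched_detections), where
    matches is a list of (track_index, detection_index) pairs.
    iou_min is an assumed gate, not a value taken from the poster.
    """
    n, m = len(predicted), len(detections)
    swap = n > m  # enumerate assignments of the shorter list into the longer
    rows, cols = (detections, predicted) if swap else (predicted, detections)
    best, best_score = (), -1.0
    for perm in permutations(range(len(cols)), len(rows)):
        score = sum(iou(rows[i], cols[j]) for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    pairs = [(j, i) if swap else (i, j) for i, j in enumerate(best)]
    # Discard pairs whose overlap is too small to be the same pedestrian.
    matches = [(t, d) for t, d in pairs
               if iou(predicted[t], detections[d]) >= iou_min]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(n) if t not in matched_t]
    unmatched_dets = [d for d in range(m) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

The exhaustive search grows factorially with the number of boxes, so it is only suitable for toy inputs. A polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice in practice.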

Offline Motion-Consistent Tracklet Stitching

Given tracklets a (old) and b (new), we extrapolate a's endpoint via a 3-point velocity estimate and merge only when all four criteria hold:

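$$\tilde{c}_a = c_a^{\mathrm{end}} + v_a \cdot \Delta t, \quad \text{where } \Delta t = t_b^{\mathrm{start}} - t_a^{\mathrm{end}}$$

  1. Border-Aware: both endpoints outside the 60-px border band
  2. Temporal Gap: $1 \le \Delta t \le 30$ frames
  3. Spatial Proximity: $\lVert \tilde{c}_a - c_b^{\mathrm{start}} \rVert_2 \le 80$ px
  4. Motion Consistency: angle ≤ 45°, speed ratio ≤ 3.0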
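
The four criteria above can be sketched as a single merge test. This is a minimal illustration under stated assumptions, not the authors' code. The `Tracklet` class, the helper names, and the `EPS` guard are hypothetical. The 3-point velocity is assumed to be the displacement across the last (or first) three samples divided by the elapsed frames. "Outside the border band" is read as both endpoints lying in the image interior. The 640×512 frame size in the example is also an assumption.

```python
import math
from dataclasses import dataclass

BORDER_PX = 60          # width of the image-border band (from the poster)
MAX_GAP = 30            # largest allowed frame gap (from the poster)
MAX_DIST_PX = 80.0      # largest allowed extrapolation error (from the poster)
MAX_ANGLE_DEG = 45.0    # largest allowed heading change (from the poster)
MAX_SPEED_RATIO = 3.0   # largest allowed faster/slower ratio (from the poster)
EPS = 1e-6              # hypothetical guard for near-zero speeds


@dataclass
class Tracklet:
    frames: list    # sorted frame indices
    centers: list   # (x, y) box centers, one per frame


def velocity(frames, centers):
    """Per-frame velocity from the first to the last of the given points."""
    dt = frames[-1] - frames[0]
    if dt <= 0:
        return (0.0, 0.0)
    return ((centers[-1][0] - centers[0][0]) / dt,
            (centers[-1][1] - centers[0][1]) / dt)


def in_border(c, width, height):
    """True if a center lies inside the border band of the image."""
    x, y = c
    return (x < BORDER_PX or x > width - BORDER_PX
            or y < BORDER_PX or y > height - BORDER_PX)


def can_stitch(a, b, width, height):
    """Apply the four criteria to tracklet a (older) and b (newer)."""
    dt = b.frames[0] - a.frames[-1]
    if not 1 <= dt <= MAX_GAP:                       # temporal gap
        return False
    c_end, c_start = a.centers[-1], b.centers[0]
    if in_border(c_end, width, height) or in_border(c_start, width, height):
        return False                                 # border-aware
    va = velocity(a.frames[-3:], a.centers[-3:])
    vb = velocity(b.frames[:3], b.centers[:3])
    pred = (c_end[0] + va[0] * dt, c_end[1] + va[1] * dt)
    if math.dist(pred, c_start) > MAX_DIST_PX:       # spatial proximity
        return False
    sa, sb = math.hypot(*va), math.hypot(*vb)
    if max(sa, sb) / max(min(sa, sb), EPS) > MAX_SPEED_RATIO:
        return False                                 # speed ratio
    if min(sa, sb) > EPS:                            # heading is defined
        cos = (va[0] * vb[0] + va[1] * vb[1]) / (sa * sb)
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
        if angle > MAX_ANGLE_DEG:
            return False                             # heading change
    return True


# Example: a pedestrian moving right at 5 px/frame, lost for 3 frames.
a = Tracklet([0, 1, 2], [(100, 200), (105, 200), (110, 200)])
b = Tracklet([5, 6, 7], [(126, 201), (131, 201), (136, 201)])
print(can_stitch(a, b, width=640, height=512))  # True
```

The cheap checks (gap and border) run first so that most candidate pairs are rejected before any velocity is computed. How a stationary endpoint should be treated is not stated on the poster. Here a stationary tracklet can only merge with another stationary one.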

Experimental Results


Table 1 — PBVS TP-MOT Challenge Leaderboard (Evaluation Server)
#  Participant    MOTA ↑  MOTP ↓  IDF1 ↑  IDP   IDR   Recall  Precision
1  Ours (UW_IPL)  0.99    0.13    0.86    0.86  0.86  1.00    1.00
2  wqetwet        0.97    0.12    0.97    0.98  0.96  0.98    1.00
3  SKKU-AutoLab   0.97    0.14    0.94    0.95  0.93  0.98    1.00
4  wwww123        0.85    0.12    0.86    0.91  0.82  0.89    0.98
5  spcke          0.82    0.14    0.86    0.90  0.83  0.88    0.94
Table 2 — Tracker Comparison (Same YOLOv8 Detections, Evaluation Server)
Method                   MOTA ↑  MOTP ↓  IDF1 ↑
ByteTrack                0.9173  0.1367  0.7659
BoT-SORT                 0.9174  0.1368  0.7605
BoostTrack               0.8654  0.1555  0.7545
DiffMOT                  0.8630  0.1611  0.7812
OC-SORT                  0.9071  0.1236  0.5685
SAM3                     0.9003  0.2104  0.6073
SORT (baseline)          0.9844  0.1263  0.8130
SORT + Stitching (Ours)  0.9853  0.1262  0.8545

Key finding: Tracklet stitching improves IDF1 by 4.15 percentage points over vanilla SORT (0.8130 → 0.8545). Our SORT-based pipeline achieves the highest MOTA and IDF1 of all tested trackers. On MOTP (lower is better) it is second only to OC-SORT (0.1262 vs. 0.1236).
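
As a quick check of the figures in Table 2, the short script below recomputes the best tracker per metric and the IDF1 gain. It separates the absolute gain in percentage points (the poster's "%p") from the relative gain. The dictionary simply copies the table, and the variable names are illustrative.

```python
# (MOTA, MOTP, IDF1) per tracker, copied from Table 2.
TABLE2 = {
    "ByteTrack": (0.9173, 0.1367, 0.7659),
    "BoT-SORT": (0.9174, 0.1368, 0.7605),
    "BoostTrack": (0.8654, 0.1555, 0.7545),
    "DiffMOT": (0.8630, 0.1611, 0.7812),
    "OC-SORT": (0.9071, 0.1236, 0.5685),
    "SAM3": (0.9003, 0.2104, 0.6073),
    "SORT (baseline)": (0.9844, 0.1263, 0.8130),
    "SORT + Stitching (Ours)": (0.9853, 0.1262, 0.8545),
}

# MOTA and IDF1 are higher-is-better; MOTP is lower-is-better.
best_mota = max(TABLE2, key=lambda k: TABLE2[k][0])
best_motp = min(TABLE2, key=lambda k: TABLE2[k][1])
best_idf1 = max(TABLE2, key=lambda k: TABLE2[k][2])

base = TABLE2["SORT (baseline)"][2]
ours = TABLE2["SORT + Stitching (Ours)"][2]
gain_pp = (ours - base) * 100          # absolute gain, percentage points
gain_rel = (ours - base) / base * 100  # relative gain, percent of baseline

print(best_mota, best_motp, best_idf1)
print(round(gain_pp, 2), round(gain_rel, 1))
```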

Contributions


Modular Pipeline

Detection → Tracking → Repair, each swappable. Easy to benchmark different tracker variants.

Identity-Repair Module

Online short-gap recovery and offline stitching together raise IDF1 by 4.15 percentage points over baseline SORT (0.8130 → 0.8545).

Lightweight & Efficient

No Re-ID module. SORT needs only 0.25 MB of memory, 96,000× less than SAM3.