Robust Thermal Pedestrian Multi-Object Tracking through Three-Stage Trajectory Refinement

Sun, Winston; Ko, Gyungmin; Kwon, Heejae; Hsu, Celine; Huang, Hsiang-wei

PBVS 2026 TP-MOT Challenge — Top-3 Finalist

Department of Electrical & Computer Engineering, University of Washington

CVPR 2026 · Perception Beyond the Visible Spectrum (PBVS) Workshop

Advisor: Prof. Jenq-Neng Hwang

Abstract

Tracking multiple objects in thermal imagery is important for surveillance and perception under low-visibility conditions, but thermal images pose specific difficulties: objects lack distinctive appearance features, contrast is low, and occlusions are common.

We propose a three-stage framework for pedestrian MOT on the TP-MOT benchmark, combining thermal-adapted detection, online tracking, and an offline tracklet stitching module. The core of our approach is a lightweight post-processing step that reconnects fragmented trajectories by exploiting temporal gaps, spatial proximity, motion consistency, and boundary-aware constraints. This keeps the online tracker simple—no re-identification network or global optimization is needed—while substantially improving identity preservation.

On the PBVS Thermal MOT benchmark, the proposed stitching stage consistently reduces fragmentation and improves long-term identity continuity, offering a practical real-time solution for thermal pedestrian surveillance.

Challenges & Our Approach

Thermal MOT Challenges

Heat-only sensing—no color or texture
Low contrast, frequent occlusion
Abrupt motion & scale changes

Our Key Insight

Tune a simple pipeline, don't stack modules
SORT tracker + offline tracklet stitching
Short-gap repair + motion-aware merging
No Re-ID—96,000× less memory than SAM3

Framework Overview

Input

Thermal image sequences

Detection

YOLOv8s pedestrian detector
(input 1920 px, conf = 10⁻⁴, NMS IoU = 0.75)

Online Tracking

SORT: Kalman filter +
Hungarian matching

Identity Repair

Online short-gap recovery +
Offline tracklet stitching

Output

Identity-consistent trajectories

Figure 1. Proposed thermal pedestrian MOT pipeline. The Identity Repair stage (highlighted) is our main contribution.
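The detector settings listed for the Detection stage can be read as a score filter followed by non-maximum suppression. The sketch below is a minimal pure-Python illustration of that reading, not the YOLOv8 implementation: it assumes that conf = 10⁻⁴ is a minimum confidence score and that NMS = 0.75 is an IoU threshold for greedy suppression. The names `iou` and `filter_detections` and the example boxes are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    conf_thres: float = 1e-4,  # assumed meaning of "conf = 10^-4"
    nms_iou: float = 0.75,     # assumed meaning of "NMS = 0.75"
) -> List[int]:
    """Return indices kept after score thresholding and greedy NMS."""
    # Drop low-score boxes, then visit the rest from highest to lowest score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= nms_iou for j in kept):
            kept.append(i)
    return kept


# Hypothetical detections: two near-duplicates, one partial overlap,
# and one box whose score is below the confidence threshold.
boxes = [(0, 0, 100, 200), (5, 0, 105, 200), (40, 0, 140, 200), (300, 50, 360, 180)]
scores = [0.9, 0.8, 0.6, 0.00005]
kept = filter_detections(boxes, scores)
```

Under these assumptions, a very low confidence threshold passes almost every candidate box to the tracker, and a high NMS threshold keeps boxes that overlap substantially, which is one plausible way to preserve detections of pedestrians who partially occlude each other.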

Offline Motion-Consistent Tracklet Stitching

Given an earlier tracklet a and a later tracklet b, we extrapolate a's end position across the gap using a 3-point velocity estimate and merge the two only when all four criteria hold:

c̃_a = c^end_a + v_a · Δt where Δt = t^start_b − t^end_a

Border-Aware

Both endpoints lie outside the 60-px band along the image border

Temporal Gap

1 ≤ Δt ≤ 30 frames

Spatial Proximity

‖c̃_a − c^start_b‖₂ ≤ 80 px

Motion Consistency

Angle ≤ 45°, speed ratio ≤ 3.0
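The four criteria above can be combined into a single merge test. The sketch below is one possible reading under stated assumptions, not the authors' code: the 3-point velocity is taken as the mean per-frame displacement over the last (or first) three observations, the angle is measured between the velocity of a and the velocity of b, the speed ratio is the faster speed divided by the slower one, and "both endpoints" means the last point of a and the first point of b. The `Tracklet` class, the function names, and the 640×512 image size in the example are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
EPS = 1e-6  # speeds below this are treated as stationary (assumption)


@dataclass
class Tracklet:
    frames: List[int]     # frame indices, strictly increasing
    centers: List[Point]  # box centers (x, y) in pixels, one per frame


def _velocity(frames: Sequence[int], centers: Sequence[Point]) -> Point:
    """Mean per-frame displacement over the given (up to three) observations."""
    dt = frames[-1] - frames[0]
    if dt == 0:  # a single observation carries no motion information
        return (0.0, 0.0)
    return ((centers[-1][0] - centers[0][0]) / dt,
            (centers[-1][1] - centers[0][1]) / dt)


def _in_border_band(p: Point, w: int, h: int, border: float) -> bool:
    return p[0] < border or p[1] < border or p[0] > w - border or p[1] > h - border


def can_stitch(a: Tracklet, b: Tracklet, img_w: int, img_h: int,
               border: float = 60, max_gap: int = 30, max_dist: float = 80.0,
               max_angle_deg: float = 45.0, max_speed_ratio: float = 3.0) -> bool:
    """Return True only when all four merge criteria hold."""
    # Temporal gap: 1 <= dt <= 30 frames.
    gap = b.frames[0] - a.frames[-1]
    if not 1 <= gap <= max_gap:
        return False
    # Border-aware: both endpoints must lie outside the border band.
    end_a, start_b = a.centers[-1], b.centers[0]
    if _in_border_band(end_a, img_w, img_h, border) or \
       _in_border_band(start_b, img_w, img_h, border):
        return False
    # Spatial proximity: extrapolate a's endpoint, c~_a = c_a^end + v_a * dt.
    va = _velocity(a.frames[-3:], a.centers[-3:])
    vb = _velocity(b.frames[:3], b.centers[:3])
    pred = (end_a[0] + va[0] * gap, end_a[1] + va[1] * gap)
    if math.dist(pred, start_b) > max_dist:
        return False
    # Motion consistency: speed ratio and heading angle.
    sa, sb = math.hypot(*va), math.hypot(*vb)
    if sa < EPS and sb < EPS:  # both stationary: direction undefined, accept
        return True
    if min(sa, sb) < EPS or max(sa, sb) / min(sa, sb) > max_speed_ratio:
        return False
    cos = (va[0] * vb[0] + va[1] * vb[1]) / (sa * sb)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    return angle <= max_angle_deg


# Hypothetical example: a walker lost for 10 frames and picked up again.
a = Tracklet([10, 11, 12], [(200.0, 250.0), (204.0, 250.0), (208.0, 250.0)])
b = Tracklet([22, 23, 24], [(250.0, 252.0), (254.0, 253.0), (258.0, 254.0)])
merge_ok = can_stitch(a, b, img_w=640, img_h=512)
```

In this example the extrapolated endpoint of a is (248, 250), roughly 2.8 px from the start of b, and the two headings differ by about 14°, so the pair passes every check. Ordering the checks from cheapest to most expensive lets most candidate pairs be rejected on the temporal gap alone.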

Experimental Results

Table 1 — PBVS TP-MOT Challenge Leaderboard (Evaluation Server)

#	Participant	MOTA ↑	MOTP ↓	IDF1 ↑	IDP	IDR	Recall	Precision
1	Ours (UW_IPL)	0.99	0.13	0.86	0.86	0.86	1.00	1.00
2	wqetwet	0.97	0.12	0.97	0.98	0.96	0.98	1.00
3	SKKU-AutoLab	0.97	0.14	0.94	0.95	0.93	0.98	1.00
4	wwww123	0.85	0.12	0.86	0.91	0.82	0.89	0.98
5	spcke	0.82	0.14	0.86	0.90	0.83	0.88	0.94
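When reading Table 1, it helps to recall that IDF1 is the harmonic mean of ID precision (IDP) and ID recall (IDR). The short check below recomputes IDF1 from the rounded IDP and IDR columns and compares it with the reported IDF1; the function name `idf1` and the dictionary layout are illustrative, and small differences are expected because the table values are rounded to two decimals.

```python
def idf1(idp: float, idr: float) -> float:
    """Harmonic mean of ID precision and ID recall."""
    return 2 * idp * idr / (idp + idr) if (idp + idr) > 0 else 0.0


# (reported IDF1, IDP, IDR) copied from Table 1.
table1 = {
    "Ours (UW_IPL)": (0.86, 0.86, 0.86),
    "wqetwet": (0.97, 0.98, 0.96),
    "SKKU-AutoLab": (0.94, 0.95, 0.93),
    "wwww123": (0.86, 0.91, 0.82),
    "spcke": (0.86, 0.90, 0.83),
}

# Largest gap between the recomputed and the reported IDF1.
max_diff = max(abs(idf1(p, r) - f) for f, p, r in table1.values())
for name, (reported, p, r) in table1.items():
    print(f"{name:15s} reported={reported:.2f} recomputed={idf1(p, r):.4f}")
```

Every recomputed value agrees with the reported IDF1 to within rounding, so the three identity columns of the leaderboard are mutually consistent.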

Table 2 — Tracker Comparison (Same YOLOv8 Detections, Evaluation Server)

Method	MOTA ↑	MOTP ↓	IDF1 ↑
ByteTrack	0.9173	0.1367	0.7659
BoT-SORT	0.9174	0.1368	0.7605
BoostTrack	0.8654	0.1555	0.7545
DiffMOT	0.8630	0.1611	0.7812
OC-SORT	0.9071	0.1236	0.5685
SAM3	0.9003	0.2104	0.6073
SORT (baseline)	0.9844	0.1263	0.8130
SORT + Stitching (Ours)	0.9853 ▲	0.1262	0.8545 ▲

Key finding: Tracklet stitching improves IDF1 by 4.15 percentage points over vanilla SORT (0.8130 → 0.8545). Our SORT-based pipeline outperforms every other tested tracker on MOTA and IDF1, and on MOTP it is surpassed only by OC-SORT (0.1236 vs. 0.1262).
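The comparison in Table 2 can be reproduced with a few lines of arithmetic. The snippet below copies the table values and computes the IDF1 gain of stitching over the SORT baseline together with the best method per metric, keeping in mind that MOTP is a lower-is-better metric here. The variable names are illustrative.

```python
# (MOTA, MOTP, IDF1) copied from Table 2.
table2 = {
    "ByteTrack": (0.9173, 0.1367, 0.7659),
    "BoT-SORT": (0.9174, 0.1368, 0.7605),
    "BoostTrack": (0.8654, 0.1555, 0.7545),
    "DiffMOT": (0.8630, 0.1611, 0.7812),
    "OC-SORT": (0.9071, 0.1236, 0.5685),
    "SAM3": (0.9003, 0.2104, 0.6073),
    "SORT (baseline)": (0.9844, 0.1263, 0.8130),
    "SORT + Stitching (Ours)": (0.9853, 0.1262, 0.8545),
}

ours = table2["SORT + Stitching (Ours)"]
base = table2["SORT (baseline)"]

# Absolute IDF1 gain in percentage points, and relative gain in percent.
gain_pp = (ours[2] - base[2]) * 100
gain_rel = (ours[2] - base[2]) / base[2] * 100

best_mota = max(table2, key=lambda k: table2[k][0])  # higher is better
best_motp = min(table2, key=lambda k: table2[k][1])  # lower is better
best_idf1 = max(table2, key=lambda k: table2[k][2])  # higher is better

print(f"IDF1 gain: {gain_pp:.2f} points ({gain_rel:.1f}% relative)")
print(f"best MOTA: {best_mota}, best MOTP: {best_motp}, best IDF1: {best_idf1}")
```

The gain works out to 4.15 percentage points, about 5.1% relative to the baseline. SORT + Stitching leads on MOTA and IDF1, while OC-SORT reports the lowest MOTP.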

Contributions

Modular Pipeline

Detection → Tracking → Repair, each swappable. Easy to benchmark different tracker variants.

Identity-Repair Module

Online short-gap recovery and offline stitching together raise IDF1 by 4.15 percentage points (0.8130 → 0.8545).

Lightweight & Efficient

No Re-ID module. SORT uses 0.25 MB of memory, 96,000× less than SAM3.