Simple Online and Realtime Tracking with a Deep Association Metric


Created
Jun 25, 2022
Tags
Vision
cleanUrl: "paper/DeepSORT"
Paper: SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC
Authors: Nicolai Wojke†, Alex Bewley, Dietrich Paulus†

Abstract

SORT (Simple Online and Realtime Tracking) is a simple, efficient, and practical algorithm for multiple object tracking. This paper integrates appearance information to improve the performance of SORT. Thanks to this extension, objects can be tracked through longer periods of occlusion, which effectively reduces the number of identity switches. Following the spirit of the original framework, we place much of the computational complexity into an offline pre-training stage, where we learn a deep association metric on a large-scale person re-identification dataset.
During online application, we establish measurement-to-track associations using nearest neighbor (NN) queries in visual appearance space. Experimental evaluation shows that our extension reduces the number of identity switches by 45%, achieving overall competitive performance at high frame rates.
identity switch:
When multiple objects move (for example, when they cross paths), the IDs the tracker has assigned to them can get swapped.
deep association metric:
An association metric learned with deep learning (a CNN); see Section 2.4 for details.

0. FlowChart

SORT


What is SORT?

  • A MOT (Multi-Object Tracking) method that efficiently associates objects across frames for real-time tracking.
    • What is MOT?
      • A. The process of associating detected objects with one another in order to track multiple objects.

  • What exactly happens inside SORT?
      1. Detection: detect objects in the frame.
      2. Estimation: use a Kalman filter to predict the state of each track and update it with the new measurements.
      3. Data association:
        a. Compute the IoU similarity between the predicted tracks and the detections.
        b. Separate the objects that were already being tracked from the objects that are not (disappeared objects and new objects).
          b-1. Objects that are still tracked are fed back into the Kalman filter and used for the next prediction.
          b-2. Disappeared objects (unmatched tracks) are deleted after a certain amount of time, while new objects (unmatched detections) start new tracks that are added to the track set.

  • Limitations of SORT
    • Vulnerable to occlusion (objects being hidden behind other objects or obstacles).
      • Source: Measurement-wise Occlusion in Multi-object Tracking
    • ID switching: when multiple objects move, the IDs assigned to them can be swapped, and SORT is prone to these ID switches.
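The data association step listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SORT reference code: boxes are `[x1, y1, x2, y2]` NumPy arrays, the cost is `1 - IoU`, SciPy's `scipy.optimize.linear_sum_assignment` stands in for the Hungarian method, and the helper names and the `iou_threshold=0.3` cut-off are hypothetical choices for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(tracks, dets):
    """Pairwise IoU between predicted track boxes and detections, boxes as [x1, y1, x2, y2]."""
    t = tracks[:, None, :]  # shape (T, 1, 4)
    d = dets[None, :, :]    # shape (1, D, 4)
    w = np.clip(np.minimum(t[..., 2], d[..., 2]) - np.maximum(t[..., 0], d[..., 0]), 0, None)
    h = np.clip(np.minimum(t[..., 3], d[..., 3]) - np.maximum(t[..., 1], d[..., 1]), 0, None)
    inter = w * h
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    return inter / (area_t + area_d - inter)


def associate(tracks, dets, iou_threshold=0.3):
    """Return (matches, unmatched_tracks, unmatched_detections)."""
    iou = iou_matrix(tracks, dets)
    rows, cols = linear_sum_assignment(1.0 - iou)  # minimize the total (1 - IoU)
    # Assigned pairs with too little overlap are rejected
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if iou[r, c] >= iou_threshold]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_dets = [j for j in range(len(dets)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

Pairs that the solver assigns but whose IoU falls below the threshold count as unmatched, so a detection far away from every track ends up starting a new track (b-2 above).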

DeepSORT


1. Introduction

Due to recent progress in object detection, tracking-by-detection has become the leading paradigm in multiple object tracking. Within this paradigm, object trajectories are usually found by solving a global optimization problem that processes entire video batches at once. Flow network formulations and probabilistic graphical models are popular frameworks of this type. However, because they rely on batch processing, these methods cannot be applied in online scenarios, where the identity of each target must be available at every time step.
SORT is a much simpler framework that performs Kalman filtering in image space and frame-by-frame data association using the Hungarian method with an association metric that measures bounding box overlap. This simple approach achieves favorable performance at high frame rates.
On the MOT challenge dataset, SORT combined with a state-of-the-art people detector ranks on average higher than MHT on standard detections. This not only underlines the influence of object detector performance on overall tracking results, but is also an important insight from a practitioner's point of view.
While SORT achieves good overall performance in terms of tracking precision and accuracy, it returns a relatively high number of identity switches. This is because the association metric it uses is only accurate when state estimation uncertainty is low.
Consequently, SORT has difficulty tracking through occlusions, which typically appear in frontal-view camera scenes. To overcome this issue, the paper replaces the association metric with a more informed metric that combines motion and appearance information. In particular, it uses a CNN that has been trained to discriminate pedestrians on a large-scale person re-identification dataset. Integrating this network increases robustness against misses and occlusions while keeping the system easy to implement, efficient, and applicable to online scenarios. The code and a pre-trained CNN model are publicly available to facilitate research and practical application development.


2. SORT with Deep Association Metric

We adopt a conventional single-hypothesis tracking methodology with recursive Kalman filtering and frame-by-frame data association. The following sections describe its core components in more detail.
Kalman filter:
  • A method that uses an object observed in the previous frame (or stage) to predict its position in the next frame and then refines that prediction with the new measurement.

  • How is the Kalman filter used?
    • Prediction and its probability distribution + sensor reading and its probability distribution
      ⇒ optimal estimate obtained by combining (intersecting) the two distributions
    • How is the prediction made?
      → From the current position, velocity, and acceleration, we can estimate the position, velocity, and acceleration one second later.
    • Distribution of the prediction from the previous frame (predicted state estimate) + distribution of the current frame's measurement (measurement) ⇒ optimal state estimate
      → The state is updated using the Gaussian distributions of the prediction and of the measurement, which yields the optimal estimate.

  • Why does DeepSORT use a Kalman filter?
    • A. The measurements received from the sensor (camera) also contain noise, and the Kalman filter is effective at handling such noise.
      In addition, in tracking videos objects move roughly linearly (they do not suddenly vanish or appear), which again makes the Kalman filter a good fit.

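The "intersection of two distributions" idea can be made concrete in one dimension: multiplying the prediction Gaussian by the measurement Gaussian gives another Gaussian whose mean is a variance-weighted average and whose variance is smaller than either input. The function name is hypothetical, and this is only the scalar special case of the Kalman update, not DeepSORT code.

```python
def fuse_gaussians(mu_pred, var_pred, mu_meas, var_meas):
    """Product of two 1-D Gaussians, renormalized: returns (mean, variance).

    Equivalent to a scalar Kalman update with gain var_pred / (var_pred + var_meas).
    """
    total = var_pred + var_meas
    mu = (mu_pred * var_meas + mu_meas * var_pred) / total  # pulled toward the more certain source
    var = var_pred * var_meas / total                       # smaller than both input variances
    return mu, var


# Prediction: x = 10 with variance 4; camera measurement: x = 12 with variance 1
mu, var = fuse_gaussians(10.0, 4.0, 12.0, 1.0)
print(mu, var)  # 11.6 0.8
```

Because the measurement is four times more certain than the prediction, the fused estimate lands much closer to 12 than to 10.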

2.1. Track Handling and State Estimation

The track handling and Kalman filtering framework is mostly identical to the original formulation in SORT. We assume a very general tracking scenario in which the camera is uncalibrated and no ego-motion information is available. While these circumstances pose a challenge to the filtering framework, they are the most common setup in recent multiple object tracking benchmarks.
ego-motion:
The motion of the camera itself, estimated from the sequence of images it captures.

The tracking scenario is defined on the eight-dimensional state space $(u, v, \gamma, h, \dot{x}, \dot{y}, \dot{\gamma}, \dot{h})$, which contains the bounding box center position $(u, v)$, aspect ratio $\gamma$, height $h$, and their respective velocities in image coordinates.
We use a standard Kalman filter with a constant velocity motion model and a linear observation model, taking the bounding box coordinates $(u, v, \gamma, h)$ as direct observations of the object state.
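A minimal NumPy sketch of this constant-velocity model with a linear observation, assuming the state vector lists the four observed quantities first and their velocities second. The fixed diagonal noise matrices `Q` and `R` and the function names are assumptions for illustration; a production tracker would choose its noise levels more carefully.

```python
import numpy as np

NDIM = 4  # observed quantities: (u, v, gamma, h)


def make_model(dt=1.0):
    """Constant-velocity transition F (8x8) and linear observation H (4x8)."""
    F = np.eye(2 * NDIM)
    F[:NDIM, NDIM:] = dt * np.eye(NDIM)  # position += velocity * dt
    H = np.eye(NDIM, 2 * NDIM)           # observe (u, v, gamma, h) directly
    return F, H


def predict(x, P, F, Q):
    """Propagate mean and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q


def update(x, P, z, H, R):
    """Correct the prediction with a measurement z = (u, v, gamma, h)."""
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P


F, H = make_model()
Q = 0.01 * np.eye(8)  # assumed process noise
R = 0.1 * np.eye(4)   # assumed measurement noise
# Center (100, 50), aspect ratio 0.5, height 80, moving +2 px/frame in u and -1 px/frame in v
x = np.array([100.0, 50.0, 0.5, 80.0, 2.0, -1.0, 0.0, 0.0])
P = np.eye(8)
x_pred, P_pred = predict(x, P, F, Q)
x_new, P_new = update(x_pred, P_pred, np.array([101.5, 49.2, 0.5, 81.0]), H, R)
```

The prediction shifts the box by its velocity, and the update pulls it toward the detection while shrinking the positional uncertainty.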
For each track $k$ we count the number of frames since the last successful measurement association, $a_k$. This counter is incremented during Kalman filter prediction and reset to 0 when the track is associated with a measurement. Tracks that exceed a predefined maximum age $A_{\max}$ are considered to have left the scene and are deleted from the track set. New track hypotheses are initiated for each detection that cannot be associated with an existing track. These new tracks are classified as tentative during their first three frames. During this time a successful measurement association is expected at each time step, and tracks that are not associated with a measurement within their first three frames are deleted.

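The track bookkeeping rules above (the $a_k$ counter, $A_{\max}$, and the three-frame tentative phase) can be sketched as a small state machine. This is a hypothetical class that leaves out the Kalman state entirely; `n_init=3` follows the text and `max_age=30` matches the $A_{\max}$ value used in the experiments, while the names and the exact comparison (`>` against `max_age`) are assumptions.

```python
from enum import Enum


class TrackState(Enum):
    TENTATIVE = 1
    CONFIRMED = 2
    DELETED = 3


class Track:
    """Bookkeeping only: the a_k counter, the hit count, and the lifecycle state."""

    def __init__(self, track_id, n_init=3, max_age=30):
        self.track_id = track_id
        self.n_init = n_init            # frames a new track stays tentative
        self.max_age = max_age          # A_max
        self.hits = 1                   # the detection that created the track counts as a hit
        self.time_since_update = 0      # a_k: frames since the last association
        self.state = TrackState.TENTATIVE

    def predict(self):
        """Kalman prediction step: increment a_k."""
        self.time_since_update += 1

    def mark_hit(self):
        """Associated with a detection: reset a_k and confirm after n_init hits."""
        self.hits += 1
        self.time_since_update = 0
        if self.state is TrackState.TENTATIVE and self.hits >= self.n_init:
            self.state = TrackState.CONFIRMED

    def mark_missed(self):
        """No detection this frame: delete tentative tracks, or confirmed ones older than A_max."""
        if self.state is TrackState.TENTATIVE or self.time_since_update > self.max_age:
            self.state = TrackState.DELETED
```

A track created in frame 1 and matched in frames 2 and 3 becomes confirmed; a tentative track that misses even once is dropped, while a confirmed track survives up to `max_age` consecutive misses.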

2.2. Assignment Problem

A conventional way to solve the association between the predicted Kalman states and newly arrived measurements is to formulate it as an assignment problem and solve it with the Hungarian algorithm. Into this formulation we integrate motion and appearance information by combining two appropriate metrics.
assignment problem:
  • Given multiple suppliers and consumers where every pairing has a different cost, the problem is to find the one-to-one assignment that minimizes the total cost.
    • e.g., finding the lowest-cost way to assign workers (air-conditioner technicians, the suppliers) to jobs (air-conditioner repairs, the demand).

  • How to Solve Assignment Problem?
    • The Hungarian algorithm is the best-known method for solving the assignment problem.
Hungarian algorithm:

What is the Hungarian algorithm?

  • An algorithm that finds an optimal solution to the assignment problem in polynomial time.

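The technician/job example above can be solved in a few lines with SciPy's `scipy.optimize.linear_sum_assignment`. It solves the same linear assignment problem the Hungarian algorithm targets (recent SciPy versions use a Jonker-Volgenant-style solver internally rather than the textbook Hungarian steps), so the optimal assignment is the same. The cost values are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical costs: rows = technicians (suppliers), columns = repair jobs (demand)
cost = np.array([
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
])

rows, cols = linear_sum_assignment(cost)  # one job per technician, minimal total cost
for worker, job in zip(rows, cols):
    print(f"technician {worker} -> job {job} (cost {cost[worker, job]})")
print("total cost:", cost[rows, cols].sum())  # total cost: 5
```

Note that technician 1 does not get the cheapest single entry (job 1, cost 0); the solver optimizes the sum over all assignments, not each row greedily.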
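To incorporate motion information, we use the (squared) Mahalanobis distance between the predicted Kalman states and the newly arrived measurements:
$$d^{(1)}(i, j) = (\mathbf{d}_j - \mathbf{y}_i)^\top \mathbf{S}_i^{-1} (\mathbf{d}_j - \mathbf{y}_i)$$
$\mathbf{d}_j$: the j-th bounding box detection
$\mathbf{y}_i$: the mean of the i-th track distribution, projected into measurement space
$\mathbf{S}_i$: the covariance matrix of the i-th track distribution, projected into measurement space
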
Mahalanobis distance:
  • A statistical distance: it expresses how many standard deviations a point lies from the mean of a distribution.
  • More intuitively, given an observed trend, it measures "how unlikely is this new data point with respect to that trend?"
  • Put even more simply: how close a point is to a probability distribution.
    • ex. 1: For strongly positively correlated data centered at (0,0), the four points (1,1), (1,-1), (-1,1), (-1,-1) all have the same Euclidean distance from the origin, but the Mahalanobis distance is much larger for (1,-1) and (-1,1), which lie against the direction of the correlation.
    • ex. 2:
      • Euclidean (geometric) distance from the centers μ1, μ2 to x: L1 > L2
        Mahalanobis (statistical) distance from the centers μ1, μ2 to x: L1 < L2
  • Why use the Mahalanobis distance?
    • It is used to tell genuine data apart from spurious data (noise, false alarms, etc.).

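To make ex. 1 concrete, the sketch below evaluates the squared Mahalanobis distance for the four points under an assumed covariance with correlation 0.9 (a stand-in for the missing figure). It uses `np.linalg.solve` instead of an explicit matrix inverse, a common numerical choice rather than something the post prescribes.

```python
import numpy as np


def squared_mahalanobis(x, mean, cov):
    """(x - mean)^T cov^{-1} (x - mean), computed via a linear solve."""
    diff = np.asarray(x, dtype=float) - mean
    return float(diff @ np.linalg.solve(cov, diff))


# Assumed strongly positively correlated distribution around the origin
mean = np.zeros(2)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

for p in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(p, "Euclidean^2 =", float(np.dot(p, p)),
          "Mahalanobis^2 =", round(squared_mahalanobis(p, mean, cov), 3))
```

With this covariance, (1,-1) and (-1,1) come out at a squared distance of 20, about 19 times larger than (1,1) and (-1,-1), even though all four points are equally far away in the Euclidean sense.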
The Mahalanobis distance takes state estimation uncertainty into account by measuring how many standard deviations the detection is away from the mean track location. Furthermore, this metric makes it possible to exclude unlikely associations by thresholding the Mahalanobis distance at a 95% confidence interval computed from the inverse $\chi^2$ distribution. This decision is expressed with an indicator that evaluates to 1 if the association between the i-th track and the j-th detection is admissible:
$$b^{(1)}_{i,j} = \mathbb{1}\left[d^{(1)}(i, j) \le t^{(1)}\right]$$
For the four-dimensional measurement space, the corresponding threshold is $t^{(1)} = 9.4877$.
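The threshold $t^{(1)} = 9.4877$ is the 95% quantile of a $\chi^2$ distribution with 4 degrees of freedom, which can be checked with SciPy. The `gate` helper is a hypothetical name for the indicator $b^{(1)}_{i,j}$ applied to precomputed squared distances.

```python
import numpy as np
from scipy.stats import chi2

# 95% quantile of chi-square with 4 degrees of freedom,
# since the measurement space (u, v, gamma, h) is 4-dimensional
t1 = chi2.ppf(0.95, df=4)
print(round(t1, 4))  # 9.4877


def gate(d1, threshold=t1):
    """Indicator b^(1): True where the squared Mahalanobis distance is admissible."""
    return np.asarray(d1) <= threshold


print(gate([0.5, 9.0, 12.3]))  # [ True  True False]
```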
While the Mahalanobis distance is a suitable association metric when motion uncertainty is low, in our image-space problem formulation the predicted state distribution obtained from Kalman filtering provides only a rough estimate of the object location. In particular, unaccounted camera motion can introduce rapid displacements in the image plane, which makes the Mahalanobis distance a rather uninformed metric for tracking through occlusions. Therefore, we integrate a second metric into the assignment problem.

For each bounding box detection $\mathbf{d}_j$ we compute an appearance descriptor $\mathbf{r}_j$ with $\lVert \mathbf{r}_j \rVert = 1$. Further, for each track $i$ we keep a gallery $\mathcal{R}_i = \{\mathbf{r}^{(i)}_k\}_{k=1}^{L_i}$ of the last $L_i = 100$ associated appearance descriptors.
Our second metric then measures the smallest cosine distance between the i-th track and the j-th detection in appearance space:
$$d^{(2)}(i, j) = \min\left\{1 - \mathbf{r}_j^\top \mathbf{r}^{(i)}_k \;\middle|\; \mathbf{r}^{(i)}_k \in \mathcal{R}_i\right\}$$
Again, a binary variable indicates whether an association is admissible according to this metric:
$$b^{(2)}_{i,j} = \mathbb{1}\left[d^{(2)}(i, j) \le t^{(2)}\right]$$
We find a suitable threshold for this indicator on a separate training dataset. In practice, we compute the bounding box appearance descriptors with a pre-trained CNN, whose architecture is described in Section 2.4.
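The appearance metric $d^{(2)}$ and the per-track gallery can be sketched with NumPy. The helper names are hypothetical, descriptors are assumed to be unit length already, and the `budget=100` default mirrors $L_i = 100$ from the text.

```python
import numpy as np


def l2_normalize(v):
    """Scale vectors to unit length along the last axis."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def min_cosine_distance(gallery, detection):
    """d^(2): smallest 1 - r_j^T r_k over the track's gallery (all vectors unit length)."""
    return float(np.min(1.0 - gallery @ detection))


def add_to_gallery(gallery, descriptor, budget=100):
    """Append a descriptor and keep only the last `budget` entries."""
    return np.vstack([gallery, descriptor])[-budget:]
```

Taking the minimum over the gallery means a detection only has to resemble one of the track's recent appearances, which helps when a person's look changes with pose or lighting.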
In combination, the two metrics complement each other by serving different aspects of the assignment problem. On the one hand, the Mahalanobis distance provides information about possible object locations based on motion, which is particularly useful for short-term predictions. On the other hand, the cosine distance considers appearance information, which is particularly useful for recovering identities after long-term occlusions, when motion is less discriminative. To build the association problem, we combine both metrics with a weighted sum:
$$c_{i,j} = \lambda\, d^{(1)}(i, j) + (1 - \lambda)\, d^{(2)}(i, j)$$
An association is considered admissible only if it lies within the gating region of both metrics:
$$b_{i,j} = \prod_{m=1}^{2} b^{(m)}_{i,j}$$
The influence of each metric on the combined association cost is controlled by the hyperparameter $\lambda$. In our experiments we found that setting $\lambda = 0$ is a reasonable choice when there is substantial camera motion. In this setting only appearance information is used in the association cost term, while the Mahalanobis gate is still used to disregard infeasible assignments based on the possible object locations inferred by the Kalman filter.

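A sketch of the combined cost and the joint gate, assuming both distance matrices are precomputed with shape (tracks, detections). Replacing gated-out entries with a large constant (`infeasible=1e5`) is an assumed convention so that a solver such as `linear_sum_assignment` avoids them; the cosine threshold in the usage is illustrative, while $t^{(1)} = 9.4877$ comes from the text.

```python
import numpy as np


def combined_cost(d1, d2, t1, t2, lam=0.0, infeasible=1e5):
    """c_ij = lam * d1 + (1 - lam) * d2, with pairs outside either gate set to `infeasible`.

    d1: squared Mahalanobis distances, d2: cosine distances, both shaped (tracks, detections).
    """
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    cost = lam * d1 + (1.0 - lam) * d2
    admissible = (d1 <= t1) & (d2 <= t2)  # b_ij = b1_ij * b2_ij
    return np.where(admissible, cost, infeasible), admissible


d1 = [[1.0, 20.0], [3.0, 2.0]]    # track 0 / detection 1 is implausible by motion
d2 = [[0.1, 0.05], [0.3, 0.15]]   # track 1 / detection 0 is implausible by appearance
cost, admissible = combined_cost(d1, d2, t1=9.4877, t2=0.2, lam=0.0)
```

With `lam=0.0`, as in the paper's experiments, the cost is purely the cosine distance, while the Mahalanobis gate still removes pairs such as track 0 and detection 1 that the Kalman filter deems implausible, even though their appearance match (0.05) is the best in the matrix.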

2.3. Matching Cascade

Instead of solving measurement-to-track associations in a single global assignment problem, we introduce a cascade that solves a series of subproblems. To motivate this approach, consider the following situation.
When an object is occluded for a longer period of time, subsequent Kalman filter predictions increase the uncertainty associated with the object location. Consequently, probability mass spreads out in state space and the observation likelihood becomes less peaked. Intuitively, the association metric should account for this spread of probability mass by increasing the measurement-to-track distance.
Counterintuitively, when two tracks compete for the same detection, the Mahalanobis distance favors larger uncertainty, because larger uncertainty effectively reduces the distance, in standard deviations, of any detection from the projected track mean. This is undesired behavior, as it can lead to increased track fragmentation and unstable tracks. Therefore, we introduce a matching cascade that gives priority to more frequently seen objects, encoding the notion of probability spread in the association likelihood.

Listing 1: Matching Cascade
As input we provide the set of track indices $\mathcal{T}$, the set of detection indices $\mathcal{D}$, and the maximum age $A_{\max}$.
Lines 1 and 2: compute the association cost matrix and the matrix of admissible associations (gate). We then iterate over track age $n$ to solve a linear assignment problem for tracks of increasing age.
Line 6: select the subset of tracks $\mathcal{T}_n$ that have not been associated with a detection in the last $n$ frames.
Line 7: solve the linear assignment between the tracks in $\mathcal{T}_n$ and the unmatched detections $\mathcal{U}$.
Lines 8 and 9: update the set of matches and the set of unmatched detections, which are returned on line 11 once the loop completes.
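A minimal Python sketch of the cascade loop described above. It assumes the cost matrix and boolean gate matrix are already computed (shape: tracks x detections) and that `ages[i]` holds each track's frames-since-last-match counter; the function names and the `INFEASIBLE` constant are hypothetical, and the final IoU stage for unconfirmed tracks is left out.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

INFEASIBLE = 1e5  # assumed large cost for gated-out pairs


def min_cost_matching(cost, gate, track_idx, det_idx):
    """Solve one linear assignment subproblem and drop gated-out pairs."""
    if not track_idx or not det_idx:
        return []
    sub = cost[np.ix_(track_idx, det_idx)].copy()
    sub[~gate[np.ix_(track_idx, det_idx)]] = INFEASIBLE
    rows, cols = linear_sum_assignment(sub)
    return [(track_idx[r], det_idx[c]) for r, c in zip(rows, cols) if sub[r, c] < INFEASIBLE]


def matching_cascade(cost, gate, ages, max_age):
    """Match tracks in order of increasing age; returns (matches, unmatched detections)."""
    matches = []
    unmatched = list(range(cost.shape[1]))            # U <- D
    for n in range(1, max_age + 1):
        if not unmatched:                              # no detections left
            break
        tracks_n = [i for i, a in enumerate(ages) if a == n]  # T_n
        new = min_cost_matching(cost, gate, tracks_n, unmatched)
        matches += new                                 # M <- M ∪ new matches
        matched = {j for _, j in new}
        unmatched = [j for j in unmatched if j not in matched]  # U <- U \ matched
    return matches, unmatched
```

For a cost matrix `[[0.2, 0.3], [0.1, 0.9]]` with track ages `[1, 3]`, the recently seen track 0 gets first pick and takes detection 0, whereas a single global assignment would have paired track 1 with detection 0.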
Note that this matching cascade gives priority to tracks of smaller age, i.e., tracks that have been seen more recently.
In a final matching stage, we run intersection over union (IoU) association, as proposed in the original SORT algorithm, on the set of unconfirmed and unmatched tracks of age $n = 1$. This helps to account for sudden appearance changes, e.g., due to partial occlusion by static scene geometry, and increases robustness against erroneous initialization.


2.4. Deep Appearance Descriptor

Because the method uses simple nearest neighbor queries without additional metric learning, its successful application requires a well-discriminating feature embedding to be trained offline, before the actual online tracking application. To this end, we employ a CNN that has been trained on a large-scale person re-identification dataset (MARS) containing over 1,100,000 images of 1,261 pedestrians, which makes it well suited for deep metric learning in a people tracking context.

The CNN architecture proposed in the paper is a wide residual network with two convolutional layers followed by six residual blocks.
The global feature map of dimensionality 128 is computed in dense layer 10. A final batch normalization and $\ell_2$ normalization project the features onto the unit hypersphere so that they are compatible with the cosine appearance metric. In total, the network has 2,800,864 parameters, and one forward pass of 32 bounding boxes takes approximately 30 ms on an Nvidia GeForce GTX 1050 mobile GPU. The network is therefore well suited for online tracking, provided a modern GPU is available.
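Why the final $\ell_2$ normalization makes features compatible with a cosine metric can be checked numerically: once vectors lie on the unit hypersphere, the squared Euclidean distance equals twice the cosine distance, so both rank neighbors identically. The random 128-dimensional vectors are stand-ins for network outputs; no real model is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 128))  # random stand-ins for 128-d CNN outputs

# l2 normalization projects every feature onto the unit hypersphere
unit = features / np.linalg.norm(features, axis=1, keepdims=True)

cosine_dist = 1.0 - unit @ unit.T                                       # 1 - r_i^T r_j
sq_euclid = ((unit[:, None, :] - unit[None, :, :]) ** 2).sum(axis=-1)  # ||r_i - r_j||^2

# On the unit sphere: ||a - b||^2 = 2 * (1 - a^T b)
print(np.allclose(sq_euclid, 2.0 * cosine_dist))  # True
```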
Details of the training procedure are beyond the scope of the paper, but a pre-trained model is available on GitHub.


3. Experiments

We assess the performance of our tracker on the MOT16 benchmark. This benchmark evaluates tracking performance on seven challenging test sequences, including frontal-view scenes with a moving camera as well as top-down surveillance setups. As input to the tracker we rely on the detections provided by Yu et al., who trained a Faster R-CNN on a collection of public and private datasets to obtain excellent detection performance. For a fair comparison, we re-ran SORT on the same detections.
For evaluation on the test sequences we set $\lambda = 0$ and $A_{\max} = 30$ frames. Detections were thresholded at a confidence score of 0.3. The remaining parameters of our method were tuned on the separate training sequences provided by the benchmark. Evaluation uses the following metrics:

MOTA (Multi-object tracking accuracy): overall tracking accuracy in terms of false positives, false negatives, and identity switches
MOTP (Multi-object tracking precision): overall tracking precision in terms of bounding box overlap between the ground truth and the reported locations
MT (Mostly Tracked): percentage of ground-truth tracks that keep the same label for at least 80% of their life span
ML (Mostly Lost): percentage of ground-truth tracks that are tracked for at most 20% of their life span
ID (Identity Switches): number of times the reported identity of a ground-truth track changes
FM (Fragmentation): number of times a track is interrupted by a missing detection
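MOTA combines the three error types listed above into one number. The sketch uses the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT with counts summed over all frames; the function name and the counts in the example are hypothetical, not results from the paper.

```python
def mota(fn, fp, idsw, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (fn + fp + idsw) / num_gt


# Hypothetical counts for illustration only
print(mota(fn=300, fp=150, idsw=50, num_gt=1000))  # 0.5
```

Because every error type is weighted equally, a large number of false positives can lower MOTA even when identity switches are rare.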
Our model, Deep SORT, successfully reduces the number of identity switches. Compared to SORT, identity switches drop from 1423 to 781, a decrease of approximately 45%. At the same time, track fragmentation increases slightly because object identities are maintained through occlusions and misses. We also see a significant increase in the number of mostly tracked (MT) objects and a decrease in mostly lost (ML) objects. Overall, thanks to the integration of appearance information, identities are successfully maintained through longer occlusions.
Our method is also a strong competitor to other online tracking frameworks. In particular, it returns the fewest identity switches of all online methods while maintaining competitive MOTA scores, track fragmentation, and false negatives. The reported tracking accuracy is mostly impaired by a larger number of false positives. Given their overall impact on the MOTA score, applying a larger confidence threshold to the detections could potentially increase the reported performance of our algorithm by a large margin. However, visual inspection of the tracking output shows that these false positives are mostly generated by sporadic detector responses at static scene geometry. Because of the relatively large maximum allowed track age, these are more commonly joined to object trajectories. At the same time, we did not frequently observe tracks jumping between false alarms. Instead, the tracker usually generated relatively stable, stationary tracks at the reported object location. Our implementation runs at approximately 20 Hz, with roughly half of the time spent on feature generation. Therefore, given a modern GPU, the system remains computationally efficient and operates in real time.

4. Conclusion

The paper presented an extension to SORT that incorporates appearance information through a pre-trained association metric. Thanks to this extension, objects can be tracked through longer periods of occlusion, which makes SORT a strong competitor to state-of-the-art (SOTA: the current best-performing methods) online tracking algorithms. At the same time, the algorithm remains simple to implement and runs in real time.


Reference

SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC https://arxiv.org/pdf/1703.07402.pdf

Application and Code

Code
```
# clone repository for deepsort with yolov4
!git clone https://github.com/theAIGuysCode/yolov4-deepsort

# step into the yolov4-deepsort folder
%cd yolov4-deepsort/

# download yolov4 model weights to data folder
!wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights -P data/

# pin the TensorFlow version used by the repo (-y skips the interactive confirmation prompt)
!pip uninstall -y tensorflow
!pip install tensorflow==2.3.0

# Convert darknet weights to tensorflow model
!python save_model.py --model yolov4

# run DeepSort with YOLOv4 Object Detections as backbone (enable --info flag to see info about tracked objects)
# ./data/video/Seoul.mp4 is the input video; replace it with your own file
!python object_tracker.py --video ./data/video/Seoul.mp4 --output ./outputs/tracker.avi --model yolov4 --dont_show --info

# define helper function to display videos
import io
from IPython.display import HTML
from base64 import b64encode

def show_video(file_name, width=640):
    # show resulting deepsort video, embedded as a base64 data URL
    with open(file_name, 'rb') as f:
        mp4 = f.read()
    data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
    return HTML("""
    <video width="{0}" controls>
        <source src="{1}" type="video/mp4">
    </video>
    """.format(width, data_url))

# convert resulting video from avi to mp4 file format
import os
path_video = os.path.join("outputs", "tracker.avi")
%cd outputs/
!ffmpeg -y -loglevel panic -i tracker.avi output.mp4
%cd ..

# output object tracking video
path_output = os.path.join("outputs", "output.mp4")
show_video(path_output, width=960)
```
Source: https://www.youtube.com/watch?v=_zrNUzDS8Zc&t=309s