Loss Function


linear classification์—์„œ Weight๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›์•„์„œ Weight์˜ ๊ฒฐ๊ณผ๊ฐ€ ์ •๋‹ต ๊ฐ’๊ณผ ์–ผ๋งˆ๋‚˜ ์ฐจ์ด๊ฐ€ ๋‚˜๋Š”์ง€ ๋ณด์—ฌ์คŒ์œผ๋กœ์จ ์–ผ๋งˆ๋‚˜ ์˜ˆ์ธก์„ ํ•˜๋Š”์ง€ ์ •๋Ÿ‰ํ™”ํ•˜์—ฌ ๋ณด์—ฌ์ฃผ๋Š” ํ•จ์ˆ˜๋ฅผ ์˜๋ฏธํ•œ๋‹ค.
๐Ÿฅ‘
Loss Function์— ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ ๋ผ๋ฒจ๊ฐ’์„ ๋„ฃ์–ด ํ‰๊ท ์„ ๊ตฌํ•ด ์ „์ฒด loss ๊ฐ’์„ ๊ตฌํ•˜๋Š” ์›๋ฆฌ : ์‚ฌ์šฉํ•  Loss Function : input ์ด๋ฏธ์ง€์˜ score : ์ •๋‹ต score : ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ด์„œ ๋‚˜์˜จ ํด๋ž˜์Šค์˜ ์ ์ˆ˜ : ํด๋ž˜์Šค์˜ ์ˆ˜
ย 
ย 

Multiclass SVM Loss

SVM(์„œํฌํŠธ ๋ฒกํ„ฐ ๋จธ์‹ )์€ย ๊ธฐ๊ณ„ ํ•™์Šต์˜ ๋ถ„์•ผ ์ค‘ ํ•˜๋‚˜๋กœ ํŒจํ„ด ์ธ์‹, ์ž๋ฃŒ ๋ถ„์„์„ ์œ„ํ•œย ์ง€๋„ ํ•™์Šตย ๋ชจ๋ธ์ด๋‹ค. ์ด๋Ÿฌํ•œ SVM์€ ๋‹คํ•ญ ๋ถ„๋ฅ˜์—์„œ๋„ ์‚ฌ์šฉํ•˜๋Š”๋ฐ ์ด๋ฅผ multiclsss SVM์ด๋ผ๊ณ  ํ•˜๋ฉฐ, ์—ฌ๊ธฐ์„œ ์‚ฌ์šฉ๋˜๋Š” loss funtion์„ hinge loss๋ผ๊ณ  ํ•œ๋‹ค.
์ž์„ธํ•œ ๋‚ด์šฉ์€ Multiclass SVM Loss ๋ฌธ์„œ๋ฅผ ์ฐธ๊ณ ํ•˜์‹œ์˜ค.
ย 
ย 

Softmax (Multinomial logistic regression)

Multiclass SVM์—์„œ score๋Š” ์˜๋ฏธ๋ฅผ ๊ฐ–์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— ์ •๋‹ต ํด๋ž˜์Šค์—์„œ ๋” ๋†’์€ score๋ฅผ ์–ป๊ธฐ๋งŒ ํ•˜๋ฉด ๋˜์—ˆ๋‹ค. ํ•˜์ง€๋งŒ Softmax์˜ ๊ฒฝ์šฐ์—๋Š” ํ™•๋ฅ ๋ถ„ํฌ๋ฅผ ์ด์šฉํ•ด socre ์ž์ฒด์— ์˜๋ฏธ๋ฅผ ๋ถ€์—ฌํ•œ๋‹ค. ์ด๋Ÿฌํ•œ Softmax๋Š” ๋”ฅ๋Ÿฌ๋‹ ์‹ ๊ฒฝ๋ง ์ถœ๋ ฅ์ธต์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ํ™œ์„ฑํ•จ์ˆ˜์ด๋‹ค.
์ž์„ธํ•œ ๋‚ด์šฉ์€ Softmax ๋ฌธ์„œ๋ฅผ ์ฐธ๊ณ ํ•˜์‹œ์˜ค.
ย 

Regularization

์šฐ๋ฆฌ๊ฐ€ ๊ตฌํ•œ Weight๋Š” Train data์— ๋งž์ถฐ์ ธ ์žˆ๊ธฐ์—, Test data์—์„œ๋Š” Train data๋ฅผ ํ†ตํ•ด์„œ ๊ตฌํ•œ Weight๊ณผ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ๋‹ค.
(figures: left, a blue curve fitted tightly to the training points; right, the same curve missing new test points shown as green squares)
Train data์—์„œ ์šฐ๋ฆฌ๊ฐ€ ๊ตฌํ•œ ๊ฒƒ์„ ์™ผ์ชฝ์˜ ํŒŒ๋ž€์ƒ‰ ๊ทธ๋ž˜ํ”„๋ผ๊ณ  ํ•˜๊ณ  ์‹ค์ œ Test๋ฅผ ํ–ˆ์„ ๋•Œ ์˜ค๋ฅธ์ชฝ ๊ทธ๋ž˜ํ”„์—์„œ ๋ณด์ด๋Š” ์ดˆ๋ก์ƒ‰ ๋„ค๋ชจ์™€ ๊ฐ™์ด ์˜ˆ์ธกํ•˜์ง€ ๋ชปํ•œ ์ƒํ™ฉ์ด ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค. ์ด๋Ÿฌํ•œ ๊ฒฝ์šฐ๋Š” Train์— ๋งž์ถฐ์ ธ Test๊ฐ’์˜ ์ •ํ™•๋„๊ฐ€ ๋‚ฎ์•„์ง€๋Š”๊ฒƒ์„ Overfitting(๊ณผ์ ํ•ฉ)์ด๋ผ๊ณ  ํ•œ๋‹ค. Train data์— Overfitting๋˜๋Š” ๊ฒฐ๊ณผ๋ฅผ ๋ง‰๊ธฐ ์œ„ํ•ด Regularization์„ ์‚ฌ์šฉํ•œ๋‹ค. Regularization ์€ ์•„๋ž˜์™€ ๊ฐ™์ด ์‚ฌ์šฉํ•˜๋ฉฐ, ํŠน์ • ๊ฐ€์ค‘์น˜๊ฐ€ ๋„ˆ๋ฌด ๊ณผ๋„ํ•˜๊ฒŒ ์ปค์ง€์ง€ ์•Š๋„๋ก ํ•™์Šต์‹œ์ผœ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์„ ๋†’์ด๋Š”๋ฐ ๋„์›€์„ ์ค€๋‹ค.
ย 
ย 
์ข…๋ฅ˜
  1. L1 Regularization
  1. L2 Regularization
  1. Max Norm Regularization
  1. Dropout
  1. Batch Normalization
ย 
๐Ÿฅ‘
๊ฐ€์žฅ ๋ณดํŽธ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” Regularization์œผ๋กœ Weight decay๋ผ๊ณ ๋„ ํ•œ๋‹ค. ๋ถ„๋ฅ˜๊ธฐ์˜ ๋ณต์žก๋„๋ฅผ ์ƒ๋Œ€์ ์œผ๋กœ w1, w2 ์ค‘ ์–ด๋–ค ๊ฐ’์ด ๋” ๋งค๋„๋Ÿฌ์šด์ง€ ์ธก์ •ํ•œ๋‹ค. ํŠน์ • ์š”์†Œ์— ์˜์กดํ•˜๊ธฐ ๋ณด๋‹ค ๋ชจ๋“  ์š”์†Œ๊ฐ€ ๊ณจ๊ณ ๋ฃจ ์˜ํ–ฅ์„ ๋ฏธ์น˜๊ธธ ์›ํ•  ๋•Œ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๊ฐ€์ค‘์น˜๊ฐ€ ์ „๋ฐ˜์ ์œผ๋กœ ์ž‘๊ณ  ๊ณ ๋ฅด๊ฒŒ ๋ถ„์‚ฐ๋œ ํ˜•ํƒœ๋กœ ์ง„ํ–‰๋˜์–ด Overfitting์„ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค.
ย 
์ž์„ธํ•œ ๋‚ด์šฉ์€ Regularization (์ •๊ทœํ™”) ๋ฌธ์„œ๋ฅผ ์ฐธ๊ณ ํ•˜์‹œ์˜ค.
ย