Babysitting the Learning Process


1. Preprocess the data

๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ๊ณผ์ •์œผ๋กœ ์ด๋ฏธ์ง€ ์ธ์‹ ๋ฌธ์ œ์—์„œ๋Š” zero-centered๋งŒ์„ ์‚ฌ์šฉํ•œ๋‹ค.

2. Choose the architecture

Decide the overall architecture first, e.g. how many hidden layers to use and how to configure them, and then proceed.
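As an illustration, a small fully connected net with one hidden layer is a common starting point. The sketch below is only a rough example under assumed settings; the helper name, hidden size of 50, and CIFAR-10-style dimensions are placeholders, not fixed choices.

```python
import numpy as np

def init_two_layer_model(input_size, hidden_size, output_size):
    """Hypothetical helper: one hidden layer, small random weights, zero biases."""
    return {
        'W1': 0.0001 * np.random.randn(input_size, hidden_size),
        'b1': np.zeros(hidden_size),
        'W2': 0.0001 * np.random.randn(hidden_size, output_size),
        'b2': np.zeros(output_size),
    }

# e.g. CIFAR-10-sized inputs (32*32*3), 50 hidden units, 10 classes
model = init_two_layer_model(32 * 32 * 3, 50, 10)
```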

3. Check that the loss is reasonable

3๊ฐ•์—์„œ ๋ฐฐ์› ๋˜ sanity check ๋ฐฉ๋ฒ•์„ ํ™œ์šฉํ•œ๋‹ค. regularization ์ถ”๊ฐ€ ํ›„ loss ์˜ ๋ณ€ํ™”๋ฅผ ํ™•์ธํ•œ๋‹ค.

4. Train 1

์ž‘์€ ๋ฐ์ดํ„ฐ ์…‹์„ ๋จผ์ € ๋„ฃ์–ด train์„ ์ง„ํ–‰ํ•œ๋‹ค. regularization์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ , epoch๋งˆ๋‹ค loss๋Š” ๊ฐ์†Œํ•˜๋Š”์ง€ train accuracy๋Š” ์ฆ๊ฐ€ ํ™•์ธํ•œ๋‹ค.
๋ฐ์ดํ„ฐ ์ˆ˜๊ฐ€ ์ž‘๊ธฐ ๋•Œ๋ฌธ์— overfitting ๋ฐœ์ƒํ•ด์•ผ ํ•˜๊ณ  train accuracy๊ฐ€ 100%๊ฐ€ ๋‚˜์˜ค๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜๋ฉด๋ชจ๋ธ์ด ์ œ๋Œ€๋กœ ์ž‘๋™ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค.

5. Train 2

regulaization์„ ์กฐ๊ธˆ์”ฉ ์ฃผ๋ฉฐ learning rate ์ฐพ๋Š” ๊ณผ์ •์„ ๊ฑฐ์นœ๋‹ค. learning rate๊ฐ€ ์ž‘์œผ๋ฉด gradient ์—…๋ฐ์ดํŠธ๊ฐ€ ์ถฉ๋ถ„ํžˆ ์ผ์–ด๋‚˜์ง€ ์•Š์•„ loss๊ฐ€ ์ค„์–ด๋“ค์ง€ ์•Š๊ณ , ๋„ˆ๋ฌด ํฌ๋ฉด NaNs ๋กœ ๋ฐœ์‚ฐํ•˜๊ฒŒ ๋œ๋‹ค.