
Lecture 4 - Collaborative Filtering (KNN, SGD, ALS)


Collaborative Filtering: Finding Similar Users and Items

1) What Is Collaborative Filtering? (feat. Definition and Types)

Collaborative filtering recommends items by comparing a user's purchase patterns and ratings with those of other users. In other words, it finds users who rated items similarly to the target user and recommends items based on those users' purchase patterns and ratings. As a result, collaborative filtering can make recommendations without any additional personal information about users or descriptive information about items.
There are two main types of collaborative filtering: nearest-neighbor-based collaborative filtering and latent-factor-based collaborative filtering. Nearest-neighbor-based collaborative filtering is a memory-based approach built on KNN. It gained wide attention in the Netflix Prize competition, although the winning solution combined neighborhood models with latent factor models rather than relying on KNN alone. Latent-factor-based collaborative filtering, by contrast, describes users and items with latent factors.

2) Collaborative Filtering Basics

(1) K Nearest Neighbors (KNN)
[Figure: a new point Pt among points from Class A, Class B, and Class C]
The KNN algorithm used in neighborhood-based collaborative filtering finds the K neighbors closest to a given user (or point). For example, suppose a new point Pt arrives in the figure above and we want to know which class it belongs to. Among Pt's K = 7 nearest neighbors, two belong to Class A, three to Class B, and two to Class C. Class B holds the most neighbors (three), so by majority vote Pt is assigned to Class B.
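A minimal sketch of this majority vote in plain Python. The coordinates are hypothetical, chosen so that Pt's 7 nearest neighbors split 2 / 3 / 2 across Classes A, B, and C as in the example; K = 7 and Euclidean distance are assumptions, and `knn_classify` is an illustrative name, not a library function.

```python
import math
from collections import Counter


def knn_classify(query, labeled_points, k):
    """Return the majority label among the k labeled points closest to `query`,
    together with the vote counts."""
    # Sort all labeled points by Euclidean distance to the query point
    by_distance = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))
    # Count the labels of the k nearest neighbors only
    votes = Counter(label for _, label in by_distance[:k])
    # most_common(1) -> [(label, count)] for the most frequent label
    return votes.most_common(1)[0][0], votes


# Hypothetical points: the 7 nearest to Pt = (0, 0) are 2 x A, 3 x B, 2 x C;
# the last two points are far away and fall outside K = 7
points = [
    ((1.0, 0.0), "A"), ((0.0, 1.2), "A"),
    ((-1.0, 0.0), "B"), ((0.0, -1.0), "B"), ((0.8, 0.8), "B"),
    ((-0.9, 0.9), "C"), ((1.1, -1.1), "C"),
    ((5.0, 5.0), "A"), ((-6.0, 4.0), "C"),
]

label, votes = knn_classify((0.0, 0.0), points, k=7)
print(label, dict(votes))  # B {'A': 2, 'B': 3, 'C': 2}
```

The two distant points show why K matters: with a much larger K they would start to vote as well.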
(2) Data (Explicit Feedback)
Before looking at each type of collaborative filtering, let us first look at what the data looks like. There are two main forms: implicit feedback and explicit feedback. Implicit feedback tells us whether a user bought an item but not whether the user liked or disliked it. Explicit feedback is data in which users directly express how much they like an item.
In the collaborative filtering examples that follow, users rate each item on a scale of 1 to 10, and some items have no score because the user has not bought them; these cells are shown as ?. Even though the user has not bought such an item, we can use a neighborhood-based model to predict the rating it would get and fill in the ?.
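A minimal sketch of the two data forms, assuming NumPy and using `np.nan` for the ? cells; the matrix values are hypothetical. Deriving the implicit view by masking the ratings is only for illustration: real implicit feedback usually comes from logs such as purchases or clicks.

```python
import numpy as np

# Hypothetical explicit-feedback matrix: rows are users, columns are items,
# values are 1-10 ratings, and np.nan marks a "?" (item not bought/rated)
explicit = np.array([
    [7.0, np.nan, 3.0],
    [np.nan, 9.0, 2.0],
])

# Implicit view for illustration: keep only "did the user interact?" (1/0),
# dropping how much the user liked the item
implicit = (~np.isnan(explicit)).astype(int)

# The "?" cells are the (user, item) pairs a collaborative filtering model must predict
missing = np.argwhere(np.isnan(explicit))

print(implicit.tolist())  # [[1, 0, 1], [0, 1, 1]]
print(missing.tolist())   # [[0, 1], [1, 0]]
```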

3) Neighborhood-Based Collaborative Filtering

(1) Basic Examples of User-Based and Item-Based Collaborative Filtering
Neighborhood-based collaborative filtering is a memory-based approach and one of the earliest algorithms developed for collaborative filtering. It comes in two main forms: user-based and item-based collaborative filtering. User-based collaborative filtering finds users whose purchase patterns and ratings resemble the target user's and builds a recommendation list from the items those similar users have seen. In short, the key is finding "similar people." In the user-based filtering example, the third user is 50% (2/4) similar to the first user, so the grapes and oranges that the first user has seen are recommended to the third user.
Item-based collaborative filtering, by contrast, finds items similar to the ones the target user has already seen and rated, and builds the recommendation list from them. In short, the key is finding "similar items." In the item-based filtering example, the third user has seen watermelon, and the first and second users, who also saw watermelon, both saw grapes as well. Grapes are therefore judged similar to watermelon and recommended to the third user.
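The two toy examples can be sketched with plain Python sets. The viewing histories are hypothetical, chosen to reproduce the outcomes described above (user 3 is 50% similar to user 1 and gets grapes and oranges; grapes are judged closest to watermelon). Measuring similarity as shared items divided by all items seen (Jaccard similarity) is an assumption that matches the 2/4 figure in the text.

```python
def jaccard(a, b):
    """Share of elements two sets have in common: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 0.0


# --- User-based: hypothetical item sets seen by each user ---
seen = {
    "user1": {"strawberry", "watermelon", "grape", "orange"},
    "user2": {"watermelon", "pear", "banana"},
    "user3": {"strawberry", "watermelon"},
}
target = "user3"
# The most similar other user is the "neighbor"
neighbor = max((u for u in seen if u != target),
               key=lambda u: jaccard(seen[u], seen[target]))
# Recommend what the neighbor has seen but the target has not
user_based_recs = sorted(seen[neighbor] - seen[target])
print(neighbor, jaccard(seen[neighbor], seen[target]), user_based_recs)
# user1 0.5 ['grape', 'orange']

# --- Item-based: hypothetical histories for the second example ---
seen2 = {
    "user1": {"watermelon", "grape"},
    "user2": {"watermelon", "grape", "orange"},
    "user3": {"watermelon"},
}
# Invert to "which users saw each item", so items can be compared with each other
viewers = {}
for user, items in seen2.items():
    for item in items:
        viewers.setdefault(item, set()).add(user)

# Score every item user3 has not seen by its best similarity to an item user3 has seen
candidates = [i for i in viewers if i not in seen2["user3"]]
scores = {c: max(jaccard(viewers[c], viewers[s]) for s in seen2["user3"])
          for c in candidates}
best_item = max(scores, key=scores.get)
print(best_item, round(scores[best_item], 3))  # grape 0.667
```

The only difference between the two halves is what gets compared: rows of the user-item data (users) in the first, columns (items) in the second.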
(2) Filling in ? with User-Based Collaborative Filtering
[Figure: rating table for users 1 to 5, with ? marking missing ratings]
As mentioned above, a ? is filled in by predicting the rating it would have, using similarities between users. We first ignore the ? cells and compute the cosine similarity and the Pearson similarity from the ratings that are available. Suppose these show that, for item 1, user 3 has preferences similar to users 1 and 2, and use them to predict user 3's ?. Users 1 and 2 rated item 1 as 7 and 6, so the ? is likely to be high as well, around 6 to 7, and we can expect user 3 to like item 1.
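The two similarity measures can be sketched in NumPy. The table values below are assumed, chosen only to agree with the numbers quoted in the text (users 1 and 2 rate item 1 as 7 and 6, users 4 and 5 rate item 6 as 4 and 3, and the user means are 5.5, 4.8, 2.0, 2.5, 2.0). Both measures compare only the items the two users have both rated; the Pearson version centers each user by the mean of all of that user's ratings, which is one common textbook convention.

```python
import numpy as np

# Assumed rating table: users 1-5 (rows) x items 1-6 (columns); np.nan marks "?"
R = np.array([
    [7, 6, 7, 4, 5, 4],
    [6, 7, np.nan, 4, 3, 4],
    [np.nan, 3, 3, 1, 1, np.nan],
    [1, 2, 2, 3, 3, 4],
    [1, np.nan, 1, 2, 3, 3],
])


def cosine(u, v):
    """Cosine similarity of two rating rows over the items both users rated."""
    both = ~np.isnan(u) & ~np.isnan(v)
    a, b = u[both], v[both]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson(u, v):
    """Pearson similarity over co-rated items; each user is centered by the
    mean of all the ratings that user gave."""
    both = ~np.isnan(u) & ~np.isnan(v)
    a = u[both] - np.nanmean(u)
    b = v[both] - np.nanmean(v)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


user3 = R[2]
for idx in (0, 1):  # users 1 and 2
    print(f"user{idx + 1}: cosine={cosine(R[idx], user3):.3f}, "
          f"pearson={pearson(R[idx], user3):.3f}")
# user1: cosine=0.956, pearson=0.894
# user2: cosine=0.981, pearson=0.938
```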
Likewise, for user 3's item 6, suppose user 3 has preferences similar to users 4 and 5, and use them to predict the ?. They rated item 6 as 4 and 3, so the ? is likely to be low as well, around 3 to 4, and we can expect user 3 not to like item 6 very much.
(3) Problems with User-Based Collaborative Filtering and How to Address Them
User-based collaborative filtering still has a weakness. The mean ratings of the users in the table above are 5.5, 4.8, 2.0, 2.5, and 2.0, so users 1 and 2 rate more generously than users 3, 4, and 5. When users 1 and 2 give an item a high rating, it is therefore unclear whether they genuinely liked it or whether they are simply generous raters whose ratings run high.
We therefore need to remove this bias. The formula below does so: for each neighbor, subtract the neighbor's mean rating from their rating of the item, take the Pearson-similarity-weighted average of these deviations, and add the result to the target user's own mean rating.
ex) User 3's rating for item 1
Predicted rating of user 3 for item 1 = (user 3's mean rating) + [(user 1's rating of item 1 − user 1's mean rating) × Pearson(user 1, user 3) + (user 2's rating of item 1 − user 2's mean rating) × Pearson(user 2, user 3)] / [Pearson(user 1, user 3) + Pearson(user 2, user 3)]
(4) Item-Based Collaborative Filtering
As the name suggests, item-based collaborative filtering computes similarities between items; the rater bias can be removed and a weighted average computed in the same way as in user-based collaborative filtering. The difference is what gets compared: user-based collaborative filtering compares users with each other, while item-based collaborative filtering compares items, measuring how similar one item is to another across all the users who rated them.
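One way to sketch item-based prediction, assuming the adjusted cosine similarity (ratings are centered by each user's mean before item columns are compared, which removes the rater bias) and an assumed rating table chosen to agree with the numbers quoted in the text. The prediction is a similarity-weighted average of user 3's own ratings on the k most similar items; k = 2 and the function names are illustrative choices.

```python
import numpy as np

# Assumed rating table: users 1-5 (rows) x items 1-6 (columns); np.nan marks "?"
R = np.array([
    [7, 6, 7, 4, 5, 4],
    [6, 7, np.nan, 4, 3, 4],
    [np.nan, 3, 3, 1, 1, np.nan],
    [1, 2, 2, 3, 3, 4],
    [1, np.nan, 1, 2, 3, 3],
])
# Adjusted cosine: center every rating by its user's mean first, so generous
# and harsh raters become comparable (the bias correction from above)
centered = R - np.nanmean(R, axis=1, keepdims=True)


def adjusted_cosine(i, j):
    """Similarity of item columns i and j over the users who rated both."""
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    a, b = centered[both, i], centered[both, j]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))


def predict_item_based(user, item, k=2):
    """Similarity-weighted average of the user's own ratings on the k items
    (among those the user rated) that are most similar to `item`."""
    rated = [j for j in range(R.shape[1]) if j != item and not np.isnan(R[user, j])]
    sims = {j: adjusted_cosine(item, j) for j in rated}
    top = sorted(sims, key=sims.get, reverse=True)[:k]
    weights = np.array([sims[j] for j in top])
    return weights @ R[user, top] / np.abs(weights).sum(), top


pred, top = predict_item_based(user=2, item=0)  # user 3, item 1
print([j + 1 for j in top], round(pred, 2))  # [3, 2] 3.0
```

Here the two items most similar to item 1 are items 3 and 2, both rated 3 by user 3, so the prediction is 3.0.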
(5) Pros and Cons of Neighborhood-Based Collaborative Filtering
Pros
  • The approach is simple and intuitive, so it is easy to implement and debug.
  • Because it first finds similar users and then uses K to decide how many of them to use, it is easy to explain why an item was recommended.
  • For the same reason, item-based collaborative filtering is highly interpretable.
  • The model is stable: adding new items or users changes it relatively little.
Cons
  • User-based collaborative filtering is expensive in time and memory, since similarities between users must be computed and stored.
  • Sparsity limits its coverage. Popular items are viewed often and unpopular ones rarely, so recommendations inevitably center on popular items, and an item that nobody has rated cannot have its rating predicted at all.
➡️ This is why it should be used together with a content-based recommender system!

4) Latent Factor Collaborative Filtering

(1) Definition and Principle of Latent Factor Collaborative Filtering, Compared with Neighborhood-Based Collaborative Filtering
Neighborhood-based collaborative filtering works directly with rating vectors: it recommends either from similarities between item vectors (item-based) or from similarities between user vectors (user-based). Latent factor collaborative filtering instead builds two spaces, a user space and an item space, and makes recommendations from their product.
Latent factor collaborative filtering introduces two matrices: a user matrix and an item matrix. It is called "latent factor" collaborative filtering because we do not know exactly what each factor means. Multiplying the users' latent matrix by the items' latent matrix reconstructs (approximately) the rating matrix.
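A minimal NumPy sketch of the idea, with hypothetical factor values and k = 2: each predicted rating is the dot product of a user's latent vector and an item's latent vector. Plain SVD (`np.linalg.svd`) can only factor a fully observed matrix, so it is shown here on a complete one; with ? entries the factors are usually learned by optimization instead, such as SGD and ALS below.

```python
import numpy as np

# Hypothetical latent factors with k = 2 (real factors are learned, and what
# each column "means" is not known in advance)
U = np.array([   # 3 users x 2 factors
    [1.0, 0.5],
    [0.2, 1.5],
    [1.2, 0.1],
])
V = np.array([   # 4 items x 2 factors
    [3.0, 0.5],
    [0.5, 3.0],
    [2.0, 2.0],
    [1.0, 0.0],
])

# Predicted ratings: R_hat[i, j] = U[i] . V[j], a full 3 x 4 matrix
R_hat = U @ V.T
print(R_hat[0])  # user 1's predicted ratings for the 4 items

# Going the other way on a fully observed matrix: truncated SVD gives
# user and item factors whose product reproduces it
u, s, vt = np.linalg.svd(R_hat, full_matrices=False)
k = 2
U_k = u[:, :k] * np.sqrt(s[:k])     # 3 x 2 user factors
V_k = vt[:k, :].T * np.sqrt(s[:k])  # 4 x 2 item factors
print(np.allclose(U_k @ V_k.T, R_hat))  # True: R_hat has rank 2
```

The recovered `U_k` and `V_k` differ from `U` and `V` because such factorizations are not unique, but their product is the same matrix.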
Latent factors can be obtained in several ways, such as SVD, which Netflix uses, and weighted variants that assign weights to the entries; this post covers two methods for learning them, SGD and ALS.
(2) Definition of SGD
[Figure: example of the SGD procedure]
SGD (stochastic gradient descent) is an optimization method; here it is used to minimize the difference between the original rating matrix and the product of the user and item latent matrices. The goal is to find U and V such that multiplying the user latent matrix by the item latent matrix reconstructs the rating matrix with as little error against the actual ratings as possible. The partial derivative of the error with respect to U depends on V, and the partial derivative with respect to V depends on U, so both are updated a little at a time, one observed rating at a time, over many passes until the error settles. The procedure is shown in the example above.
SGD also relies on an important technique called regularization. Without it, the weights can grow very large and overfit the observed ratings, so a regularization term, the squared magnitude of each latent vector, is added to the error to prevent this.
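Written out as formulas, the objective and the per-rating updates look roughly as follows. This is a sketch in the text's U/V notation: Ω (the set of observed user-item pairs), η (learning rate), and λ (regularization weight) are symbols introduced here, and the constant factor 2 from differentiating the square is folded into η. Both updates use the values of u_i and v_j from before the step. The `gradient` and `gradient_descent` methods in the SGD code at the end of this post apply these updates to P and Q (plus bias terms), with `learning_rate` as η and `reg_param` as λ.

```latex
\min_{U,V} \sum_{(i,j)\in\Omega} \left(r_{ij} - u_i^{\top} v_j\right)^2
  + \lambda \left( \sum_i \lVert u_i \rVert^2 + \sum_j \lVert v_j \rVert^2 \right)

\begin{aligned}
e_{ij} &= r_{ij} - u_i^{\top} v_j \\
u_i &\leftarrow u_i + \eta \left( e_{ij}\, v_j - \lambda\, u_i \right) \\
v_j &\leftarrow v_j + \eta \left( e_{ij}\, u_i - \lambda\, v_j \right)
\end{aligned}
```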
(3) Pros and Cons of SGD
Pros
It is very flexible and benefits from the same gradient-based training used in deep learning: extra terms, such as bias terms, can be added to the loss and optimized the same way.
Cons
It converges slowly. That said, better optimizers from deep learning can recover some of the speed, and parallelized implementations are known to be faster.
(4) Definition and Characteristics of ALS
ALS (alternating least squares) fixes one of the two matrices and optimizes the other, alternating between the two repeatedly. With one latent matrix fixed, the problem in the other becomes convex (an ordinary regularized least-squares problem), so each step can be solved exactly and the error never increases. The overall problem is still non-convex, however, so ALS converges to a local optimum rather than a guaranteed global one.
In practice, ALS alternates between fixing the item matrix and optimizing the user matrix, and fixing the user matrix and optimizing the item matrix. Because every subproblem is convex, each step has an exact closed-form answer. One notable difference: the earlier methods left the ? cells out of the computation, whereas ALS as described here fills every ? with 0 and trains on the full matrix (as the implicit-feedback formulation of ALS does). Apart from fixing one matrix at a time, it is not very different from SGD.
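A compact NumPy sketch of ALS as described above, with every ? already set to 0 (the same 7 x 5 matrix as the SGD code at the end of this post). The function name `als` and its parameters `k`, `reg`, `iters`, and `seed` are illustrative, not a library API. Because each step solves its least-squares subproblem exactly, the recorded loss should never increase.

```python
import numpy as np


def als(R, k=2, reg=0.1, iters=10, seed=0):
    """Alternating least squares on a rating matrix whose "?" cells are already 0.

    Minimizes ||R - U V^T||_F^2 + reg * (||U||_F^2 + ||V||_F^2) by solving for U
    with V fixed, then for V with U fixed, and repeating.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    ridge = reg * np.eye(k)
    losses = []
    for _ in range(iters):
        # V fixed: U = R V (V^T V + reg I)^-1, a convex ridge-regression solution
        U = np.linalg.solve(V.T @ V + ridge, V.T @ R.T).T
        # U fixed: V = R^T U (U^T U + reg I)^-1
        V = np.linalg.solve(U.T @ U + ridge, U.T @ R).T
        loss = np.sum((R - U @ V.T) ** 2) + reg * (np.sum(U ** 2) + np.sum(V ** 2))
        losses.append(float(loss))
    return U, V, losses


# The 7 x 5 matrix from the SGD code, with 0 standing for "?"
R = np.array([
    [1, 0, 0, 1, 3],
    [2, 0, 3, 1, 1],
    [1, 2, 0, 5, 0],
    [1, 0, 0, 4, 4],
    [2, 1, 5, 4, 0],
    [5, 1, 5, 4, 0],
    [0, 0, 0, 1, 0],
], dtype=float)

U, V, losses = als(R, k=3, reg=0.01, iters=20)
print(np.round(U @ V.T, 2))
print(losses[0] >= losses[-1])  # True: each half-step can only lower the loss
```

One design caveat: treating ? as 0 pulls predictions for unrated cells toward 0. Explicit-rating ALS implementations often solve each user's and item's system over observed ratings only, while implicit-feedback ALS keeps the zeros but weights them with confidence values.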

SGD code

```python
import time

import numpy as np

# Base code : https://yamalab.tistory.com/92


class MatrixFactorization:
    def __init__(self, R, k, learning_rate, reg_param, epochs, verbose=False):
        """
        :param R: rating matrix (0 marks a missing rating)
        :param k: latent parameter (number of latent factors)
        :param learning_rate: alpha on weight update
        :param reg_param: beta on weight update (regularization strength)
        :param epochs: training epochs
        :param verbose: print status
        """
        self._R = R                           # rating matrix
        self._num_users, self._num_items = R.shape
        self._k = k                           # dimension of the user and item latent vectors
        self._learning_rate = learning_rate   # learning rate
        self._reg_param = reg_param           # regularization strength for the weights
        self._epochs = epochs                 # total number of training epochs
        self._verbose = verbose               # whether to print training progress

    def fit(self):
        """
        training Matrix Factorization : Update matrix latent weight and bias

        Note on self._b:
        - global bias: the mean of the ratings actually given in R (non-zero entries)
        - acts as normalization: instead of the final rating taking negative values,
          the negative offsets are carried by the latent features and biases
        :return: training_process
        """
        # initialize latent matrices
        self._P = np.random.normal(size=(self._num_users, self._k))
        self._Q = np.random.normal(size=(self._num_items, self._k))

        # initialize biases
        self._b_P = np.zeros(self._num_users)
        self._b_Q = np.zeros(self._num_items)
        self._b = np.mean(self._R[np.where(self._R != 0)])

        # train for the given number of epochs
        self._training_process = []
        for epoch in range(self._epochs):
            # train only on the indices where a rating exists
            xi, yi = self._R.nonzero()
            for i, j in zip(xi, yi):
                self.gradient_descent(i, j, self._R[i, j])
            cost = self.cost()
            self._training_process.append((epoch, cost))  # store epoch and cost

            # print status
            if self._verbose and (epoch + 1) % 10 == 0:
                print("Iteration: %d ; cost = %.4f" % (epoch + 1, cost))
        return self._training_process

    def cost(self):
        """
        compute root mean square error over the observed ratings
        :return: rmse cost
        """
        # xi, yi: R[xi, yi] are the non-zero values
        # Reference: http://codepractice.tistory.com/90
        xi, yi = self._R.nonzero()
        cost = 0
        for x, y in zip(xi, yi):
            cost += pow(self._R[x, y] - self.get_prediction(x, y), 2)
        return np.sqrt(cost / len(xi))

    def gradient(self, error, i, j):
        """
        gradient of latent feature for GD
        :param error: rating - prediction error
        :param i: user index
        :param j: item index
        :return: gradient of latent feature tuple
        """
        dp = (error * self._Q[j, :]) - (self._reg_param * self._P[i, :])
        dq = (error * self._P[i, :]) - (self._reg_param * self._Q[j, :])
        return dp, dq

    def gradient_descent(self, i, j, rating):
        """
        gradient descent step on one observed rating
        :param i: user index of matrix
        :param j: item index of matrix
        :param rating: rating of (i,j)
        """
        # get error
        prediction = self.get_prediction(i, j)
        error = rating - prediction

        # update biases
        self._b_P[i] += self._learning_rate * (error - self._reg_param * self._b_P[i])
        self._b_Q[j] += self._learning_rate * (error - self._reg_param * self._b_Q[j])

        # update latent features (both gradients use the values before this step)
        dp, dq = self.gradient(error, i, j)
        self._P[i, :] += self._learning_rate * dp
        self._Q[j, :] += self._learning_rate * dq

    def get_prediction(self, i, j):
        """
        get predicted rating: user_i, item_j
        :return: prediction of r_ij
        """
        return self._b + self._b_P[i] + self._b_Q[j] + self._P[i, :].dot(self._Q[j, :].T)

    def get_complete_matrix(self):
        """
        compute complete matrix PXQ + P.bias + Q.bias + global bias
        - b_P[:, np.newaxis] has shape (users, 1): each user's bias is added across that user's row
        - b_Q[np.newaxis, :] has shape (1, items): each item's bias is added down that item's column
        - b is added to every element
        - np.newaxis adds a dimension so the 1-D bias vectors broadcast row- or column-wise
          against the 2-D matrix
        :return: complete matrix R^
        """
        return self._b + self._b_P[:, np.newaxis] + self._b_Q[np.newaxis, :] + self._P.dot(self._Q.T)


# run example
if __name__ == "__main__":
    # rating matrix - User X Item : (7 X 5)
    R = np.array([
        [1, 0, 0, 1, 3],
        [2, 0, 3, 1, 1],
        [1, 2, 0, 5, 0],
        [1, 0, 0, 4, 4],
        [2, 1, 5, 4, 0],
        [5, 1, 5, 4, 0],
        [0, 0, 0, 1, 0],
    ])

    # P is (7 X k) and Q is (5 X k); predictions use P @ Q.T plus the biases
    start = time.perf_counter()  # replaces the notebook's %%time cell magic
    factorizer = MatrixFactorization(R, k=3, learning_rate=0.01, reg_param=0.01, epochs=100, verbose=True)
    factorizer.fit()
    print("Elapsed: %.2f s" % (time.perf_counter() - start))
    print(factorizer.get_complete_matrix())
```