Recommending What Video to Watch Next: A Multitask Ranking System

Created: Mar 8, 2022
Tags: Recommendation System
Paper: Recommending What Video to Watch Next: A Multitask Ranking System
Authors: Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, Ed Chi

Why I Chose This Paper

After reading the paper YouTube published in 2016, I became curious about how the system had evolved by 2019 and which parts had changed, so I picked this paper. It is also the most recent of the YouTube-related papers currently available and covers the most advanced material, so I thought it would be worth sharing.

Introduction

As explained in the 2016 paper, YouTube's recommendation system has two stages: candidate generation → ranking. Unlike the 2016 paper, which concentrated on candidate generation, the 2019 paper concentrates on the ranking model and explains how it approaches the problem. Two challenges stand out: deciding what to optimize, and the fact that a video may have been watched simply because it was ranked high.
In short, the two challenges are setting the objectives well and removing bias from the recommendations. To address them, YouTube introduced a multitask neural network.
notion image
The model applies a Multi-gate Mixture-of-Experts (MMoE) structure and implements multitask learning by separating the objectives. As the earlier paper noted, predicting only the next video to watch makes the system more likely to be drawn toward ads and clickbait videos, so separating objectives such as 'how long will the user watch' and 'does the user like it' can improve prediction quality. The objectives fall into two groups: the 'engagement objective', which covers user clicks and engagement with the video, and the 'satisfaction objective', which covers whether the user liked the video.
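
The MMoE idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `MMoE`, the layer sizes, the random weights, and the single-layer experts and towers are all hypothetical, and the experts read the raw input directly for simplicity. Each task has its own softmax gate that mixes the outputs of the shared experts before a task-specific tower produces a logit.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a (rows x len(x)) weight matrix by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MMoE:
    """Toy multi-gate mixture-of-experts with one linear layer per component."""

    def __init__(self, input_dim, expert_dim, num_experts, num_tasks, seed=0):
        rng = random.Random(seed)
        # Experts are shared by all tasks.
        self.experts = [rand_matrix(expert_dim, input_dim, rng) for _ in range(num_experts)]
        # One gate and one output tower per task.
        self.gates = [rand_matrix(num_experts, input_dim, rng) for _ in range(num_tasks)]
        self.towers = [rand_matrix(1, expert_dim, rng) for _ in range(num_tasks)]

    def forward(self, x):
        expert_outs = [relu(linear(w, x)) for w in self.experts]
        task_logits, gate_probs = [], []
        for gate_w, tower_w in zip(self.gates, self.towers):
            probs = softmax(linear(gate_w, x))  # how much this task trusts each expert
            mixed = [
                sum(p * out[d] for p, out in zip(probs, expert_outs))
                for d in range(len(expert_outs[0]))
            ]
            task_logits.append(linear(tower_w, mixed)[0])
            gate_probs.append(probs)
        return task_logits, gate_probs


model = MMoE(input_dim=4, expert_dim=3, num_experts=4, num_tasks=2)
logits, gates = model.forward([0.2, -1.0, 0.5, 0.3])
```

Here `logits` holds one value per task (for example an engagement task and a satisfaction task), and `gates` holds each task's weights over the four experts.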

Related Works

1. Industrial Recommendation Systems

If I had to pick the important points of a recommendation system, there would be three. The most important, as the earlier paper also showed, is using implicit feedback, because explicit feedback is hard to collect in practice and carries little signal. The next is splitting the system into stages, meaning the two-stage candidate generation → ranking structure. The last is scalability: the system needs a process that updates a large volume of data in real time.
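
The two-stage structure mentioned above can be pictured with a toy example. This is only a sketch under assumptions: the corpus, the field names (`topics`, `predicted_watch_minutes`), and both functions are hypothetical, and real candidate generation and ranking models are far richer than a topic match and a sort.

```python
def generate_candidates(corpus, user_topics, limit):
    """Cheap first stage: keep videos that share at least one topic with the user."""
    matches = [video for video in corpus if user_topics & video["topics"]]
    return matches[:limit]


def rank(candidates, score_fn):
    """More expensive second stage: order the small candidate set by a model score."""
    return sorted(candidates, key=score_fn, reverse=True)


# Made-up corpus; in practice the first stage narrows a huge corpus to a few hundred items.
corpus = [
    {"id": "a", "topics": {"music"}, "predicted_watch_minutes": 3.0},
    {"id": "b", "topics": {"cooking"}, "predicted_watch_minutes": 9.0},
    {"id": "c", "topics": {"music", "live"}, "predicted_watch_minutes": 7.5},
    {"id": "d", "topics": {"news"}, "predicted_watch_minutes": 5.0},
]
user_topics = {"music", "news"}

candidates = generate_candidates(corpus, user_topics, limit=3)
ranked = rank(candidates, lambda video: video["predicted_watch_minutes"])
ranked_ids = [video["id"] for video in ranked]
```

Video "b" has the highest score but never reaches the ranker, which shows why the quality of the first stage bounds what the second stage can do.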

2. Modeling Biases in Training Data

On YouTube, when a user clicks a video that the recommendation system suggested, the system then learns from that click, which creates a feedback loop. As a result, selection bias inevitably arises between users and the recommendation system. To counter this, the contribution of a high rank position is deliberately discounted during training.

Model Architecture

1. Ranking Objectives

The ranking model is based on MMoE and separates the objectives into two groups. The first concerns engagement and is split into user clicks (a binary classification task) and watch time (a regression task). The second concerns satisfaction and covers likes (a binary classification task) and ratings (a regression task). The predictions for these multiple objectives are then merged into a single combined score.
notion image
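
To make the objectives concrete, here is a small sketch of per-task losses and a combined score. The post only says that the predictions are merged into a combined score; the weighted-multiplication form, the weight values, and the function names below are assumptions chosen for illustration. Cross entropy and squared error are used as typical losses for the binary and regression tasks.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(label, logit):
    """Loss for a binary task such as click or like (label is 0 or 1)."""
    p = sigmoid(logit)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def squared_error(target, prediction):
    """Loss for a regression task such as watch time or rating."""
    return (target - prediction) ** 2


def combined_score(predictions, weights):
    """Weighted multiplication: the product of each prediction raised to its weight."""
    score = 1.0
    for name, value in predictions.items():
        score *= max(value, 1e-9) ** weights[name]
    return score


# Hypothetical per-task predictions for one candidate video.
predictions = {"click": 0.5, "watch_time": 4.0, "like": 0.25, "rating": 4.0}
# Hypothetical hand-tuned weights; a weight of 0 switches a task off.
weights = {"click": 1.0, "watch_time": 0.5, "like": 0.5, "rating": 0.0}

score = combined_score(predictions, weights)
click_loss = binary_cross_entropy(1, 0.0)
watch_loss = squared_error(4.0, 3.0)
```

Candidates are then sorted by `score`, so changing the weights shifts the balance between engagement and satisfaction without retraining the model.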

2. Modeling and Removing Position and Selection Biases

notion image
The figure above shows how selection bias is handled: at serving time the position feature is passed as a missing value, which in effect penalizes videos that owed their clicks to a high rank. The selection-bias term is a linear combination of the recommendation rank position, used as a feature, and other feature values.
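
The mechanism described above can be sketched as a bias logit that is added to the main model's logit. This is an illustration under assumptions, not the paper's code: the weight tables, the use of `device` as the "other feature", and the function names are hypothetical, and the weights are made up rather than learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical learned weights: higher slots attract more clicks regardless of content.
POSITION_WEIGHT = {1: 1.2, 2: 0.7, 3: 0.3, 4: 0.1}
DEVICE_WEIGHT = {"mobile": 0.2, "desktop": 0.0}


def shallow_tower(position, device):
    """Linear combination of bias features; a missing position (None) adds nothing."""
    return POSITION_WEIGHT.get(position, 0.0) + DEVICE_WEIGHT.get(device, 0.0)


def click_probability(main_logit, device, position=None):
    """Training passes the logged position; serving leaves it as None."""
    return sigmoid(main_logit + shallow_tower(position, device))


# Training: the click was logged at the top slot, so part of it is explained by position.
train_p = click_probability(0.4, "mobile", position=1)
# Serving: position is treated as missing, so only the content-driven part remains.
serve_p = click_probability(0.4, "mobile")
```

Because the position term absorbs part of each logged click during training, the main model is not rewarded just because a video was shown at the top.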

Results

For model performance, the paper reports a comparison of the model with and without MMoE, a visualization of expert utilization (the gating network distribution), and a CTR comparison related to the wide feature (position bias).

1. YouTube live experiment results for MMoE

notion image
The table above shows that MMoE, which adds experts, outperforms a model that uses only the wide-and-deep-based shared-bottom network. It also shows that performance improves as the number of experts grows.

2. Expert utilization for multiple tasks on YouTube

notion image
The figure above shows the softmax-layer probabilities in the gating network. It confirms that particular tasks tend to prefer particular experts.
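
A utilization view like this can be computed by averaging each task's gate probabilities over many examples. The sketch below uses made-up gate logits and hypothetical task names; it only illustrates the calculation, not the paper's measured values.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def expert_utilization(gate_logits_per_example):
    """Average the softmax gate probabilities over all examples for one task."""
    num_experts = len(gate_logits_per_example[0])
    totals = [0.0] * num_experts
    for logits in gate_logits_per_example:
        for i, p in enumerate(softmax(logits)):
            totals[i] += p
    count = len(gate_logits_per_example)
    return [t / count for t in totals]


# Made-up gate logits: three examples per task, three experts.
gate_logits = {
    "click": [[2.0, 0.1, 0.0], [1.5, 0.2, 0.1], [2.2, 0.0, 0.3]],
    "like": [[0.0, 0.2, 1.9], [0.1, 0.0, 2.4], [0.3, 0.1, 1.7]],
}

utilization = {task: expert_utilization(rows) for task, rows in gate_logits.items()}
# Index of the expert each task leans on most.
preferred = {
    task: max(range(len(probs)), key=probs.__getitem__)
    for task, probs in utilization.items()
}
```

In this toy data the click task leans on expert 0 and the like task on expert 2, which is the kind of task-specific preference the paragraph above describes.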

Conclusions and Significance

YouTube ran into several obstacles as it introduced personalized recommendation algorithms, and this paper looks in depth at how to handle three of them: engagement, satisfaction, and bias. For engagement, it accounts for watch time in addition to clicks. For satisfaction, it relies only on feedback that users give directly, namely likes and star ratings. For bias, the system is designed to give less weight to top-ranked videos so that their position does not keep reinforcing itself through training. These are the steps YouTube took toward a better recommendation algorithm, and I expect further algorithms to appear that improve on the problems that remain.

Read the previous post

Deep Neural Networks for YouTube Recommendations