GloVe: Global Vectors for Word Representation

Created: Feb 6, 2022
Tags: NLP
Paper: GloVe: Global Vectors for Word Representation / Authors: Jeffrey Pennington, Richard Socher, Christopher D. Manning

Why I Chose This Paper

๋ณธ ๋…ผ๋ฌธ์€ Matrix factorization๊ณผ local context window ๋ฐฉ์‹์˜ ์žฅ์ ๋งŒ์„ ์ฐจ์šฉํ•œ ๋ชจ๋ธ์ธ GloVe๋ฅผ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. ์ด์ „์— ๊ณต๋ถ€ํ–ˆ๋˜ Word2vec์˜ ๊ฒฝ์šฐ window ๋‹จ์œ„์˜ ํ•™์Šต์œผ๋กœ ๋‹จ์–ด๋ฅผ ํ‘œํ˜„ํ•˜๊ฑฐ๋‚˜ ์œ ์ถ”ํ•˜๋Š”๋ฐ์—๋Š” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ด์ง€๋งŒ ์ „์ฒด์ ์ธ ํ†ต๊ณ„ ์ •๋ณด๋ฅผ ์ž˜ ๋‚˜ํƒ€๋‚ด์ง€ ๋ชปํ•œ๋‹ค๋Š” ํ•œ๊ณ„๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ์ด์—, GloVe๊ฐ€ word2vec์˜ ์–ด๋–ค ์ ์„ ์–ด๋–ป๊ฒŒ ๊ฐœ์„ ์‹œํ‚ค๋ ค ํ–ˆ๋Š”์ง€ ์•Œ์•„๋ณด๊ณ ์ž ๋ณธ ๋…ผ๋ฌธ์„ ์„ ํƒํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Introduction

In semantic vector space models, the meaning of each word is represented as a vector. Earlier vector methods represented words through the distances or angles between word vectors, whereas Word2Vec introduced a representation in which meaning is encoded in differences along various dimensions. For example, for "King is to queen as man is to woman," the vector relation king - queen = man - woman holds.

The two main model families for learning word vectors are as follows.

1) Global matrix factorization methods (e.g., LSA)

LSA derives global corpus statistics by applying dimensionality reduction to a word-context matrix built by counting word frequencies. Because it is based on raw frequency, it performs poorly on similarity measurement: words like "or" and "the" occur so often that they dominate the similarity measure despite carrying almost no semantic relevance.

2) Local context window methods (e.g., Skip-gram)

Skip gram์€ local context ๋‚ด์—์„œ ์ค‘์‹ฌ ๋‹จ์–ด๋ฅผ ํ†ตํ•ด ์ฃผ๋ณ€ ๋‹จ์–ด๋ฅผ ์˜ˆ์ธก์„ ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ๋‹จ์–ด ๋ฒกํ„ฐ๊ฐ„์˜ ์„ ํ˜• ๊ด€๊ณ„๋กœ ์–ธ์–ด ํŒจํ„ด์„ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์œ ์‚ฌ์„ฑ ์ธก์ •์— ์žˆ์–ด์„œ LSA๋ณด๋‹ค ์„ฑ๋Šฅ์ด ๋” ์ข‹์ง€๋งŒ ์œˆ๋„์šฐ ๋‚ด์˜ ์ฃผ๋ณ€ ๋‹จ์–ด๋กœ ํ•™์Šตํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ „์ฒด ์ฝ”ํผ์Šค์˜ ํ†ต๊ณ„ ์ •๋ณด(statistical information)๋ฅผ ๋ฐ˜์˜ํ•˜๊ธฐ ์–ด๋ ต๋‹ค๋Š” ํ•œ๊ณ„๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.
ย 
์ด ๋‘ ๋ชจ๋ธ์˜ ๋‹จ์ ์„ ๋ณด์™„ํ•˜๊ณ  ๊ฒฐํ•ฉํ•œ ๊ฒƒ์ด Glove ์ž…๋‹ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์€ global word-word co-occurrence counts์—์„œ ํ†ต๊ณ„๋ฅผ ํšจ์œจ์ ์œผ๋กœ ํ™œ์šฉํ•˜๋Š” ๊ตฌ์ฒด์ ์ธ ๊ฐ€์ค‘ ์ตœ์†Œ ์ œ๊ณฑ ๋ชจํ˜•์„ ์ œ์‹œํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰ ์ „์ฒด ์ฝ”ํผ์Šค์˜ ํ†ต๊ณ„ ์ •๋ณด๋ฅผ ๋ฐ˜์˜ํ•˜๋ฉด์„œ ๋†’์€ ์„ฑ๋Šฅ์˜ ์œ ์‚ฌ๋„ ์ธก์ •์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๋Š” ๋ชจ๋ธ์„ ์ œ์‹œํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ž…๋‹ˆ๋‹ค.
ย 

The GloVe Model

์ฝ”ํผ์Šค(corpus)์—์„œ์˜ ๋‹จ์–ด ์ถœํ˜„ ํ†ต๊ณ„๋Š” ๋‹จ์–ด์˜ ์˜๋ฏธ๋ฅผ ํ•™์Šตํ•˜๋Š” ๋น„์ง€๋„ ํ•™์Šต์—์„œ ์ค‘์š”ํ•œ ์ •๋ณด์ž…๋‹ˆ๋‹ค.ย ์ด์ „์—๋„ ๋‹จ์–ด์˜ ์˜๋ฏธ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๋ชจ๋ธ์€ ์žˆ์—ˆ์ง€๋งŒ ์–ด๋–ป๊ฒŒ ์ด๋Ÿฌํ•œ ํ†ต๊ณ„๋กœ ๋ถ€ํ„ฐ ์˜๋ฏธ๊ฐ€ ๋งŒ๋“ค์–ด์กŒ๋Š”์ง€,ย ๊ทธ๋ฆฌ๊ณ  ์–ด๋–ป๊ฒŒ ๋‹จ์–ด ๋ฒกํ„ฐ๊ฐ€ ์ด๋Ÿฌํ•œ ์˜๋ฏธ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š”์ง€์— ๋Œ€ํ•œ ์˜๋ฌธ์€ ๋‚จ์•„์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ ์ƒˆ๋กœ์šด word representation model์ธย GloVe๋ฅผ ์ œ์•ˆํ–ˆ์Šต๋‹ˆ๋‹ค. Glove๋Š” ์ „์—ญ์ ์ธ ์ฝ”ํผ์Šค(global corpus)์˜ ํ†ต๊ณ„๋ฅผ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
ย 

Co-occurrence Probability

X: the matrix of word-word co-occurrence counts
X_ij: an entry of X, the number of times word j occurs in the context of word i
X_i: the total number of words occurring in the context of word i (X_i = Σ_k X_ik)
P_ij = X_ij / X_i: the probability that word j appears in the context of word i
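To make the definitions concrete, here is a minimal pure-Python sketch of building X from a tokenized corpus with a symmetric context window (the toy corpus and window size are illustrative, not from the paper):

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """Count X[i][j]: how often word j appears within `window`
    positions of center word i (symmetric context)."""
    X = defaultdict(lambda: defaultdict(float))
    for center, w_i in enumerate(tokens):
        lo = max(0, center - window)
        hi = min(len(tokens), center + window + 1)
        for pos in range(lo, hi):
            if pos != center:
                X[w_i][tokens[pos]] += 1.0
    return X

corpus = "ice is solid and steam is gas".split()
X = cooccurrence_counts(corpus, window=2)
```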
GloVe encodes word meaning through co-occurrence probabilities. The following example shows how particular aspects of meaning can be extracted from these probabilities. For the words i = ice and j = steam, their semantic relationship can be examined through various probe words k.
[Table: co-occurrence probabilities of ice and steam with probe words k = solid, gas, water, fashion]
(1) k = solid, a word related to ice
The ratio P(k|ice) / P(k|steam) is 8.9, much larger than 1, because solid is more likely to appear in the context of ice than in the context of steam.
(2) k = gas, a word related to steam
The ratio P(k|ice) / P(k|steam) is 0.085, much smaller than 1, because gas is less likely to appear in the context of ice than in the context of steam.
(3) k = water, a word related to both ice and steam
The ratio P(k|ice) / P(k|steam) is 1.36, close to 1, because water is about equally likely to appear in either context.
(4) k = fashion, a word related to neither ice nor steam
The ratio P(k|ice) / P(k|steam) is 0.96, close to 1, because fashion is about equally likely to appear in either context.
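The ratio test above can be reproduced on any co-occurrence matrix. A minimal sketch (the counts below are hypothetical, not the paper's corpus statistics):

```python
# Hypothetical co-occurrence counts, made up for illustration.
X = {
    "ice":   {"solid": 80, "gas": 5,  "water": 60, "fashion": 2, "cold": 53},
    "steam": {"solid": 9,  "gas": 60, "water": 45, "fashion": 2, "hot": 84},
}

def cond_prob(X, i, k):
    """P(k | i) = X_ik / X_i, where X_i is the row total."""
    total = sum(X[i].values())
    return X[i].get(k, 0) / total

def ratio(X, i, j, k):
    """The ratio P(k|i) / P(k|j) GloVe uses to probe relatedness."""
    return cond_prob(X, i, k) / cond_prob(X, j, k)

r_solid = ratio(X, "ice", "steam", "solid")   # should be well above 1
r_gas = ratio(X, "ice", "steam", "gas")       # should be well below 1
```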
ย 
The table shows that ratios of co-occurrence probabilities discriminate relevant words better than the raw probabilities themselves, so word vectors will be learned from these ratios. Since the ratio depends on the three words i, j, and k, the most general objective takes the form F(w_i, w_j, w~_k) = P_ik / P_jk for some yet-unspecified function F (w: word vectors, w~: separate context word vectors).
๋ฒกํ„ฐ ๊ณต๊ฐ„์€ ์„ ํ˜•๊ตฌ์กฐ์ด๊ธฐ ๋•Œ๋ฌธ์—ย ๋ฒกํ„ฐ๊ฐ„์˜ ์ฐจ์ด๋กœ ๋‹จ์–ด๊ฐ„์˜ ๊ด€๊ณ„, ์ฆ‰ ๋™์‹œ ๋“ฑ์žฅ ํ™•๋ฅ ์˜ ํฌ๊ธฐ ๊ด€๊ณ„ ๋น„์œจ์„ ๋ฒกํ„ฐ๊ณต๊ฐ„์— ์ธ์ฝ”๋”ฉํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋ฉด ์ขŒ๋ณ€์˜ F ํ•จ์ˆ˜๋Š” ๋ฒกํ„ฐ ๊ฐ’์ด๊ณ  ์šฐ๋ณ€์€ ์ƒ์ˆ˜์ž…๋‹ˆ๋‹ค.
F could be made a complicated function such as a neural network, but that would obfuscate the linear structure we are trying to capture, so to express word relationships in a linear space the arguments are combined with a dot product: F((w_i - w_j) · w~_k) = P_ik / P_jk. (3)
Because the distinction between a word and a context word is arbitrary, the model should be invariant under exchanging the two roles. Equation (3) is not exchange-symmetric as written, so F is first required to be a homomorphism: F(a + b) = F(a)F(b).
์ด ์ค€๋™ํ˜•์‹์„ ๋งŒ์กฑ์‹œํ‚ค๋Š” F๋Š” exp ์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ (6) ๊ณผ ๊ฐ™์€ ์‹์ด ๋งŒ๋“ค์–ด์ง‘๋‹ˆ๋‹ค.
Equation (6) is still not exchange-symmetric because of the log(X_i) term. Since this term is independent of k, it can be replaced by a bias b_i, and adding a corresponding bias b~_k for w~_k restores the symmetric, exchangeable relationship: w_i · w~_k + b_i + b~_k = log(X_ik).
The log(X_ik) term diverges whenever X_ik is 0, so it is replaced with log(X_ik + 1), which preserves the sparsity of X while preventing the divergence. The left-hand side holds the unknowns, while the right-hand side is a known quantity: the log of the co-occurrence matrix built by counting word co-occurrences over the whole corpus with a fixed window size. The model is therefore trained with the squared difference between the two sides as its loss, finding the w and b that minimize it.
๋“ฑ์žฅ๋นˆ๋„๊ฐ€ ๋‚ฎ์€ ๊ฐ’์€ ์ •๋ณด์— ๊ฑฐ์˜ ๋„์›€์„ ์ฃผ์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ฝ”ํผ์Šค์— ๋”ฐ๋ผ Xํ–‰๋ ฌ์—์„œ 0์ธ ๊ฐ’์ด ์ „์ฒด ํ–‰๋ ฌ์˜ 75-95% ์ธ ๊ฒฝ์šฐ๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์œ„ ์‹์€ ๊ฑฐ์˜ ๋“ฑ์žฅํ•˜์ง€ ์•Š๋Š” ๋‹จ์–ด์—์„œ ๋™์ผํ•œ ๊ฐ€์ค‘์น˜๋ฅผ ์ค€๋‹ค๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์€ ์œ„ ๋น„์šฉ ํ•จ์ˆ˜์— ๊ฐ€์ค‘์น˜ ํ•จ์ˆ˜, f(X_ij)๋ฅผ ๊ณฑํ•œ ์ƒˆ๋กœ์šด ๊ฐ€์ค‘ ์ตœ์†Œ ์ œ๊ณฑ ํšŒ๊ท€ ๋ชจ๋ธ(weighted least squares regression model)์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๊ตฌํ•˜๊ณ ์ž ํ–ˆ๋˜ ์ตœ์ข… ์†์‹คํ•จ์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. (V : ๋‹จ์–ด ํฌ๊ธฐ)
ย 
The weighting function f(x) must satisfy the following conditions:
(1) f(0) = 0; if f is continuous, it should vanish quickly enough as x → 0.
(2) f(x) should be non-decreasing, so that rare co-occurrences are not overweighted.
(3) f(x) should be relatively small for large x, so that frequent co-occurrences are not overweighted (so that words like "it" or "or" do not receive too much weight).
ย 
๋”ฐ๋ผ์„œ ์ด ์กฐ๊ฑด๋“ค์„ ๋งŒ์กฑํ•˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ฐ€์ค‘์น˜ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
[Figure: the weighting function f(x) with α = 3/4]
X_ij ๊ฐ’์ด ์ปค์ง€๋ฉด์„œ ๊ฐ€์ค‘์น˜๊ฐ€ ์ฆ๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ ๋นˆ๋„๊ฐ€ ๋†’์€ ๊ฒฝ์šฐ ์ง€๋‚˜์น˜๊ฒŒ ๋†’์€ ๊ฐ€์ค‘์น˜๋ฅผ ์ฃผ์ง€ ์•Š๋„๋ก X_max๋ฅผ ๊ธฐ์ค€์œผ๋กœ ํ•จ์ˆ˜๊ฐ’์˜ ์ตœ๋Œ€๊ฐ’์ด ์ •ํ•ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.
ย 
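Putting the loss and the weighting function together, here is a minimal pure-Python sketch of evaluating J on toy data (the vectors and counts are made up for illustration):

```python
import math

def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting function f(x); capped at 1 for frequent pairs."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(pairs, w, w_ctx, b, b_ctx):
    """J = sum over nonzero X_ij of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2.
    `pairs` maps (i, j) -> X_ij; vectors are plain lists."""
    J = 0.0
    for (i, j), x_ij in pairs.items():
        dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
        err = dot + b[i] + b_ctx[j] - math.log(x_ij)
        J += weight(x_ij) * err * err
    return J

# Tiny hypothetical example: 2 words, 2-dimensional vectors.
pairs = {(0, 1): 10.0, (1, 0): 10.0}
w = [[0.1, 0.2], [0.3, -0.1]]
w_ctx = [[0.0, 0.5], [0.2, 0.1]]
b = [0.0, 0.0]
b_ctx = [0.0, 0.0]
J = glove_loss(pairs, w, w_ctx, b, b_ctx)
```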

Relationship to Skip-gram

Unsupervised models that learn word vectors are all ultimately based on the occurrence statistics of the corpus, so the models share common ground. Here we examine the relationship between Word2Vec's Skip-gram and GloVe.
ย 
First, let Q_ij be a softmax model of the probability that word j appears in the context of word i: Q_ij = exp(w_i · w~_j) / Σ_k exp(w_i · w~_k).
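This softmax can be sketched directly in pure Python (the toy vectors are assumed for illustration):

```python
import math

def softmax_prob(w, w_ctx, i, j):
    """Q_ij = exp(w_i . w~_j) / sum_k exp(w_i . w~_k): the skip-gram
    softmax probability that word j appears in the context of word i."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    z = sum(math.exp(dot(w[i], c)) for c in w_ctx)
    return math.exp(dot(w[i], w_ctx[j])) / z

# Toy vectors, assumed for illustration only.
w = [[0.1, 0.2]]
w_ctx = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
q0 = softmax_prob(w, w_ctx, 0, 0)
```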
์œˆ๋„์šฐ ์‚ฌ์ด์ฆˆ๋ฅผ ๋‘๊ณ  ์ „์ฒด ์ฝ”ํผ์Šค์— ๋Œ€ํ•ด ํ•™์Šตํ•  ๋•Œ ๋กœ๊ทธ ํ™•๋ฅ ์„ ์ตœ๋Œ€ํ™”ํ•˜๋ ค๋Š” ์‹œ๋„์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ global objective function์€ ์Œ์˜ ๋กœ๊ทธ ์šฐ๋„ํ•จ์ˆ˜๋กœ (11)๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
Evaluating the softmax normalization factor for every term in (11) is computationally expensive, so for efficient training Skip-gram introduces approximations to Q_ij such as negative sampling. However, since the number of identical terms is given by the co-occurrence matrix X, the sum can be computed more efficiently by grouping terms with the same i and j, as in (12): J = - Σ_i Σ_j X_ij log Q_ij. Skip-gram maximizes the probability of the surrounding words, and because the probability of each (i, j) pair can be read off the precomputed co-occurrence matrix, the computation becomes much faster.
์•ž์„œ ์ •์˜ํ•œ ์‹์— ๋”ฐ๋ผ = P_ij x X_i์ด๊ณ  H(P_i, Q_i)๋Š” P, Q์˜ Cross entropy์ž…๋‹ˆ๋‹ค. Cross entropy๋Š” distance๋ฅผ ์ธก์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.
For long-tailed distributions, however, cross entropy can assign very large weight to events that rarely occur, so a least squares objective is chosen instead, which allows the normalization factors of P and Q to be ignored: J = Σ_{i,j} X_i (Q'_ij - P'_ij)^2, where P'_ij = X_ij and Q'_ij = exp(w_i · w~_j) are the unnormalized distributions.
ย 
Here X_ij often takes very large values, which makes optimization difficult, so logs of P' and Q' are taken to shrink the squared error: J = Σ_{i,j} X_i (w_i · w~_j - log X_ij)^2.
Finally, the pre-fixed weight X_i is not guaranteed to be optimal. Mikolov et al. found that performance can be improved by filtering the data, so a more general weighting function that does not depend on the context word is introduced: J = Σ_{i,j} f(X_ij) (w_i · w~_j - log X_ij)^2.
๋”ฐ๋ผ์„œ GloVe์˜ ์†์‹คํ•จ์ˆ˜์™€ ๊ฐ™์€ ํ˜•ํƒœ๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
ย 

Complexity of the model

๋ชจ๋ธ์˜ ๊ณ„์‚ฐ ๋ณต์žก์„ฑ์€ X ํ–‰๋ ฌ์—์„œ 0์ธ ์•„๋‹Œ ๊ฐ’(nonzero elements)์— ์˜์กดํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ 0์ด ์•„๋‹Œ ๊ฐ’์˜ ํฌ๊ธฐ๋Š” ์ „์ฒด ์ฝ”ํผ์Šค ํฌ๊ธฐ๋ณด๋‹ค ํ•ญ์ƒ ์ž‘๊ธฐ ๋•Œ๋ฌธ์— ์ „์ฒด ์ฝ”ํผ์Šค ํฌ๊ธฐ์— ์˜์กดํ•˜๋Š” ์œˆ๋„์šฐ๊ธฐ๋ฐ˜์˜ ๋ชจ๋ธ์— ๋น„ํ•ด ํฐ ํ–ฅ์ƒ์ž…๋‹ˆ๋‹ค. Glove์˜ ๊ณ„์‚ฐ ๋ณต์žก์„ฑ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
ย 

Experiments

Evaluation methods

1. Word Analogy

“a is to b as c is to ___?”
The dataset consists of 19,544 such questions, split into a semantic and a syntactic subset.
Semantic
: questions about people or places
e.g. “Athens is to Greece as Berlin is to ____?”
Syntactic
: questions about verb tenses or adjective forms
e.g. “dance is to dancing as fly is to ____?”
To answer “a is to b as c is to ___?”, we find the word d whose vector w_d is most similar to w_b - w_a + w_c under cosine similarity.
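A minimal sketch of this analogy procedure on a tiny hand-made 2-D embedding space (the vectors are illustrative, not trained):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def solve_analogy(vectors, a, b, c):
    """Return the d maximizing cos(w_b - w_a + w_c, w_d), excluding a, b, c."""
    target = [wb - wa + wc for wa, wb, wc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(target, vectors[word]))

# Tiny hypothetical embedding space, hand-made for illustration.
vecs = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man":   [0.1, 0.8],
    "woman": [0.1, 0.2],
}
answer = solve_analogy(vecs, "king", "queen", "man")
```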
ย 

2. Word Similarity

Although word analogy is the main task, the model is also evaluated on word similarity tasks of various forms, including WordSim-353, MC, RG, SCWS, and RW.
ย 

3. Named Entity Recognition

The CoNLL-2003 English benchmark dataset for NER is a collection of documents annotated with four entity types: person, location, organization, and miscellaneous. The model is trained on the CoNLL-03 training data and tested on three datasets:
  1. the CoNLL-03 test data
  2. the ACE Phase 2 (2001-02) and ACE-2003 data
  3. the MUC7 Formal Run test data
The BIO2 annotation standard is followed, with the preprocessing steps described in Wang and Manning.
437,905 discrete features were generated from the CoNLL-2003 training dataset, and for each word in a 5-word context a 50-dimensional vector was added and used as continuous features. With these features as input, a CRF (Conditional Random Field) was trained with the same settings as the Wang and Manning model.
ย 

Corpora and training details

๋‹ค์–‘ํ•œ ์‚ฌ์ด์ฆˆ๋ฅผ ๊ฐ€์ง„ 5๊ฐœ์˜ ๋ง๋ญ‰์น˜๋ฅผ ๋ชจ๋ธ์— ํ•™์Šตํ•˜์˜€์Šต๋‹ˆ๋‹ค.
  1. 10์–ต token์˜ 2010 Wikipedia
  1. 16์–ต token์˜ 2014 Wikipedia
  1. 43์–ต token์˜ Gigaword5
  1. Gigaword5 + Wikipedia2014
  1. 420์–ต token์˜ Common Crawl
ย 
Each corpus was tokenized and lowercased with the Stanford tokenizer, a vocabulary of the 400,000 most frequent words was built, and then the co-occurrence count matrix X was constructed. When building X, one must decide the size of the context window and whether to distinguish left context from right context.
In all cases a decreasing weighting function is used, so that a word pair separated by d words contributes 1/d to the total count. This reflects the expectation that distant word pairs contain less relevant information about the words' relationship.
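The 1/d contribution can be sketched as follows (pure Python; the window size and toy tokens are illustrative):

```python
from collections import defaultdict

def weighted_cooccurrence(tokens, window=10):
    """Co-occurrence counts in which a pair d words apart contributes 1/d.
    Scanning rightward and adding both directions gives a symmetric window."""
    X = defaultdict(float)
    for center, w_i in enumerate(tokens):
        for d in range(1, window + 1):
            if center + d < len(tokens):
                X[(w_i, tokens[center + d])] += 1.0 / d
                X[(tokens[center + d], w_i)] += 1.0 / d
    return X

X = weighted_cooccurrence("a b c".split(), window=2)
```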
ย 
In all experiments x_max = 100 and α = 3/4, and the model was trained with AdaGrad (sampling nonzero elements of X stochastically, with an initial learning rate of 0.05), running 50 iterations for vectors smaller than 300 dimensions and 100 iterations otherwise. Unless otherwise noted, the context was 10 words to the left and 10 words to the right.
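A single AdaGrad update on one nonzero X_ij might look like the following sketch (pure Python; bias updates are omitted to keep it short, and the toy values are made up):

```python
import math

def glove_adagrad_step(w, w_ctx, b, b_ctx, gsq_w, gsq_c, i, j, x_ij, lr=0.05):
    """One AdaGrad update on a single nonzero entry X_ij.
    Biases are held fixed here to keep the sketch short; the full model
    updates b_i and b~_j the same way. Returns the pre-update error."""
    f = min((x_ij / 100.0) ** 0.75, 1.0)          # weighting f(X_ij), x_max = 100
    dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
    err = dot + b[i] + b_ctx[j] - math.log(x_ij)  # inner term of the loss
    for d in range(len(w[i])):
        g_w = 2.0 * f * err * w_ctx[j][d]         # dJ/dw_i[d]
        g_c = 2.0 * f * err * w[i][d]             # dJ/dw~_j[d]
        gsq_w[i][d] += g_w * g_w                  # accumulate squared gradients
        gsq_c[j][d] += g_c * g_c
        w[i][d] -= lr * g_w / math.sqrt(gsq_w[i][d])
        w_ctx[j][d] -= lr * g_c / math.sqrt(gsq_c[j][d])
    return err

# Toy state: one word, one context word, 2-dimensional vectors.
w = [[0.1, -0.2]]
w_ctx = [[0.05, 0.1]]
b, b_ctx = [0.0], [0.0]
gsq_w = [[1e-8, 1e-8]]
gsq_c = [[1e-8, 1e-8]]
e1 = glove_adagrad_step(w, w_ctx, b, b_ctx, gsq_w, gsq_c, 0, 0, 20.0)
e2 = glove_adagrad_step(w, w_ctx, b, b_ctx, gsq_w, gsq_c, 0, 0, 20.0)
```

Repeating the step shrinks the error on this pair, as the assertions below check.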
ย 
๋ชจ๋ธ์€ ๋‘ ๊ฐœ์˜ ๋‹จ์–ด๋ฒกํ„ฐย  ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. X๊ฐ€ ๋Œ€์นญํ–‰๋ ฌ์ด๋ฉด ย  ๋Š” ๋žœ๋ค์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•˜๋Š” ๋ถ€๋ถ„๋งŒ ๋นผ๊ณ  ๋™์ผํ•ฉ๋‹ˆ๋‹ค. ๋‘ ๋ฒกํ„ฐ์˜ ๋™๋“ฑํ•œ ์„ฑ๋Šฅ์„ ๊ฐ€์ง‘๋‹ˆ๋‹ค.
๋ฐ˜๋ฉด์— ํŠน์ • ์‹ ๊ฒฝ๋ง์˜ ๊ฒฝ์šฐ, ๋„คํŠธ์›Œํฌ์˜ ์—ฌ๋Ÿฌ ์ธ์Šคํ„ด์Šค๋ฅผ ํ›ˆ๋ จํ•œ ๋’ค ๊ฒฐ๊ณผ๋ฅผ ๊ฒฐํ•ฉํ•˜๋ฉด ๊ณผ์ ํ•ฉ๊ณผ noise๋ฅผ ์ค„์ด๊ณ  ์ผ๋ฐ˜์ ์œผ๋กœ๋Š” ๊ฒฐ๊ณผ๋ฅผ ๊ฐœ์„ ์‹œํ‚จ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ ์šฐ๋ฆฌ๋Š”ย  ๋ฅผ ๋‹จ์–ด๋ฒกํ„ฐ๋กœ ์‚ฌ์šฉํ•˜์—ฌ ํ•ฉ์‚ฐํ•˜๊ธฐ๋กœ ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒํ•˜๋ฉด ์„ฑ๋Šฅ์€ ์†Œํญ ํ–ฅ์ƒ๋˜๊ณ  semantic analogy task(์˜๋ฏธ์  ์œ ์ถ”) ๊ฒฐ๊ณผ์—์„œ ๊ฐ€์žฅ ํฌ๊ฒŒ ๊ฐœ์„ ๋œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
ย 
The results are compared against various SOTA models, against results produced with the word2vec tool, and against several baselines using SVD. For word2vec, skip-gram and CBOW models were trained on the 6-billion-token corpus with the top 400,000 most frequent words and a context window size of 10. For the SVD baselines, a truncated matrix X_trunc was produced, retaining for each word the information on how frequently it occurs with the 10,000 most frequent words. The singular vectors of this matrix constitute the baseline "SVD". Two further baselines, SVD-S (SVD of sqrt(X_trunc)) and SVD-L (SVD of log(1 + X_trunc)), are also evaluated; both compress the range of values in X.
ย 

Results

1. Analogy Task

The table reports percent accuracy on the analogy task; underlined values are the best score within groups of similarly sized models, and bold values are the best score overall. The Skip-gram and CBOW results were produced with the word2vec tool.
[Table: analogy task accuracy (semantic, syntactic, total) for GloVe and the baseline models]
๋” ์ž‘์€ vector size์™€ corpora์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ , ๋‹ค๋ฅธ baseline ๋ชจ๋ธ๋“ค์— ๋น„ํ•ด GloVe๊ฐ€ ํ›จ์”ฌ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ ํ•ด๋‹น ๋ชจ๋ธ์„ 420์–ต token์˜ ํฐ ๋ง๋ญ‰์น˜๋„ ์‰ฝ๊ฒŒ ํ•™์Šตํ•˜์—ฌ ์‹ค์งˆ์ ์ธ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋ณด์ผ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด์— ๋‹ค๋ฅธ ๋ชจ๋ธ์—์„œ๋Š” ๋ง๋ญ‰์น˜ ํฌ๊ธฐ๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค๋Š” ๊ฒƒ์ด ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋ณด์žฅํ•˜์ง€๋Š” ์•Š์•˜์Šต๋‹ˆ๋‹ค. (SVD-L์˜ ์ €ํ•˜๋œ ์„ฑ๋Šฅ์„ ํ†ตํ•ด์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.) ์ด๋Š” ์šฐ๋ฆฌ ๋ชจ๋ธ์—์„œ ์ œ์‹œํ•œ weighting schema์˜ ํ•„์š”์„ฑ์„ ๋”์šฑ ๊ฐ•๋ ฅํ•˜๊ฒŒ ๋‚ดํฌํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
ย 

2. Word Similarity

5๊ฐœ์˜ ๋‹ค๋ฅธ ๋‹จ์–ด์œ ์‚ฌ๋„ ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•œ ์Šคํ”ผ์–ด๋งŒ ์ˆœ์œ„ ์ƒ๊ด€์œผ๋กœ, ๋ชจ๋“  ๋ฒกํ„ฐ์˜ ์ฐจ์›์€ 300์ž…๋‹ˆ๋‹ค.
  • ์Šคํ”ผ์–ด๋งŒ ์ˆœ์œ„ ์ƒ๊ด€ : ๋‘ ๊ณ„๋Ÿ‰ํ˜• ๋ณ€์ˆ˜ ๋˜๋Š” ์ˆœ์„œํ˜• ๋ณ€์ˆ˜ ์‚ฌ์ด์˜ ๋‹จ์ˆœ ๊ด€๊ณ„๋ฅผ ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ์›์‹œ ๋ฐ์ดํ„ฐ๊ฐ€ ์•„๋‹ˆ๋ผ ๊ฐ ๋ณ€์ˆ˜์— ๋Œ€ํ•ด ์ˆœ์œ„๋ฅผ ๋งค๊ธด ๊ฐ’์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค.
[Table: Spearman rank correlation on word similarity tasks (WordSim-353, MC, RG, SCWS, RW)]
์œ ์‚ฌ๋„ ์ ์ˆ˜๋Š” ๋จผ์ € ๊ฐ feature๋ฅผ vocabulary์— ๋Œ€ํ•˜์—ฌ normalizeํ•œ ํ›„ cosine ์œ ์‚ฌ๋„๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ณ„์‚ฐ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ํ•ด๋‹น ์ ์ˆ˜์™€ ์ธ๊ฐ„์˜ ๊ฒฐ์ • ์‚ฌ์ด์˜ ์Šคํ”ผ์–ด๋งŒ ์ˆœ์œ„ ์ƒ๊ด€ ๊ณ„์ˆ˜๋ฅผ ์ธก์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์˜ ๊ฒฝ์šฐ ๋ณด๋‹ค ์ž‘์€ size์˜ corpus๋ฅผ ์‚ฌ์šฉํ–ˆ์Œ์—๋„ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์ž…๋‹ˆ๋‹ค.
ย 

3. Named Entity Recognition

50์ฐจ์›์˜ ๋ฒกํ„ฐ๋กœ NER task์— ๋Œ€ํ•˜์—ฌ F1 score๋ฅผ ์ธก์ •ํ•œ ํ‘œ์ž…๋‹ˆ๋‹ค. Discrete์ด word vector๊ฐ€ ์—†๋Š” baseline์ž…๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ๊ณต๊ฐœ์ ์œผ๋กœ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ HPCA, HSMN, CW๋ฅผ ์‚ฌ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค.
[Table: NER F1 scores on the CoNLL-03 dev/test, ACE, and MUC7 datasets]
GloVe ๋ชจ๋ธ์€ CoNLL test set์„ ์ œ์™ธํ•˜๊ณ  ๋ชจ๋“  evaluation metrics์— ๋Œ€ํ•˜์—ฌ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค. CoNLL test set์€ HPCA ๋ฐฉ๋ฒ•์ด ์กฐ๊ธˆ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ž…๋‹ˆ๋‹ค.
GloVe ๋ฒกํ„ฐ๋Š” downstream NLP task์— ์œ ์šฉํ•˜๋‹ค๊ณ  ๊ฒฐ๋ก  ๋‚ด๋ฆด ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • downstream task : ๊ตฌ์ฒด์ ์œผ๋กœ ํ’€๊ณ  ์‹ถ์€ ๋ฌธ์ œ
    • ์ตœ๊ทผ ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ๋ถ„์•ผ์—์„œ๋Š” ์–ธ์–ด๋ชจ๋ธ์„ย pre-train ๋ฐฉ์‹์„ ์ด์šฉํ•ด ํ•™์Šต์„ ์ง„ํ–‰ํ•˜๊ณ , ๊ทธ ํ›„์— ์›ํ•˜๊ณ ์ž ํ•˜๋Š” ํƒœ์Šคํฌ๋ฅผย fine-tuning ๋ฐฉ์‹์„ ํ†ตํ•ด ๋ชจ๋ธ์„ ์—…๋ฐ์ดํŠธ ํ•˜๋Š” ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•˜๋Š”๋ฐ ์ด๋•Œ, ํƒœ์Šคํฌ๋ฅผ ๋‹ค์šด์ŠคํŠธ๋ฆผ ํƒœ์Šคํฌ๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
      ย 

Model Analysis

1. Vector Length and Context Size

Accuracy on the analogy task as a function of vector size and window size/type. The models were trained on the 6-billion-token corpus; in (a) the window size is 10, and in (b) and (c) the vector size is 100.
[Figure: analogy accuracy vs. (a) vector dimension, (b) symmetric window size, (c) asymmetric window size]
  • Symmetric : window size๊ฐ€ ์ขŒ์šฐ ์–‘์ชฝ์œผ๋กœ ํ™•์žฅ๋˜๋Š” ๊ฒƒ
  • Asymmetric : window size๊ฐ€ ์™ผ์ชฝ์œผ๋กœ๋งŒ ํ™•์žฅ๋˜๋Š” ๊ฒƒ
ย 
(a)์—์„œ๋Š” ์•ฝ 200์ฐจ์› ์ด์ƒ์ด ๋˜๋ฉด ์ˆ˜๋ ดํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
(b)์™€ (c)๋ฅผ ํ†ตํ•ด, syntatic task์— ๋Œ€ํ•ด์„œ๋Š” ์ž‘๊ณ  asymmetricํ•œ context window๊ฐ€ ์ ํ•ฉํ•œ๋ฐ, ์ด๋Š” syntatic information์ด ์ฆ‰๊ฐ์ ์ธ context๋ฅผ ํ†ตํ•ด ์–ป์–ด์ง€๊ณ , ๋‹จ์–ด์˜ ์ˆœ์„œ์— ๊ฐ•ํ•˜๊ฒŒ ์˜์กดํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ semantic information์€ ๋” ์ž์ฃผ ์ง€์—ญ์ ์ด์ง€ ์•Š๊ณ , ๋” ํฐ window size์—์„œ ํฌ์ฐฉ๋ฉ๋‹ˆ๋‹ค.
ย 

2. Corpus Size

Accuracy on the analogy task, measured with 300-dimensional vectors trained on different corpora.
[Figure: analogy accuracy by training corpus (Wikipedia, Gigaword5, Gigaword5+Wiki2014, Common Crawl)]
On the syntactic task, accuracy increases monotonically with corpus size, as one would expect: a larger corpus typically yields better statistics.
On the semantic task, the trend differs: models trained on the Wikipedia corpora outperform those trained on the larger Gigaword corpus. This is likely because the analogy dataset contains many city and country questions and Wikipedia has fairly comprehensive articles about them. Moreover, Wikipedia is updated with new knowledge, whereas Gigaword is a fixed store of older news that may even contain outdated or incorrect information.
ย 

3. Run-time

The total run-time splits into populating X and training the model. Populating X depends on many factors, including the window size, vocabulary size, and corpus size; this step can be parallelized. Once X is given, training time depends on the vector size and the number of iterations.
On a dual 2.1 GHz Intel Xeon E5-2658 machine using a single thread, with a symmetric 10-word window, a 400,000-word vocabulary, and a 6-billion-token corpus, populating X takes 85 minutes, and training 300-dimensional vectors takes 14 minutes per iteration.
ย 

4. Comparison with word2vec

The most important variable to control when comparing GloVe with word2vec is training time.
For GloVe, training time is governed by the number of iterations; for CBOW (Continuous Bag-of-Words) and Skip-gram it is governed by the number of negative samples. 300-dimensional vectors were trained on the same 6-billion-token corpus with the same 400,000-word vocabulary and a symmetric context window of 10 words.
[Figure: analogy accuracy vs. training time for GloVe, CBOW, and Skip-gram]
ํ•˜๋‹จ์˜ x์ถ•์€ ๊ฐ๊ฐ GloVe์— ๋Œ€ํ•ด์„œ๋Š” iteration, CBOW, Skip-Gram์— ๋Œ€ํ•ด์„œ๋Š” negative sample ์ˆ˜๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
word2vec์˜ ๊ฒฝ์šฐ negative sample์ˆ˜๊ฐ€ 10์„ ๋„˜์–ด๊ฐ€๋ฉด ์„ฑ๋Šฅ์ด ์ €ํ•˜๋˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์•„๋งˆ๋„ negative sampling ๋ฐฉ๋ฒ•์ด target ํ™•๋ฅ ๋ถ„ํฌ๋ฅผ ์ž˜ ์˜ˆ์ธกํ•˜์ง€ ๋ชปํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋ผ๊ณ  ๋ด…๋‹ˆ๋‹ค.
ย 

Conclusion

Because both count-based and prediction-based methods draw on the co-occurrence statistics inherent in the corpus, the two are not fundamentally different. However, count-based methods, which capture global statistics, can be somewhat advantageous.
This paper constructs a model that exploits the benefits of count data while simultaneously capturing the meaningful linear structures common to recent log-bilinear prediction-based methods such as word2vec. The result, GloVe, is a new unsupervised log-bilinear regression model for word representation that performs strongly on the word analogy, word similarity, and named entity recognition tasks.
ย 

Reference

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014.ย GloVe: Global Vectors for Word Representation.
ย 
