Efficient Estimation Of Word Representations In Vector Space (Word2Vec) (2)
Created: Feb 2, 2022
Tags: NLP
cleanUrl: "/paper/word2vec-code"
Paper: Efficient Estimation of Word Representations in Vector Space (Word2Vec) | Authors: Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean

To begin

Before getting into today's post, let me start with a quick summary of the previous one.
For the details, please refer to the previous post, Efficient Estimation Of Word Representations In Vector Space (Word2Vec) (1)!

In this paper, vectorizing words with distributed representations lets the vectors express not only semantic similarity between words but also multiple degrees of similarity. For example, the word vector closest to the result of vector("Seoul") - vector("capital") + vector("Japan") points to "Tokyo": by building a new model architecture in which arithmetic on word vectors corresponds to arithmetic on word meanings, the paper achieves high accuracy in both the syntactic and the semantic domains.
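As a rough illustration of this word-vector arithmetic (not from the paper or from the code below; the vectors here are made-up toy values), the analogy lookup boils down to "nearest word to b - a + c by cosine similarity":

import numpy as np

# toy 4-dimensional word vectors (hypothetical values, for illustration only)
vectors = {
    "seoul":   np.array([0.9, 0.1, 0.8, 0.0]),
    "capital": np.array([0.1, 0.0, 0.9, 0.1]),
    "japan":   np.array([0.2, 0.9, 0.7, 0.3]),
    "tokyo":   np.array([0.9, 0.9, 0.6, 0.2]),
    "paris":   np.array([0.8, 0.2, 0.1, 0.9]),
}

def closest_word(query, exclude):
    # return the vocabulary word whose vector has the highest cosine similarity to `query`
    best_word, best_sim = None, -1.0
    for w, v in vectors.items():
        if w in exclude:
            continue
        sim = np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v))
        if sim > best_sim:
            best_word, best_sim = w, sim
    return best_word

# vector("seoul") - vector("capital") + vector("japan") is closest to vector("tokyo")
query = vectors["seoul"] - vectors["capital"] + vectors["japan"]
print(closest_word(query, exclude={"seoul", "capital", "japan"}))  # tokyo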
To improve processing performance and training speed when applying the word2vec technique, the paper proposes two architectures: CBOW (Continuous Bag-of-Words) and Skip-gram.
CBOW (Continuous Bag-of-Words): predicts the middle (center) word from the surrounding (context) words
notion image
  • input : 1-of-V vectors of the 2n surrounding words used for the prediction
  • output label : 1-of-V vector of the center word to be predicted
  • training complexity Q = N x D + D x log2(V)
    • N : number of previous (context) words fed to the model
      D : dimensionality of the word vectors
      V : number of words in the vocabulary
Skip-gram: predicts the surrounding words from the middle (center) word
notion image
  • input : 1-of-V vector of the center word used for the prediction
  • output label : 1-of-V vectors of the 2n surrounding words to be predicted
  • training complexity Q = C x (D + D x log2(V)) (a small worked example of both formulas follows this list)
    • C : maximum distance of the words (the window)
      D : dimensionality of the word vectors
      V : number of words in the vocabulary
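As a quick sanity check of the two complexity formulas (plain arithmetic, with illustrative values of N, C, D, and V that are not taken from the paper's experiments; the log2(V) term comes from the hierarchical-softmax output layer):

import math

D = 300        # word vector dimensionality (illustrative)
V = 100_000    # vocabulary size (illustrative)
N = 8          # CBOW: number of context words
C = 4          # Skip-gram: maximum distance of the words

# per-example training complexity
q_cbow = N * D + D * math.log2(V)
q_skipgram = C * (D + D * math.log2(V))

print(f"CBOW      Q = {q_cbow:,.0f}")      # about 7,383
print(f"Skip-gram Q = {q_skipgram:,.0f}")  # about 21,132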

CBOW Model Implementation

The text and code below walk through implementing the CBOW (Continuous Bag-of-Words) model in Python with PyTorch. CBOW looks at the words around a target position and predicts which word fits there best; structurally, it takes several input layers, passes them through a hidden layer, and produces a single output layer. Training proceeds in the direction that maximizes the conditional probability of the center word given its surrounding words, adjusting the weights between the layers.

CBOW Code Implementation

Corpus preprocessing for training
Instead of the short example sentences bundled with the original code, an English corpus of TED talk transcripts was downloaded and used for training. Preprocessing this corpus produces 4,475,758 tokens; considering the available time and machine performance, only a subset of these words was used for training.

import nltk
nltk.download('punkt')
import re
import urllib.request
import zipfile
from lxml import etree
from nltk.tokenize import word_tokenize, sent_tokenize

# download the TED talk corpus (XML)
urllib.request.urlretrieve("https://raw.githubusercontent.com/ukairia777/tensorflow-nlp-tutorial/main/09.%20Word%20Embedding/dataset/ted_en-20160408.xml", filename="ted_en-20160408.xml")

targetXML = open('ted_en-20160408.xml', 'r', encoding='UTF8')
target_text = etree.parse(targetXML)
parse_text = '\n'.join(target_text.xpath('//content/text()'))

# remove parenthesized annotations such as (Laughter), then split into sentences
context_text = re.sub(r'\([^)]*\)', '', parse_text)
sent_text = sent_tokenize(context_text)

result = list()
normalized_text = []
for string in sent_text:
    tokens = re.sub(r"[^a-z0-9]+", " ", string.lower())
    normalized_text.append(tokens)

for sentence in normalized_text:
    tokenized_text = word_tokenize(sentence)
    for tokenized_word in tokenized_text:
        result.append(tokenized_word)  # result holds the tokenized words

The torch library is used to implement the CBOW model.

import torch
from torch import nn
from torch.autograd import Variable
from torch.optim import SGD
import torch.nn.functional as F

CBOW Class

CONTEXT_SIZE = 4     # 4 words on each side of the center word, so 8 input words in total
EMBEDDING_DIM = 300  # each word vector has 300 dimensions
EPOCH = 20           # number of training passes over the word set

class CBOW(nn.Module):
    def __init__(self, vocab_size, embedding_size, context_size):
        super(CBOW, self).__init__()
        # initialize the weights
        self.vocab_size = vocab_size
        self.embedding_size = embedding_size
        self.context_size = context_size
        self.embeddings = nn.Embedding(self.vocab_size, self.embedding_size)
        # create the layers
        # input layer -> projection layer (lin1)
        self.lin1 = nn.Linear(self.context_size * 2 * self.embedding_size, 512)
        # projection layer -> output layer (lin2)
        self.lin2 = nn.Linear(512, self.vocab_size)

    def forward(self, inp):
        out = self.embeddings(inp).view(1, -1)
        out = self.lin1(out)
        # ReLU is used as the activation of the projection layer
        out = F.relu(out)
        out = self.lin2(out)
        # log_softmax is used as the activation of the output layer
        out = F.log_softmax(out, dim=1)
        return out

    def get_word_vector(self, word_idx):
        word = Variable(torch.LongTensor([word_idx]))
        return self.embeddings(word).view(1, -1)
  • __init__ : declares and initializes the CBOW model
    • vocab_size : vocabulary size
    • embedding_size : dimensionality of the word vectors
    • context_size : number of words taken by the input layer (in the code above, 2 x context_size words around the center word are used)
  • forward : defines the forward pass used during training
    • inp : list of surrounding words (the indices of the words around the center word, from which their word vectors are looked up)
  • get_word_vector : returns the vector of a word
    • word_idx : index of the word to look up

Training

CORPUS_SIZE = 10000  # the corpus size determines how much training data is used

def main():
    corpus_text = result[0:CORPUS_SIZE]

    # build the training data from the corpus
    data = list()
    for i in range(CONTEXT_SIZE, CORPUS_SIZE - CONTEXT_SIZE):
        data_context = list()
        for j in range(CONTEXT_SIZE):
            data_context.append(corpus_text[i - CONTEXT_SIZE + j])
        for j in range(1, CONTEXT_SIZE + 1):
            data_context.append(corpus_text[i + j])
        data_target = corpus_text[i]
        data.append((data_context, data_target))

    global unique_vocab
    unique_vocab = list(set(corpus_text))

    # mapping to index
    global word_to_idx
    word_to_idx = {w: i for i, w in enumerate(unique_vocab)}

    # train the CBOW model (train_cbow is defined below)
    global cbow
    cbow = train_cbow(data, unique_vocab, word_to_idx)

if __name__ == "__main__":
    main()
CORPUS_SIZE determines the size of the corpus used for training. It was added so that training and testing can be carried out within the limits of the available time and machine performance.
result, the tokenized data from the TED talks, is processed into training data: for each center word (data_target), the 4 words to its left and the 4 words to its right are put into data_context, and the two are appended to the data list as a tuple.
[Example]
Given word sequence : here are two reasons "companies" fail they only do
Center word : companies
Surrounding words : here are two reasons fail they only do
-> (['here','are','two','reasons','fail','they','only','do'], 'companies')
One bundle of ([8 surrounding words] + center word) is appended to data.
Each element of the data list created this way is a tuple of 8 input words and 1 target word, and during training it is used to learn the relationship (weights) between the 8 context words and the target word.

unique_vocab is a list of the words that appear in the corpus, with duplicates removed. word_to_idx is a dictionary that uses the words of unique_vocab as keys and their index numbers as values. Each word in the data list is looked up in this dictionary to get its index number, and those indices are what actually get fed into the model.
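To make the index mapping concrete, here is a small sketch (the mini-vocabulary is made up for illustration; the LongTensor conversion mirrors what train_cbow below does with each training tuple):

import torch

# hypothetical mini-vocabulary, for illustration only
unique_vocab = ['here', 'are', 'two', 'reasons', 'companies', 'fail', 'they', 'only', 'do']
word_to_idx = {w: i for i, w in enumerate(unique_vocab)}

context = ['here', 'are', 'two', 'reasons', 'fail', 'they', 'only', 'do']
target = 'companies'

# the words are replaced by their indices before being handed to the model
inp_var = torch.LongTensor([word_to_idx[word] for word in context])
target_var = torch.LongTensor([word_to_idx[target]])
print(inp_var)     # tensor([0, 1, 2, 3, 5, 6, 7, 8])
print(target_var)  # tensor([4])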

train_cbow - Training Procedure

The training procedure for the CBOW model is defined as follows.

def train_cbow(data, unique_vocab, word_to_idx):
    # declare the CBOW model
    cbow = CBOW(len(unique_vocab), EMBEDDING_DIM, CONTEXT_SIZE)

    # loss function and optimizer
    nll_loss = nn.NLLLoss()
    optimizer = SGD(cbow.parameters(), lr=0.01)

    # EPOCH : number of passes over the current dataset
    for epoch in range(EPOCH):
        for context, target in data:
            inp_var = Variable(torch.LongTensor([word_to_idx[word] for word in context]))
            target_var = Variable(torch.LongTensor([word_to_idx[target]]))

            cbow.zero_grad()
            log_prob = cbow(inp_var)
            loss = nll_loss(log_prob, target_var)
            loss.backward()
            optimizer.step()

    return cbow

First, the CBOW model, the loss function, and the optimization method are as follows.
  • loss function - Negative Log Likelihood Loss. NLL loss evaluates the loss directly from the probability the model assigns to the correct answer. By passing that probability through a negative log, the lower the probability the model gives to the correct answer, the larger the penalty, which allows a more fine-grained evaluation of the model. (If one student answers correctly with 99% confidence after studying hard and another guesses correctly with 20% confidence, both give the same answer, but grading them by the probability they assign makes it possible to tell the two apart precisely.) A small numeric sketch of this follows the optimizer description below.
Image source : https://wiznxt.tistory.com/783
  • Optimizer - Stochastic Gradient Descent. A variant of gradient descent that computes each update on a batch of size 1; it is used to find good parameter values relatively quickly when the dataset is large. Its updates are noisier than full-batch Gradient Descent and other optimizers, but the reduced amount of computation lets it converge toward a well-optimized solution quickly.
Image source : http://pages.cs.wisc.edu/~spehlmann/cs760/_site//assets/SGDvGD.svg
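To make the loss function concrete, here is a small numeric sketch (the two probability distributions are made up for illustration): the model that assigns 99% to the correct word receives a much smaller penalty than the one that assigns 20%.

import torch

# two hypothetical models predicting over a 5-word vocabulary; index 0 is the correct word
confident = torch.tensor([[0.99, 0.0025, 0.0025, 0.0025, 0.0025]])
guessing  = torch.tensor([[0.20, 0.20, 0.20, 0.20, 0.20]])
target = torch.tensor([0])

nll = torch.nn.NLLLoss()
# NLLLoss expects log-probabilities, which is why the CBOW model above ends with log_softmax
print(nll(confident.log(), target))  # tensor(0.0101) -> small penalty
print(nll(guessing.log(), target))   # tensor(1.6094) -> large penalty

# one SGD step (batch size 1) then nudges every parameter against its stored gradient:
#   param <- param - lr * param.grad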
The CBOW model is trained by the sequence of steps listed below.

inp_var = Variable(torch.LongTensor([word_to_idx[word] for word in context]))
target_var = Variable(torch.LongTensor([word_to_idx[target]]))

cbow.zero_grad()
log_prob = cbow(inp_var)
loss = nll_loss(log_prob, target_var)
loss.backward()
optimizer.step()

  1. Take a tuple (8 surrounding words and 1 target word) from the data list and initialize inp_var and target_var
  2. Reset the model's gradients to zero
  3. Pass the 8 surrounding words to the model's input layer
  4. The model computes the probability of the target word, and the loss function evaluates it
  5. Run backpropagation (autograd computes and stores the gradient for each parameter)
  6. The optimizer adjusts the parameters according to the stored gradients
  7. Repeat steps 1-6 for every tuple in the data list
  8. Repeat steps 1-7 EPOCH times
While training repeats for the given number of EPOCHs, computing the average of the loss function for each epoch shows that the average loss keeps shrinking as training progresses and the word vectors settle into appropriate positions.
The figure below shows the epoch count and the average loss printed while training the CBOW model (a rough sketch of how this tracking can be added follows the figure).
notion image
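As a rough sketch (the extra variable names are my own, not from the original code), the average-loss tracking behind that output can be added to the inner loop of train_cbow like this:

for epoch in range(EPOCH):
    total_loss = 0.0
    for context, target in data:
        inp_var = Variable(torch.LongTensor([word_to_idx[word] for word in context]))
        target_var = Variable(torch.LongTensor([word_to_idx[target]]))

        cbow.zero_grad()
        log_prob = cbow(inp_var)
        loss = nll_loss(log_prob, target_var)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()  # accumulate the loss of every (context, target) pair

    # average loss for this epoch; this is the value that decreases as training goes on
    print("epoch {} : average loss = {:.4f}".format(epoch, total_loss / len(data)))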

Training Results

Once training is complete, every word in the corpus ends up with a set of weights, as in the example below, and these weights come to represent the meaning of that particular word.
Since the dimension was set to 300, the meaning of each word vector is determined by 300 weights. For example, the vector for "between":

between
[-0.4627,  0.5517,  0.9427, -1.2970,  0.4817, -0.5149, -2.0008, -1.6774,
 -0.1271, -0.1935, -0.7941,  0.6387,  0.7139,  1.1319,  1.1475,  0.9229,
 ... (300 weights in total, middle values omitted) ...
  0.1773, -0.2089,  1.7897, -1.3952,  0.8338, -0.9995,  0.5707,  0.3273,
  1.4372,  0.7663,  0.2333,  0.1943]

Computing Similarity Between Words

Two functions were defined to test the training results.
(1) get_similarity : prints the similarity between the two input words
(2) get_most_similar_word : prints the word in the current corpus that is most similar to the input word
Source code implementing these functions
def similarity_between_words(word_1_vec, word_2_vec):
    # cosine similarity is used to measure how close the two vectors are:
    # the smaller the angle between them (the more similar they are), the closer the cosine is to 1
    result = (torch.dot(word_1_vec, word_2_vec) /
              (torch.norm(word_1_vec) * torch.norm(word_2_vec)))
    # correct the value so that a positive number is returned
    if(result < 0):
        result *= -1
    return result

def get_similarity(cbow, word1, word2, unique_vocab, word_to_idx):
    if not((word1 in unique_vocab) and (word2 in unique_vocab)):
        print("word is not in the vocabulary")
        return

    # (1) check the similarity between the two words word1 and word2
    word_1 = unique_vocab[unique_vocab.index(word1)]
    word_2 = unique_vocab[unique_vocab.index(word2)]

    word_1_vec = cbow.get_word_vector(word_to_idx[word_1])
    word_2_vec = cbow.get_word_vector(word_to_idx[word_2])
    word_1_vec = torch.flatten(word_1_vec)
    word_2_vec = torch.flatten(word_2_vec)

    word_similarity = similarity_between_words(word_1_vec, word_2_vec)
    if(word_similarity < 0):
        word_similarity *= -1
    print("Similarity between '{}' & '{}' : {:0.4f}".format(word_1, word_2, word_similarity))

def get_most_similar_word(cbow, word, unique_vocab, word_to_idx):
    word_1 = unique_vocab[unique_vocab.index(word)]
    word_1_vec = cbow.get_word_vector(word_to_idx[word_1])
    word_1_vec = torch.flatten(word_1_vec)

    most_similar_word = 0
    most_similarity = 0.0
    print(len(unique_vocab))
    for i in range(0, len(unique_vocab)):
        if(i == unique_vocab.index(word)):  # skip the query word itself
            continue
        word_3_vec = torch.flatten(cbow.get_word_vector(word_to_idx[unique_vocab[i]]))
        word_similarity = similarity_between_words(word_1_vec, word_3_vec)
        # print(unique_vocab[i], '{:.4f}'.format(word_similarity))
        if(most_similarity < word_similarity):
            most_similar_word = i
            most_similarity = word_similarity
    print("most similar word between '{}' is '{}' with {:.4f} of similarity".format(word_1, unique_vocab[most_similar_word], most_similarity))
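Once training has finished, the two functions can be called as below (a usage sketch; the example words are arbitrary and must occur in the trained corpus):

# similarity between two words that both occur in the corpus
get_similarity(cbow, "two", "three", unique_vocab, word_to_idx)
# prints: Similarity between 'two' & 'three' : <value depends on the training run>

# most similar word to a given word within the current corpus
get_most_similar_word(cbow, "two", unique_vocab, word_to_idx)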
We can confirm that both functions work as intended: computing and printing the similarity between two word vectors, and printing the word most similar to a given word.
notion image
When the corpus is enlarged before training, the semantic relationships between words become more appropriate. When the model is trained with 10 times more words than the model above, the words judged to be similar are noticeably more meaningful.
notion image
notion image
๋” ์ข‹์€ ์„ฑ๋Šฅ์˜ ํ•™์Šต๋จธ์‹ ๊ณผ, ๋” ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•ด ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚จ๋‹ค๋ฉด ๋”์šฑ ์œ ์˜๋ฏธํ•œ ๋‹จ์–ด๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ๋ฝ‘์•„๋‚ผ ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋ผ ์ƒ๊ฐ๋ฉ๋‹ˆ๋‹ค.
ย 

Skip-gram Model Implementation

This section walks through implementing the Skip-gram algorithm. It was trained on a portion (1,000 words) of Kaggle's text8.txt data. Given a center word as input, the model must be trained to raise the probability of producing its surrounding words.

Skip-gram has a single input layer and, after the hidden layer, produces as many output layers as there are surrounding words. It learns word co-occurrence patterns to derive distributed representations of the words. Because this is a multi-class classification problem, softmax and cross-entropy error are all that is needed: the softmax function converts the scores into probabilities, and the loss obtained from those probabilities and the answer labels via cross-entropy error is used for training.

Skip-gram Class

import numpy as np
# MatMul and SoftmaxWithLoss are assumed to be the simple layer classes from the
# reference book ("Deep Learning from Scratch 2"); a minimal sketch is given further below

class SimpleSkipGram:
    def __init__(self, vocab_size, hidden_size):
        V, H = vocab_size, hidden_size

        # initialize the weights
        W_in = 0.01 * np.random.randn(V, H).astype('f')
        W_out = 0.01 * np.random.randn(H, V).astype('f')

        # create the layers
        self.in_layer = MatMul(W_in)           # input layer
        self.out_layer = MatMul(W_out)         # output layer
        self.loss_layer1 = SoftmaxWithLoss()   # Softmax layer
        self.loss_layer2 = SoftmaxWithLoss()   # Softmax layer

        # collect all the weights and gradients into lists
        layers = [self.in_layer, self.out_layer]
        self.params, self.grads = [], []
        for layer in layers:
            self.params += layer.params
            self.grads += layer.grads

        # store the distributed representations of the words in an instance variable
        self.word_vecs = W_in
  • Arguments of the SimpleSkipGram class
    • vocab_size : vocabulary size
    • hidden_size : number of neurons in the hidden layer
  • Two weight matrices, W_in and W_out
    • each is initialized with small random values
  • Layer creation
    • one MatMul layer for the input layer and one MatMul layer for the output layer
    • as many Softmax with Loss layers as there are surrounding words (= the window, here 2) are created (Softmax and cross-entropy error are merged into a single Softmax with Loss layer); a minimal sketch of these helper layers follows this list
  • The distributed representations are stored in word_vecs
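MatMul and SoftmaxWithLoss are not defined in this post; they are the common layer classes used throughout the reference book ("Deep Learning from Scratch 2"). A minimal sketch of what such layers can look like, simplified to one-hot labels and without the extra bookkeeping of the book's versions:

import numpy as np

class MatMul:
    def __init__(self, W):
        self.params = [W]
        self.grads = [np.zeros_like(W)]
        self.x = None

    def forward(self, x):
        W, = self.params
        self.x = x
        return np.dot(x, W)

    def backward(self, dout):
        W, = self.params
        dx = np.dot(dout, W.T)
        self.grads[0][...] = np.dot(self.x.T, dout)  # write the gradient in place
        return dx

class SoftmaxWithLoss:
    def __init__(self):
        self.params, self.grads = [], []
        self.y = None  # softmax output
        self.t = None  # one-hot answer labels

    def forward(self, x, t):
        # softmax (numerically stabilized), then cross-entropy error against the labels
        x = x - x.max(axis=1, keepdims=True)
        self.y = np.exp(x) / np.exp(x).sum(axis=1, keepdims=True)
        self.t = t
        return float(-np.sum(t * np.log(self.y + 1e-7)) / x.shape[0])

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        return dout * (self.y - self.t) / batch_size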

forward

def forward(self, contexts, target):
    # target: one-hot center word(s); contexts[:, 0] and contexts[:, 1]: the two surrounding words
    h = self.in_layer.forward(target)    # hidden representation of the center word
    s = self.out_layer.forward(h)        # scores over the vocabulary
    l1 = self.loss_layer1.forward(s, contexts[:, 0])
    l2 = self.loss_layer2.forward(s, contexts[:, 1])
    loss = l1 + l2                       # sum of the losses for the two context positions
    return loss

backward

def backward(self, dout=1):
    dl1 = self.loss_layer1.backward(dout)
    dl2 = self.loss_layer2.backward(dout)
    ds = dl1 + dl2                   # gradients from both loss layers flow back into the score s
    dh = self.out_layer.backward(ds)
    self.in_layer.backward(dh)
    return None

Training

## prepare the training data
window_size = 1
hidden_size = 5      # number of neurons in the hidden layer
batch_size = 3
max_epoch = 1000

text = x  # the data
corpus, word_to_id, id_to_word = preprocess(text)                # convert the corpus into word ids
vocab_size = len(word_to_id)                                     # vocabulary size
contexts, target = create_contexts_target(corpus, window_size)   # build the center and surrounding words

# one-hot encoding
target = convert_one_hot(target, vocab_size)
contexts = convert_one_hot(contexts, vocab_size)

## train the model
model_2 = SimpleSkipGram(vocab_size, hidden_size)
optimizer = Adam()
trainer_2 = Trainer(model_2, optimizer)
trainer_2.fit(contexts, target, max_epoch, batch_size)
  1. preprocess() : converts the corpus into word ids
  2. create_contexts_target() : builds the center words and surrounding words; target is the list of center words, and contexts is the list of the surrounding words within window_size on each side of each target
  3. each of them is one-hot encoded to prepare the training data
For the parameter update method, Adam was chosen from among SGD, AdaGrad, and the other optimizers. (Rough sketches of the helper functions used above are given below.)
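preprocess, create_contexts_target, and convert_one_hot (as well as Trainer and Adam) come from the reference book's common utilities and are not defined in this post. Rough sketches of the three data-preparation helpers, under the assumption that text is a plain whitespace-separated string:

import numpy as np

def preprocess(text):
    # split the text into words and assign each distinct word an integer id
    words = text.lower().replace('.', ' .').split()
    word_to_id, id_to_word = {}, {}
    for word in words:
        if word not in word_to_id:
            new_id = len(word_to_id)
            word_to_id[word] = new_id
            id_to_word[new_id] = word
    corpus = np.array([word_to_id[w] for w in words])
    return corpus, word_to_id, id_to_word

def create_contexts_target(corpus, window_size=1):
    # target: every word id except the first/last window_size words
    # contexts: the window_size word ids on each side of every target word
    target = corpus[window_size:-window_size]
    contexts = []
    for idx in range(window_size, len(corpus) - window_size):
        cs = [corpus[idx + t] for t in range(-window_size, window_size + 1) if t != 0]
        contexts.append(cs)
    return np.array(contexts), np.array(target)

def convert_one_hot(corpus, vocab_size):
    # turn an array of word ids (1-D targets or 2-D contexts) into one-hot vectors
    return np.eye(vocab_size, dtype=np.int32)[corpus]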
notion image
The horizontal axis of the graph is the number of training iterations and the vertical axis is the loss. Plotting the training progress shows that the loss decreases as training proceeds.

Training Results

Distributed Representations of the Words

The distributed representation of each word looks like the following. Using Skip-gram to vectorize words in this way, the meaning of a word can be expressed as a distributed representation.

# (omitted)
ghosts     [ -1.554753  -14.199934    5.4088864  -8.995648   -4.0544376]
mind       [ 10.231861  -12.334701    2.9066288   2.9563556   5.0717473]
saying     [  0.6028301 -16.115307    0.48380145 -6.917041    2.6495762]
reality    [ 14.173826   -0.3625413 -10.682704    4.753849    8.219211 ]
advocated  [ 13.001251    3.5767713  -6.613554    9.323543    7.023613 ]

Results

To measure the quality of the word vectors, a comprehensive test set was defined consisting of 5 types of semantic questions and 9 types of syntactic questions, with 8,869 semantic questions and 10,675 syntactic questions in total.
notion image
The questions in each category are built in two steps:
1. A list of similar word pairs is created manually.
2. A large list of questions is then formed by connecting pairs of these word pairs.
For example, 68 large American cities and the states they belong to are paired up; then two word pairs are picked at random, producing about 2.5K questions.
The test set presented in the paper does not include multi-word entities such as New York; it is built only from single-token words.
The paper measures accuracy on all question types, and a question is counted as correct only when the closest word found by the algebraic vector operation matches the correct word exactly. Because synonyms are therefore counted as mistakes, reaching 100% accuracy is essentially impossible.
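This exact-match evaluation can be sketched as follows (an illustration of the idea, not the paper's code; word_vecs is assumed to be a dict mapping each vocabulary word to a numpy vector):

import numpy as np

def solve_analogy(word_vecs, a, b, c):
    # answer "a is to b as c is to ?" with the single closest word,
    # excluding the three question words themselves
    query = word_vecs[b] - word_vecs[a] + word_vecs[c]
    best, best_sim = None, -np.inf
    for w, v in word_vecs.items():
        if w in (a, b, c):
            continue
        sim = np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v) + 1e-8)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

def analogy_accuracy(word_vecs, questions):
    # a question counts as correct only if the predicted word matches exactly;
    # a synonym of the expected answer is still scored as a mistake
    correct = sum(1 for a, b, c, expected in questions
                  if solve_analogy(word_vecs, a, b, c) == expected)
    return correct / len(questions)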

Maximization of Accuracy

A Google News corpus of about 6 billion tokens was used for training, and the vocabulary size was restricted to the 1 million most frequent words. The optimization problem then runs into a time constraint: using more data and higher-dimensional word vectors can be expected to improve accuracy, but it takes longer to train. To choose the best possible model quickly, the models were first trained on subsets of the data with the vocabulary restricted to the 30,000 most frequent words.
notion image
The table first shows results for the CBOW architecture with different vector dimensionalities and different numbers of training words; it shows that the dimensionality and the amount of training data have to be increased together for the accuracy to keep improving.

In the actual experiments, the models were trained with stochastic gradient descent and backpropagation, using 3 training epochs and a starting learning rate of 0.025 as hyper-parameters.

Comparison of Model Architectures

First, to compare the different model architectures, models were trained on the same data with the word vector dimensionality fixed at 640. In the later experiments, the full set of semantic-syntactic word relationship tests was used, without the 30,000-word vocabulary restriction.
notion image
The RNN word vectors perform relatively well on the syntactic questions.
The NNLM word vectors perform much better than the RNN's; this is because the word vectors in the NNLM are connected directly to a non-linear hidden layer.
CBOW performs better than the NNLM on the syntactic questions and about the same on the semantic questions.
Skip-gram performs somewhat worse than CBOW on the syntactic questions, but far better than any other model on the semantic questions.

๋‹ค์Œ์œผ๋กœ๋Š” ์˜ค์ง ํ•˜๋‚˜์˜ CPU๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ํ•™์Šตํ•˜์˜€๊ณ  ๊ณต์ ์œผ๋กœ ์‚ฌ์šฉ๊ฐ€๋Šฅํ•œ word vector๊ณผ ๋น„๊ตํ•˜์—ฌ ํ‰๊ฐ€ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
notion image
The CBOW model was trained on a subset of the Google News data for about one day, while the Skip-gram model was trained for about three days.

In the later experiments, only one training epoch was used, and the learning rate was decreased over the course of training so that it reaches 0 when training ends.
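The decay described here is a simple linear schedule: the learning rate shrinks in proportion to how much of the training data has already been processed. A sketch (the function and variable names are my own):

def linearly_decayed_lr(initial_lr, words_processed, total_words):
    # anneal the learning rate linearly from initial_lr down to 0 over one pass of the data
    remaining = 1.0 - words_processed / total_words
    return initial_lr * max(remaining, 0.0)

# e.g. with the starting learning rate of 0.025 used earlier
print(linearly_decayed_lr(0.025, 0, 1000))     # 0.025  (start of training)
print(linearly_decayed_lr(0.025, 500, 1000))   # 0.0125 (halfway through)
print(linearly_decayed_lr(0.025, 1000, 1000))  # 0.0    (end of training)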
notion image
Training for one epoch on twice as much data gives the same or better results than training for three epochs on the same amount of data.

Large Scale Parallel Training of Models

The experiments in the paper use DistBelief, a distributed computation framework.
The models were trained on the 6-billion-token Google News dataset using mini-batch asynchronous gradient descent with AdaGrad, running 50 to 100 replicas of each model; the table summarizes the training results by the number of CPU cores. (Training 1000-dimensional vectors with the NNLM took too long to complete.)
  • AdaGrad : adjusts the learning rate so that parameters that have already changed a lot change less, and parameters that have changed little change more (a minimal sketch follows below)
  • DistBelief : multiple replicas of the same model run in parallel, and each replica synchronizes its gradient updates through a centralized server
    • notion image
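The AdaGrad idea in one update rule: each parameter accumulates the sum of its squared gradients, and its effective step size is divided by the square root of that sum, so frequently-updated parameters move less and rarely-updated ones move more. A minimal sketch:

import numpy as np

class AdaGrad:
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None  # running sum of squared gradients, one array per parameter

    def update(self, params, grads):
        if self.h is None:
            self.h = [np.zeros_like(p) for p in params]
        for p, g, h in zip(params, grads, self.h):
            h += g * g
            # parameters with a large accumulated gradient get a smaller effective step
            p -= self.lr * g / (np.sqrt(h) + 1e-7)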

Microsoft Research Sentence Completion Challenge

A task recently introduced for evaluating language modeling alongside other NLP techniques.
It consists of 1,040 sentences, each with one word missing, and the goal is to choose the most coherent word out of five candidates. Results for N-gram models, LSA-based models, the log-bilinear model, and a combination of recurrent neural networks have already been reported, the best of them reaching 55.4% accuracy.
To explore the performance of the Skip-gram architecture on this task, the paper trains a 640-dimensional model on 50 million words; for each sentence in the test set it checks the model's predictions with each candidate word placed in the blank, and sums the individual predictions to obtain the final score for that sentence.
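The scoring procedure can be sketched roughly as follows (an illustration of the idea, not the paper's implementation; skipgram_logprob(center, context) is a hypothetical function returning the model's log-probability of a context word given a center word):

def sentence_completion_score(skipgram_logprob, sentence_words, blank_idx, candidate):
    # place the candidate word in the blank and sum the skip-gram predictions
    # of every surrounding word given that candidate
    score = 0.0
    for i, context_word in enumerate(sentence_words):
        if i == blank_idx:
            continue
        score += skipgram_logprob(candidate, context_word)
    return score

def choose_answer(skipgram_logprob, sentence_words, blank_idx, candidates):
    # pick the candidate whose surrounding words are most probable under the model
    return max(candidates,
               key=lambda c: sentence_completion_score(skipgram_logprob,
                                                       sentence_words, blank_idx, c))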
notion image
The Skip-gram model alone does not beat the LSA-based measure, but when its scores are combined with the weights of the RNNLMs, which previously held the best performance, the combination reaches 58.9% accuracy.

Examples of the Learned Relationships

The table below shows a variety of relationships. A relationship is defined as the subtraction of two word vectors, and the result column shows the word obtained after adding another word to that difference.
e.g. Paris - France + Italy = Rome
notion image
There is clearly still room for improvement, but the current performance is not bad either; the paper expects that training higher-dimensional vectors on larger datasets will perform much better and will also enable new, innovative applications.

Conclusion

  • Unlike count-based representations (LSA, HAL, etc.), the learned vectors preserve word meaning, and arithmetic on the vectors is also possible.
  • Thanks to a computational complexity much lower than the other models of the time, very accurate high-dimensional word vectors can be computed from much larger datasets.
  • Word2vec succeeds the NNLM (Neural Network Language Model) while remaining an intuitive and simple model that performs very well, and it has become a widely used method across many NLP tasks.
The significance of Word2Vec
  • It allows machines to understand and make use of the many different ways natural language can express the same meaning (the flexibility of natural language).
  • Unlike natural language, which keeps evolving as new words appear and old ones disappear, computer languages are static and must follow a fixed grammar. With the Word2Vec technique, however, it becomes possible for computers to learn and understand the changing meanings of words and newly emerging expressions as well, following that evolution.

reference

Word2Vec: 「Efficient Estimation of Word Representations in Vector Space」 (Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean, 2013)
Deep Learning from Scratch 2 (밑바닥부터 시작하는 딥러닝 2, Koki Saito)
