8-5 ์‹ ๊ฒฝ๋ง ๊ธฐ๊ณ„ ๋ฒˆ์—ญ ์‹ค์Šต

์ด์ œ๋ถ€ํ„ฐ S2S ๋ชจ๋ธ์˜ ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ๊ฐ„์†Œํ™”ํ•˜์—ฌ ๊ตฌํ˜„ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ์…‹์€ ์Œ์œผ๋กœ ์ด๋ฃจ์–ด์ง„ ๋ง๋ญ‰์น˜๋กœ ์˜์–ด ๋ฌธ์žฅ๊ณผ ์ด์— ์ƒ์‘ํ•˜๋Š” ํ”„๋ž‘์Šค์–ด ๋ฒˆ์—ญ์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ์˜ ์ธ์ฝ”๋”๋Š” ์–‘๋ฐฉํ–ฅ GRU์œ ๋‹›์„ ์‚ฌ์šฉํ•˜๊ณ  ์‹œํ€€์Šค์— ์žˆ๋Š” ๋ชจ๋“  ๋ถ€๋ถ„์˜ ์ •๋ณด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ž…๋ ฅ ์‹œํ€€์Šค์˜ ๊ฐ ์œ„์น˜์— ๋Œ€ํ•œ ๋ฒกํ„ฐ๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. ํŒŒ์ดํ† ์น˜์˜ PackedSequence ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•˜๋ฉฐ, ์ดํ›„์— ๋‚˜์˜ฌ โ€˜NMT ๋ชจ๋ธ์˜ ์ธ์ฝ”๋”ฉ๊ณผ ๋””์ฝ”๋”ฉโ€™์—์„œ ์ž์„ธํžˆ ๋‹ค๋ค„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
ย 

๊ธฐ๊ณ„ ๋ฒˆ์—ญ ๋ฐ์ดํ„ฐ์…‹

์ด๋ฒˆ ์˜ˆ์ œ์—์„œ๋Š” ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ํƒ€ํ† ์—๋ฐ” ํ”„๋กœ์ ํŠธ(Tatoeba Project)์˜ ์˜์–ด-ํ”„๋ž‘์Šค์–ด ๋ฌธ์žฅ ์Œ์œผ๋กœ ๊ตฌ์„ฑ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์šฐ์„  ๋ชจ๋“  ๋ฌธ์ž๋ฅผ ์†Œ๋ฌธ์ž๋กœ ๋ณ€ํ™˜ํ•˜๊ณ , NLTK๋ฅผ ์ด์šฉํ•˜์—ฌ ์˜์–ด, ํ”„๋ž‘์Šค์–ด ํ† ํฐํ™”๋ฅผ ๊ฐ ๋ฌธ์žฅ ์Œ์— ์ ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ดํ›„ NLTK์˜ ์–ธ์–ด์— ํŠนํ™”๋œ ๋‹จ์–ด ํ† ํฐํ™”๋ฅผ ์ ์šฉํ•ด ํ† ํฐ๋ฆฌ์ŠคํŠธ๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค. ๋ฐฉ๊ธˆ๊นŒ์ง€์˜ ๊ธฐ๋ณธ ์ „์ฒ˜๋ฆฌ์— ํŠน์ • ๋ฌธ์žฅ ํŒจํ„ด์„ ์ง€์ •ํ•˜์—ฌ ๋ฐ์ดํ„ฐ์˜ ์ผ๋ถ€๋ถ„๋งŒ ์„ ํƒํ•ด ํ•™์Šต ๋ฌธ์ œ๋ฅผ ๋‹จ์ˆœํ•˜๊ฒŒ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ œํ•œ๋œ ๋ฌธ์žฅ ํŒจํ„ด์œผ๋กœ ๋ฐ์ดํ„ฐ ๋ฒ”์œ„๋ฅผ ์ขํžˆ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ๋ชจ๋ธ์˜ ๋ถ„์‚ฐ์„ ๋‚ฎ์ถ”๊ณ  ์งง์€ ์‹œ๊ฐ„ ๋‚ด์— ๋†’์€ ์„ฑ๋Šฅ ๋‹ฌ์„ฑ์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
ย 
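As a rough illustration of this preprocessing, the sketch below lowercases and tokenizes a sentence pair and filters by sentence pattern. It uses plain string operations as a stand-in for NLTK's language-specific tokenizers, and the prefix list is a hypothetical example of a restricted pattern, not the book's exact filter:

```python
def simple_tokenize(sentence):
    # Lowercase and split on whitespace after padding punctuation with
    # spaces (a crude stand-in for NLTK's word tokenizers).
    for punct in ".,!?":
        sentence = sentence.replace(punct, " " + punct)
    return sentence.lower().split()

# Hypothetical restricted patterns: keep only pairs whose English side
# starts with one of these prefixes, narrowing the data's scope.
SIMPLE_PREFIXES = ("i am", "he is", "she is", "we are", "they are")

def keep_pair(english_sentence):
    return english_sentence.lower().startswith(SIMPLE_PREFIXES)

pair = ("I am cold.", "J'ai froid.")
tokens_en = simple_tokenize(pair[0])  # ['i', 'am', 'cold', '.']
tokens_fr = simple_tokenize(pair[1])
```

Filtering with `keep_pair` keeps "I am cold." but drops a sentence like "The weather is cold.", which is how a restricted pattern shrinks the dataset.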

NMT๋ฅผ ์œ„ํ•œ ๋ฒกํ„ฐ ํŒŒ์ดํ”„๋ผ์ธ

์†Œ์Šค ์˜์–ด์™€ ํƒ€๊นƒ ํ”„๋ž‘์Šค์–ด๋ฅผ ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ณต์žกํ•œ ํŒŒ์ดํ”„๋ผ์ธ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ๋ณต์žก๋„๊ฐ€ ์ฆ๊ฐ€ํ•˜๋Š” ์š”์ธ์—๋Š” ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋กœ ์†Œ์Šค์™€ ํƒ€๊นƒ ์‹œํ€€์Šค๋Š” ๋ชจ๋ธ์—์„œ ๋‹ค๋ฅธ ์—ญํ• ์„ ํ•˜๊ณ , ์–ธ์–ด๋„ ๋‹ค๋ฅด๋ฉฐ, ๋ฒกํ„ฐํ™”๋˜๋Š” ๋ฐฉ์‹๋„ ๋‹ค๋ฅด๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋‘˜ ์งธ๋กœ ํŒŒ์ดํ† ์น˜์˜ PackedSequence๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ์—๋Š” ์†Œ์Šค ์‹œํ€€์Šค์˜ ๊ธธ์ด์— ๋”ฐ๋ผ ๊ฐ ๋ฏธ๋‹ˆ๋ฐฐ์น˜๋ฅผ ์†ŒํŒ…ํ•ด์•ผํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ์œ„ ๋‘ ๊ฐ€์ง€ ๋ฌธ์ œ ๋•Œ๋ฌธ์— NMTVectorizer๋Š” ๋ณ„๋„์˜ SequenceVocabulary ๊ฐ์ฒด ๋‘ ๊ฐœ๋ฅผ ๋งŒ๋“ค๊ณ , ์ตœ๋Œ€ ์‹œํ€€์Šค ๊ธธ์ด๋ฅผ ๋”ฐ๋กœ ์ธก์ •ํ•ฉ๋‹ˆ๋‹ค.
class NMTVectorizer(object):
    """ Creates and manages the vocabularies """
    def __init__(self, source_vocab, target_vocab, max_source_length, max_target_length):
        """
        Args:
            source_vocab (SequenceVocabulary): maps source words to integers
            target_vocab (SequenceVocabulary): maps target words to integers
            max_source_length (int): length of the longest sequence in the source dataset
            max_target_length (int): length of the longest sequence in the target dataset
        """
        self.source_vocab = source_vocab
        self.target_vocab = target_vocab
        self.max_source_length = max_source_length
        self.max_target_length = max_target_length

    @classmethod
    def from_dataframe(cls, bitext_df):
        """Initialize an NMTVectorizer from the dataset DataFrame

        Args:
            bitext_df (pandas.DataFrame): the parallel text dataset
        Returns:
            an NMTVectorizer instance
        """
        source_vocab = SequenceVocabulary()
        target_vocab = SequenceVocabulary()
        max_source_length = 0
        max_target_length = 0

        for _, row in bitext_df.iterrows():
            source_tokens = row["source_language"].split(" ")
            if len(source_tokens) > max_source_length:
                max_source_length = len(source_tokens)
            for token in source_tokens:
                source_vocab.add_token(token)

            target_tokens = row["target_language"].split(" ")
            if len(target_tokens) > max_target_length:
                max_target_length = len(target_tokens)
            for token in target_tokens:
                target_vocab.add_token(token)

        return cls(source_vocab, target_vocab, max_source_length, max_target_length)
๋ณต์žก๋„๊ฐ€ ์ฆ๊ฐ€ํ•˜๋Š” ์ฒซ ๋ฒˆ์งธ ์š”์ธ์œผ๋กœ ์†Œ์Šค์™€ ํƒ€๊นƒ ์‹œํ€€์Šค๋ฅผ ๋‹ค๋ฃจ๋Š” ๋ฐฉ๋ฒ•์ด ๋‹ค๋ฅด๋‹ค๊ณ  ์„ค๋ช…ํ–ˆ์Šต๋‹ˆ๋‹ค. ์†Œ์Šค ์‹œํ€€์Šค๋Š” ์‹œ์ž‘ ๋ถ€๋ถ„์— BEGIN-OF-SEQUENCE ํ† ํฐ์„, ๋งˆ์ง€๋ง‰์— END-OF-SEQUENCE ํ† ํฐ์„ ์ถ”๊ฐ€ํ•˜๋ฉฐ ๋ฒกํ„ฐํ™”ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ ์–‘๋ฐฉํ–ฅ GRU๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์†Œ์Šค ์‹œํ€€์Šค์— ์žˆ๋Š” ํ† ํฐ์„ ์œ„ํ•œ ์š”์•ฝ ๋ฒกํ„ฐ๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด์— ํƒ€๊นƒ ์‹œํ€€์Šค๋Š” ํ† ํฐ ํ•˜๋‚˜๊ฐ€ ๋ฐ€๋ฆฐ ๋ณต์‚ฌ๋ณธ ๋‘ ๊ฐœ๋กœ ๋ฒกํ„ฐํ™”๋ฉ๋‹ˆ๋‹ค. ์‹œํ€€์Šค ์˜ˆ์ธก ์ž‘์—…์—๋Š” ํƒ€์ž„ ์Šคํ…๋งˆ๋‹ค ์ž…๋ ฅ ํ† ํฐ๊ณผ ์ถœ๋ ฅ ํ† ํฐ์ด ํ•„์š”ํ•œ๋ฐ, S2S์˜ ๋ชจ๋ธ ๋””์ฝ”๋”๊ฐ€ ์ด ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๋ฉด์„œ๋„ ์ธ์ฝ”๋” ๋ฌธ๋งฅ์ด ์ถ”๊ฐ€๋ฉ๋‹ˆ๋‹ค. ์ด ์ž‘์—…์„ ๋‹จ์ˆœํ™”ํ•˜๊ธฐ ์œ„ํ•ด ์†Œ์Šค์™€ ํƒ€๊นƒ ์ธ๋ฑ์Šค์— ์ƒ๊ด€์—†์ด ๋ฒกํ„ฐํ™”๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” _vectorize() ๋ฉ”์„œ๋“œ๋ฅผ ๋งŒ๋“ค์—ˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ์œผ๋กœ ์ธ๋ฑ์Šค๋ฅผ ๊ฐ๊ธฐ ์ฒ˜๋ฆฌํ•˜๋Š” ๋‘ ๋ฉ”์†Œ๋“œ๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค.
def _vectorize(self, indices, vector_length=-1, mask_index=0):
    """Convert indices into a vector

    Args:
        indices (list): list of integers representing a sequence
        vector_length (int): length of the index vector
        mask_index (int): the mask index to use; almost always 0
    """
    if vector_length < 0:
        vector_length = len(indices)
    vector = np.zeros(vector_length, dtype=np.int64)
    vector[:len(indices)] = indices
    vector[len(indices):] = mask_index
    return vector

def _get_source_indices(self, text):
    """Return the vectorized source text

    Args:
        text (str): the source text; tokens must be separated by spaces
    Returns:
        indices (list): list of integers representing the text
    """
    indices = [self.source_vocab.begin_seq_index]
    indices.extend(self.source_vocab.lookup_token(token) for token in text.split(" "))
    indices.append(self.source_vocab.end_seq_index)
    return indices

def _get_target_indices(self, text):
    """Return the vectorized target text

    Args:
        text (str): the target text; tokens must be separated by spaces
    Returns:
        a tuple (x_indices, y_indices):
            x_indices (list): integers representing the samples in the decoder
            y_indices (list): integers representing the predictions in the decoder
    """
    indices = [self.target_vocab.lookup_token(token) for token in text.split(" ")]
    x_indices = [self.target_vocab.begin_seq_index] + indices
    y_indices = indices + [self.target_vocab.end_seq_index]
    return x_indices, y_indices

def vectorize(self, source_text, target_text, use_dataset_max_lengths=True):
    """Return the vectorized source and target text

    The vectorized source text is a single vector. The vectorized target
    text is split into two vectors, in a style similar to the surname
    modeling of chapter 7: at each time step the first vector is the
    sample and the second vector is the target.

    Args:
        source_text (str): text in the source language
        target_text (str): text in the target language
        use_dataset_max_lengths (bool): whether to use the dataset's maximum vector lengths
    Returns:
        a dictionary of vectorized data with the keys:
            source_vector, target_x_vector, target_y_vector, source_length
    """
    source_vector_length = -1
    target_vector_length = -1

    if use_dataset_max_lengths:
        source_vector_length = self.max_source_length + 2
        target_vector_length = self.max_target_length + 1

    source_indices = self._get_source_indices(source_text)
    source_vector = self._vectorize(source_indices,
                                    vector_length=source_vector_length,
                                    mask_index=self.source_vocab.mask_index)

    target_x_indices, target_y_indices = self._get_target_indices(target_text)
    target_x_vector = self._vectorize(target_x_indices,
                                      vector_length=target_vector_length,
                                      mask_index=self.target_vocab.mask_index)
    target_y_vector = self._vectorize(target_y_indices,
                                      vector_length=target_vector_length,
                                      mask_index=self.target_vocab.mask_index)
    return {"source_vector": source_vector,
            "target_x_vector": target_x_vector,
            "target_y_vector": target_y_vector,
            "source_length": len(source_indices)}
๋ณต์žก๋„์˜ ๋‹ค์Œ ์š”์ธ์€ ์†Œ์Šค ์‹œํ€€์Šค์ž…๋‹ˆ๋‹ค. ์–‘๋ฐฉํ–ฅ GRU๋กœ ์†Œ์Šค ์‹œํ€€์Šค๋ฅผ ์ธ์ฝ”๋”ฉํ•  ๋•Œ ํŒŒ์ดํ† ์น˜์˜ PackedSequence ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๊ฐ€๋ณ€ ๊ธธ์ด ์‹œํ€€์Šค์˜ ๋ฏธ๋‹ˆ๋ฐฐ์น˜๋Š” ๊ฐ ์‹œํ€€์Šค๋ฅผ ํ–‰์œผ๋กœ ์Œ“์€ ์ •์ˆ˜ ํ–‰๋ ฌ๋กœ ํ‘œํ˜„๋˜๋ฉฐ, ์‹œํ€€์Šค๋Š” ์™ผ์ชฝ ์ •๋ ฌ๋˜๊ณ  ์ œ๋กœ ํŒจ๋”ฉ๋˜์–ด ๊ฐ€๋ณ€ ๊ธธ์ด๋ฅผ ํ—ˆ์šฉํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. PackedSequence ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ๋Š” ์•„๋ž˜ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์ด ๊ฐ€๋ณ€ ๊ธธ์ด ์‹œํ€€์Šค ๋ฏธ๋‹ˆ๋ฐฐ์น˜๋ฅผ ๋ฐฐ์—ด ํ•˜๋‚˜๋กœ ํ‘œํ˜„ํ•ฉ๋‹ˆ๋‹ค. ์‹œํ€€์Šค์˜ ํƒ€์ž„ ์Šคํ… ๋ฐ์ดํ„ฐ๋ฅผ ์ฐจ๋ก€๋Œ€๋กœ ์—ฐ๊ฒฐํ•˜๊ณ  ํƒ€์ž„ ์Šคํ…๋งˆ๋‹ค ์‹œํ€€์Šค ๊ธธ์ด๋ฅผ ๊ธฐ๋กํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
[Figure: a minibatch of variable-length sequences represented as a PackedSequence]
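The packing idea can be sketched without PyTorch. The hypothetical helper below flattens a length-sorted, zero-padded batch into one array plus per-time-step counts, mirroring the layout (not the actual API) of pack_padded_sequence:

```python
def pack_padded(padded, lengths):
    """Flatten a length-sorted, zero-padded batch (list of rows) into
    packed data plus per-time-step batch sizes, mimicking the layout
    of PyTorch's PackedSequence. Illustrative only, not the real API."""
    max_len = lengths[0]  # lengths are sorted in descending order
    data, batch_sizes = [], []
    for t in range(max_len):
        # Number of sequences still active at time step t
        active = sum(1 for n in lengths if n > t)
        batch_sizes.append(active)
        for row in range(active):
            data.append(padded[row][t])
    return data, batch_sizes

padded = [[1, 2, 3],   # length 3
          [4, 5, 0],   # length 2
          [6, 0, 0]]   # length 1
data, batch_sizes = pack_padded(padded, [3, 2, 1])
# data == [1, 4, 6, 2, 5, 3], batch_sizes == [3, 2, 1]
```

No padding zeros survive in `data`; the RNN only processes real tokens, which is the point of packing.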
PackedSequence๋ฅผ ๋งŒ๋“ค๋ ค๋ฉด ๊ฐ ์‹œํ€€์Šค์˜ ๊ธธ์ด๋ฅผ ์•Œ์•„์•ผ ํ•˜๋ฉฐ, ์‹œํ€€์Šค์˜ ๊ธธ์ด ์ˆœ์„œ๋Œ€๋กœ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ์„ ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ •๋ ฌ๋œ ํ–‰๋ ฌ์„ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด์„œ ๋ฏธ๋‹ˆ๋ฐฐ์น˜์— ์žˆ๋Š” ํ…์„œ๋ฅผ ์‹œํ€€์Šค ๊ธธ์ด ์ˆœ์„œ๋Œ€๋กœ ์ •๋ ฌํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜ ์ฝ”๋“œ๋Š” generate_batches()๋ฅผ ์ˆ˜์ •ํ•œ generate_nmt_batches() ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค.
def generate_nmt_batches(dataset, batch_size, shuffle=True, drop_last=True, device="cpu"):
    """A generator function wrapping the PyTorch DataLoader; NMT version"""
    dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=shuffle, drop_last=drop_last)

    for data_dict in dataloader:
        lengths = data_dict['x_source_length'].numpy()
        sorted_length_indices = lengths.argsort()[::-1].tolist()

        out_data_dict = {}
        for name, tensor in data_dict.items():
            out_data_dict[name] = data_dict[name][sorted_length_indices].to(device)
        yield out_data_dict
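The core of the sorting step can be illustrated with plain lists, as a stand-in for the numpy argsort call above:

```python
def sort_batch_by_length(batch, lengths):
    """Reorder every field of a minibatch so that source sequences
    appear in descending length order, as pack_padded_sequence
    requires. Plain-list stand-in for the numpy/torch version."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    return {name: [rows[i] for i in order] for name, rows in batch.items()}

batch = {"x_source": [[7, 0, 0], [1, 2, 3], [4, 5, 0]],
         "x_source_length": [1, 3, 2]}
sorted_batch = sort_batch_by_length(batch, batch["x_source_length"])
# sorted_batch["x_source_length"] == [3, 2, 1]
```

Note that every field is reordered with the same permutation, so the source, target, and length entries of each example stay aligned.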


์ธ์ฝ”๋”ฉ๊ณผ ๋””์ฝ”๋”ฉ

์˜์–ด๋ฅผ ํ”„๋ž‘์Šค์–ด๋กœ ๋ฒˆ์—ญํ•˜๋Š” ๊ธฐ๊ณ„ ๋ณ€์—ญ์„ ์‹คํ˜„ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ธ์ฝ”๋”-๋””์ฝ”๋” ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ธ์ฝ”๋”๊ฐ€ ์–‘๋ฐฉํ–ฅ GRU๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์†Œ์Šค ์‹œํ€€์Šค(์˜์–ด ๋ฌธ์žฅ)์„ ๋ฒกํ„ฐ ์ƒํƒœ์˜ ์‹œํ€€์Šค๋กœ ๋งคํ•‘ํ•˜๋ฉด ๋””์ฝ”๋”๊ฐ€ ์ธ์ฝ”๋”์—์„œ ์ถœ๋ ฅ๋œ ์€๋‹‰ ์ƒํƒœ๋ฅผ ์ดˆ๊ธฐ ์€๋‹‰ ์ƒํƒœ๋กœ ๋ฐ›์•„์™€ ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์œผ๋กœ ์†Œ์Šค ์‹œํ€€์Šค๋ฅผ ํ† ๋Œ€๋กœ ์ถœ๋ ฅ ์‹œํ€€์Šค(ํ”„๋ž‘์Šค์–ด ๋ฒˆ์—ญ๋ฌธ)๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๊ณผ์ •์„ ๊ฑฐ์นฉ๋‹ˆ๋‹ค.
๋‹ค์Œ๊ณผ ๊ฐ™์ด ์‹ ๊ฒฝ๋ง ๊ธฐ๊ณ„ ๋ฒˆ์—ญ ๋ชจ๋ธ NMTModel์„ ์ฝ”๋”ฉํ•ด๋ด…๋‹ˆ๋‹ค. NMTModel์€ ํ•˜๋‚˜์˜ forward() ๋ฉ”์„œ๋“œ(์ •๋ฐฉํ–ฅ ๊ณ„์‚ฐํ•˜๋Š” ๋ฉ”์„œ๋“œ)์— ์ธ์ฝ”๋”์™€ ๋””์ฝ”๋”๋ฅผ ์บก์Аํ™”ํ•˜์—ฌ ๊ด€๋ฆฌํ•ฉ๋‹ˆ๋‹ค.
ย 
class NMTModel(nn.Module):
    """ A neural machine translation model """
    def __init__(self, source_vocab_size, source_embedding_size,
                 target_vocab_size, target_embedding_size,
                 encoding_size, target_bos_index):
        """
        Args:
            source_vocab_size (int): number of unique words in the source language
            source_embedding_size (int): size of the source embedding vectors
            target_vocab_size (int): number of unique words in the target language
            target_embedding_size (int): size of the target embedding vectors
            encoding_size (int): size of the encoder RNN
            target_bos_index (int): index of the BEGIN-OF-SEQUENCE token
        """
        super(NMTModel, self).__init__()
        self.encoder = NMTEncoder(num_embeddings=source_vocab_size,
                                  embedding_size=source_embedding_size,
                                  rnn_hidden_size=encoding_size)
        # The bidirectional encoder concatenates two hidden states,
        # so the decoder's hidden size is twice the encoding size
        decoding_size = encoding_size * 2
        self.decoder = NMTDecoder(num_embeddings=target_vocab_size,
                                  embedding_size=target_embedding_size,
                                  rnn_hidden_size=decoding_size,
                                  bos_index=target_bos_index)

    def forward(self, x_source, x_source_lengths, target_sequence):
        """The forward pass of the model

        Args:
            x_source (torch.Tensor): the source text data tensor;
                x_source.shape is (batch, vectorizer.max_source_length)
            x_source_lengths (torch.Tensor): sequence lengths in x_source
            target_sequence (torch.Tensor): the target text data tensor
        Returns:
            decoded_states (torch.Tensor): prediction vectors at each output time step
        """
        encoder_state, final_hidden_states = self.encoder(x_source, x_source_lengths)
        decoded_states = self.decoder(encoder_state=encoder_state,
                                      initial_hidden_state=final_hidden_states,
                                      target_sequence=target_sequence)
        return decoded_states
๋‹ค์Œ์œผ๋กœ ์–‘๋ฐฉํ–ฅ GRU๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋‹จ์–ด๋ฅผ ์ž„๋ฒ ๋”ฉํ•˜๊ณ  ํŠน์„ฑ์„ ์ถ”์ถœํ•˜๋Š”, ์ฆ‰, ์†Œ์Šค ์‹œํ€€์Šค๋ฅผ ๋ฒกํ„ฐ ์ƒํƒœ๋กœ ๋งคํ•‘ํ•˜๋Š” ์ธ์ฝ”๋” NMTEncoder๋ฅผ ์ฝ”๋”ฉํ•ด๋ด…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ์ธ์ฝ”๋”์˜ ์ถœ๋ ฅ์€ ์–‘๋ฐฉํ–ฅ GRU์˜ ์ตœ์ข… ์€๋‹‰ ์ƒํƒœ๊ฐ€ ๋˜๊ณ  ์ด๋ฅผ ์ดํ›„ ๋””์ฝ”๋”๊ฐ€ ๋ฐ›๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
์ฝ”๋“œ๋ฅผ ์ž์„ธํžˆ ์‚ดํŽด๋ณด๋ฉด ์šฐ์„  ์ž„๋ฒ ๋”ฉ ์ธต์„ ์‚ฌ์šฉํ•ด ์ž…๋ ฅ ์‹œํ€€์Šค๋ฅผ ์ž„๋ฒ ๋”ฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ, ๊ฐ€๋ณ€ ๊ธธ์ด ์‹œํ€€์Šค๋Š” padding_idx๋ผ๋Š” ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ํ†ตํ•ด ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค. padding_idx์™€ ๋™์ผํ•œ ๋ชจ๋“  ์œ„์น˜๋Š” 0๋ฒกํ„ฐ๊ฐ€ ๋˜๋ฉฐ ์ตœ์ ํ™” ๊ณผ์ •์„ ๊ฑฐ์น  ๋•Œ ์—…๋ฐ์ดํŠธ๋˜์ง€ ์•Š๋Š” ๋งˆ์Šคํ‚น(masking)์ด ๋˜๊ธฐ ๋•Œ๋ฌธ์— ๊ฐ€๋ณ€ ๊ธธ์ด ์‹œํ€€์Šค ์ฒ˜๋ฆฌ๊ฐ€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
๋‹ค๋งŒ ์–‘๋ฐฉํ–ฅ GRU๋Š” ํŠน๋ณ„ํžˆ ์ˆœ๋ฐฉํ–ฅ์ผ ๋•Œ์™€ ์—ญ๋ฐฉํ–ฅ์ผ ๋•Œ์˜ ๋งˆ์Šคํ‚น๋œ ์œ„์น˜๊ฐ€ ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ธ์ฝ”๋”-๋””์ฝ”๋” ๋ชจ๋ธ์—์„œ๋Š” ๋งˆ์Šคํ‚น ์œ„์น˜๋ฅผ ๋‹ค๋ฅธ ๋ฐฉ์‹์œผ๋กœ, ํŒŒ์ดํ† ์น˜์˜ PackedSequence ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ฒ˜๋ฆฌํ•œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
ย 
class NMTEncoder(nn.Module):
    def __init__(self, num_embeddings, embedding_size, rnn_hidden_size):
        """
        Args:
            num_embeddings (int): the size of the source vocabulary
            embedding_size (int): size of the embedding vectors
            rnn_hidden_size (int): size of the RNN hidden state vectors
        """
        super(NMTEncoder, self).__init__()
        self.source_embedding = nn.Embedding(num_embeddings, embedding_size, padding_idx=0)
        self.birnn = nn.GRU(embedding_size, rnn_hidden_size,
                            bidirectional=True, batch_first=True)

    def forward(self, x_source, x_lengths):
        x_embedded = self.source_embedding(x_source)
        # Create the PackedSequence;
        # x_packed.data.shape = (number_items, embedding_size)
        x_lengths = x_lengths.detach().cpu().numpy()
        x_packed = pack_padded_sequence(x_embedded, x_lengths, batch_first=True)

        # x_birnn_h.shape = (num_rnn, batch_size, feature_size)
        x_birnn_out, x_birnn_h = self.birnn(x_packed)
        # Permute to (batch_size, num_rnn, feature_size)
        x_birnn_h = x_birnn_h.permute(1, 0, 2)

        # Flatten the features to (batch_size, num_rnn * feature_size)
        # (Note: -1 covers the remaining dimensions,
        #  flattening the two RNN hidden vectors into one)
        x_birnn_h = x_birnn_h.contiguous().view(x_birnn_h.size(0), -1)

        x_unpacked, _ = pad_packed_sequence(x_birnn_out, batch_first=True)
        return x_unpacked, x_birnn_h
์ด์ œ ์ธ์ฝ”๋”์˜ ์ถœ๋ ฅ์ธ ์ตœ์ข… ์€๋‹‰ ์ƒํƒœ๋ฅผ ๋””์ฝ”๋” NMTDecoder์ด ๋ฐ›์•„ ํƒ€์ž„ ์Šคํ…์„ ์ˆœํšŒํ•˜๋ฉด์„œ ์ถœ๋ ฅ ์‹œํ€€์Šค๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
์ด ์˜ˆ์ œ๋Š” ํƒ€๊นƒ ์‹œํ€€์Šค๊ฐ€ ํƒ€์ž„ ์Šคํ…๋งˆ๋‹ค ์ƒ˜ํ”Œ๋กœ ์ œ๊ณต๋œ๋‹ค๋Š” ํŠน์ด์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. GRUCell์„ ์‚ฌ์šฉํ•ด ์€๋‹‰ ์ƒํƒœ๋ฅผ ๊ณ„์‚ฐํ•˜๋ฉด ์ธ์ฝ”๋”์˜ ์ตœ์ข… ์€๋‹‰ ์ƒํƒœ์— Linear์ธต์„ ์ ์šฉํ•˜์—ฌ ์ดˆ๊ธฐ ์€๋‹‰ ์ƒํƒœ๋ฅผ ๊ณ„์‚ฐํ•˜๋Š”๋ฐ, ์ด๋•Œ ๋””์ฝ”๋” GRU๋Š” ์ž„๋ฒ ๋”ฉ๋œ ์ž…๋ ฅ ํ† ํฐ๊ณผ ๋งˆ์ง€๋ง‰ ํƒ€์ž„ ์Šคํ…์˜ ๋ฌธ๋งฅ ๋ฒกํ„ฐ๋ฅผ ์—ฐ๊ฒฐํ•œ ๋ฒกํ„ฐ๋ฅผ ์ž…๋ ฅ ๋ฐ›๋Š”๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ฟผ๋ฆฌ ๋ฒกํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ทธ ์ƒˆ๋กœ์šด ์ž…๋ ฅ ๋ฒกํ„ฐ๋ฅผ ํ˜„์žฌ ํƒ€์ž„ ์Šคํ…์—์„œ ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์œผ๋กœ ์ƒˆ๋กœ์šด ๋ฌธ๋งฅ ๋ฒกํ„ฐ๋ฅผ ๋งŒ๋“  ํ›„ ์€๋‹‰ ์ƒํƒœ์™€ ์—ฐ๊ฒฐํ•˜์—ฌ ๋””์ฝ”๋”ฉ ์ •๋ณด๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ๋ฒกํ„ฐ๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค. ์ด ๋ฒกํ„ฐ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋ถ„๋ฅ˜๊ธฐ(๊ฐ„๋‹จํ•œ Linear์ธต)๊ฐ€ ์˜ˆ์ธก ๋ฒกํ„ฐ score_for_y_t_index๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ์†Œํ”„ํŠธ๋งฅ์Šค ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•˜์—ฌ ์˜ˆ์ธก ๋ฒกํ„ฐ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
ย 
class NMTDecoder(nn.Module):
    def __init__(self, num_embeddings, embedding_size, rnn_hidden_size, bos_index):
        """
        Args:
            num_embeddings (int): the number of unique words in the target vocabulary
            embedding_size (int): size of the embedding vectors
            rnn_hidden_size (int): size of the RNN hidden state
            bos_index (int): index of the begin-of-sequence token
        """
        super(NMTDecoder, self).__init__()
        self._rnn_hidden_size = rnn_hidden_size
        self.target_embedding = nn.Embedding(num_embeddings=num_embeddings,
                                             embedding_dim=embedding_size,
                                             padding_idx=0)
        self.gru_cell = nn.GRUCell(embedding_size + rnn_hidden_size, rnn_hidden_size)
        self.hidden_map = nn.Linear(rnn_hidden_size, rnn_hidden_size)
        self.classifier = nn.Linear(rnn_hidden_size * 2, num_embeddings)
        self.bos_index = bos_index

    def _init_indices(self, batch_size):
        """ Returns a vector of BEGIN-OF-SEQUENCE indices """
        return torch.ones(batch_size, dtype=torch.int64) * self.bos_index

    def _init_context_vectors(self, batch_size):
        """ Returns a zero vector for initializing the context vectors """
        return torch.zeros(batch_size, self._rnn_hidden_size)

    def forward(self, encoder_state, initial_hidden_state, target_sequence):
        """The forward pass of the model"""
        # Assumption: the first dimension is the batch dimension,
        # i.e. the input is (Batch, Seq).
        # We iterate over the sequence, so permute to (Seq, Batch)
        target_sequence = target_sequence.permute(1, 0)

        # Use the given encoder hidden state as the initial hidden state
        h_t = self.hidden_map(initial_hidden_state)

        batch_size = encoder_state.size(0)
        # Initialize the context vectors to zero
        context_vectors = self._init_context_vectors(batch_size)
        # Initialize the first word y_t to BOS
        y_t_index = self._init_indices(batch_size)

        h_t = h_t.to(encoder_state.device)
        y_t_index = y_t_index.to(encoder_state.device)
        context_vectors = context_vectors.to(encoder_state.device)

        output_vectors = []
        # Fetch and cache all tensors from the GPU for later analysis
        self._cached_p_attn = []
        self._cached_ht = []
        self._cached_decoder_state = encoder_state.cpu().detach().numpy()

        output_sequence_size = target_sequence.size(0)
        for i in range(output_sequence_size):
            # Step 1: embed the word and concatenate with the previous context
            y_input_vector = self.target_embedding(target_sequence[i])
            rnn_input = torch.cat([y_input_vector, context_vectors], dim=1)

            # Step 2: apply the GRU and get a new hidden vector
            h_t = self.gru_cell(rnn_input, h_t)
            self._cached_ht.append(h_t.cpu().detach().numpy())

            # Step 3: attend to the encoder states using the current hidden state
            context_vectors, p_attn, _ = \
                verbose_attention(encoder_state_vectors=encoder_state,
                                  query_vector=h_t)
            # Extra: store the attention probabilities for visualization
            self._cached_p_attn.append(p_attn.cpu().detach().numpy())

            # Step 4: predict the next word from the current hidden state
            # and the context vector
            prediction_vector = torch.cat((context_vectors, h_t), dim=1)
            score_for_y_t_index = self.classifier(prediction_vector)

            # Extra: record the prediction scores
            output_vectors.append(score_for_y_t_index)

        output_vectors = torch.stack(output_vectors).permute(1, 0, 2)
        return output_vectors


์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜ ์ž์„ธํžˆ ์•Œ์•„๋ณด๊ธฐ

์ด ์˜ˆ์ œ์—์„œ๋Š” ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์˜ ๋™์ž‘์„ ์ดํ•ดํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ์ด์ „ ๊ธ€ ์—์„œ ์„ค๋ช…ํ–ˆ๋˜ ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์˜ ๋ชจ๋ธ์„ ๋‹ค์‹œ ํ•œ ๋ฒˆ ๊ฐ€์ ธ์™€ ์ˆ˜์‹์ ์œผ๋กœ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
ย 
[Figure: the attention mechanism, with the decoder query and the encoder keys and values]
ate๋ฅผ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•œ ๋””์ฝ”๋” RNN์— ์ž…๋ ฅ์œผ๋กœ ๋“ค์–ด์˜ค๋Š” ๋””์ฝ”๋” ์€๋‹‰ ์ƒํƒœ ๊ฐ’์„ ์ฟผ๋ฆฌ(Query)๋ผ๊ณ  ๋ถ€๋ฅด๊ณ , ์ธ์ฝ”๋” RNN์˜ ๊ฐ ์ถœ๋ ฅ๊ฐ’๋“ค์„ ํ‚ค(Key), ๊ฐ’(Value)๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค. ์–ด๋–ค ๋‹จ์–ด๋“ค์— ์ง‘์ค‘ํ•  ์ง€๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฐ’์ธ ์–ดํ…์…˜ ๊ฐ’(Attention Score)๋ฅผ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด, 8-3์—์„œ ์„ค๋ช…ํ–ˆ๋˜ ์—๋„ˆ์ง€ ๊ฐ’(Energy - ๋‹จ์–ด๋ผ๋ฆฌ ์–ผ๋งˆ๋‚˜ ์—ฐ๊ด€์„ฑ์ด ์žˆ๋Š”๊ฐ€)๋ฅผ ๊ตฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด, ๋””์ฝ”๋”์˜ ์€๋‹‰ ๊ฐ’์ธ ์ฟผ๋ฆฌ์™€ ์ธ์ฝ”๋”์˜ ๊ฐ ์€๋‹‰ ๊ฐ’์ธ ํ‚ค๋ฅผ ๋‚ด์ ํ•ด์ค๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์—์„œ ์—ฐ์‚ฐ ๊ฒฐ๊ณผ๋กœ ์Šค์นผ๋ผ ๊ฐ’์„ ์–ป๊ธฐ ์œ„ํ•ด, ๋””์ฝ”๋” ๊ฐ’์€ ์ „์น˜ํ•˜์—ฌ ๋‚ด์ ์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
ย 
๊ตฌํ•œ ์ผ๋ จ์˜ ์Šค์นผ๋ผ๊ฐ’๋“ค์„ 0๊ณผ 1์‚ฌ์ด์˜ ๊ฐ’์ด๋ฉด์„œ ์ด ํ•ฉ์ด 1์ธ ํ™•๋ฅ  ๋ถ„ํฌ๋กœ ๋ณ€ํ™˜ํ•˜๊ธฐ ์œ„ํ•ด ์†Œํ”„ํŠธ๋งฅ์Šค ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•ด์ค๋‹ˆ๋‹ค. ์†Œํ”„ํŠธ๋งฅ์Šค ํ•จ์ˆ˜๋Š” ์ฃผ์–ด์ง„ ๊ฐ’๋“ค์˜ ๋น„์œจ์€ ์œ ์ง€ํ•˜๋ฉด์„œ ์ด ํ•ฉ์ด 1์ด ๋˜๋„๋ก ๋งŒ๋“ค์–ด์ฃผ๋Š”, ํ™•๋ฅ  ๋ถ„ํฌ๋กœ ๋ณ€ํ™˜ํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•˜๋Š” ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค. ์ด ๊ฐ’์ด ์œ„ ์ด๋ฏธ์ง€์—์„œ๋Š” ๊ฐ€์ค‘์น˜๋กœ ํ‘œํ˜„๋˜์–ด์žˆ์Šต๋‹ˆ๋‹ค.
ย 
๋’ค์—์„œ ๋‹ค๋ฃฐ ํ‘œ์ด์ง€๋งŒ, ์„ค๋ช…์— ๋„์›€์„ ์ฃผ๊ธฐ ์œ„ํ•ด ์•ž์œผ๋กœ ๊ฐ€์ ธ์™”์Šต๋‹ˆ๋‹ค. ๋””์ฝ”๋”์˜ ์–ดํ…์…˜ ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ ์•„๋ž˜ ํ‘œ์—์„œ ์‚ดํŽด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š”, ์†Œ์Šค ๋ฌธ์žฅ๊ณผ ๋ฒˆ์—ญ๋œ ๋ฌธ์žฅ์˜ ๊ฐ ๋‹จ์–ด๊ฐ„์— ๋†’์€ ๊ด€๊ณ„๋ฅผ ๊ฐ–๊ณ ์žˆ๋Š” ๋‹จ์–ด๋ผ๋ฆฌ ๋†’์€ ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ํ™•๋ฅ  ๋ถ„ํฌ๊ฐ€ ๋†’์€ ๋‹จ์–ด์˜ ๊ฐ’์— ๋” ๋งŽ์€ ๊ฐ€์ค‘์น˜๋ฅผ ์ฃผ๊ณ , ๋‹จ์–ด ๋ฒˆ์—ญํ•  ๋•Œ ๊ด€๋ จ์„ฑ์ด ๋†’์€ ๋‹จ์–ด์— ์ดˆ์ ์„ ๋งž์ถ˜๋‹ค๊ณ  ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
[Table: the decoder's attention probability distribution between source and translated words]
๊ฐ ์ธ์ฝ”๋”์˜ ์€๋‹‰ ๊ฐ’๊ณผ ์•ž์„œ ๊ตฌํ•œ ๊ฐ€์ค‘์น˜๋ฅผ ๊ณฑํ•œ ๋’ค, ๊ณฑํ•œ ๊ฐ’๋“ค์˜ ํ•ฉ์ธ ๊ฐ€์ค‘ ํ•ฉ(Weighted Sum)์„ ๊ตฌํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์–ดํ…์…˜ ๊ฐ’ (Attention Score, Attention Value,) ๋˜๋Š” ๋ฌธ๋งฅ์„ ๋‹ด๊ณ ์žˆ๋Š” ๋ฒกํ„ฐ์ธ ๋ฌธ๋งฅ ๋ฒกํ„ฐ(Context Vector)๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค. ์ด ์–ดํ…์…˜ ๊ฐ’์„ ํ†ตํ•ด, ๋ชจ๋ธ์€ ๊ฐ ๋‹จ์–ด๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ์–ด๋–ค ๋‹จ์–ด์— ์ง‘์ค‘ํ•ด์•ผ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ์–ป์„ ์ˆ˜ ์žˆ๋Š”์ง€๋ฅผ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
ย 
ย 
์•„๋ž˜๋Š” ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ์ฝ”๋“œ๋กœ ๊ตฌํ˜„ํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ ์–ดํ…์…˜ ํ•จ์ˆ˜์ธ verbose_attention ์€ ์œ„์—์„œ ์„ค๋ช…ํ•œ ์–ดํ…์…˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ํ•œ ์ค„์— ํ•˜๋‚˜์”ฉ ์ž์„ธํ•˜๊ฒŒ ์„ค๋ช…ํ•ด๋‘” ๊ฒƒ์ด๊ณ , ๋‘ ๋ฒˆ์งธ terse_attention ์€ matmul์„ ์‚ฌ์šฉํ•ด ์กฐ๊ธˆ ๋” ํšจ์œจ์ ์œผ๋กœ ์—ฐ์‚ฐํ•˜๋Š” ๊ณผ์ •์„ ๊ตฌํ˜„ํ•œ ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค.
def verbose_attention(encoder_state_vectors, query_vector):
    """ Element-wise version of the attention mechanism

    Args:
        encoder_state_vectors (torch.Tensor): 3-D tensor output by the
            encoder's bidirectional GRU
        query_vector (torch.Tensor): hidden state of the decoder GRU
    """
    batch_size, num_vectors, vector_size = encoder_state_vectors.size()
    # Dot product of the query with each encoder hidden vector
    vector_scores = torch.sum(encoder_state_vectors *
                              query_vector.view(batch_size, 1, vector_size),
                              dim=2)
    # Softmax: turn the scores into weights, a probability distribution over 0~1
    vector_probabilities = F.softmax(vector_scores, dim=1)
    # Multiply each hidden vector by its weight
    weighted_vectors = encoder_state_vectors * \
                       vector_probabilities.view(batch_size, num_vectors, 1)
    # Sum the weighted vectors to get the context vector (attention value)
    context_vectors = torch.sum(weighted_vectors, dim=1)
    return context_vectors, vector_probabilities, vector_scores

def terse_attention(encoder_state_vectors, query_vector):
    """ Dot-product version of the attention mechanism

    Args:
        encoder_state_vectors (torch.Tensor): 3-D tensor output by the
            encoder's bidirectional GRU
        query_vector (torch.Tensor): hidden state of the decoder GRU
    """
    # query * key scores via matmul
    vector_scores = torch.matmul(encoder_state_vectors,
                                 query_vector.unsqueeze(dim=2)).squeeze()
    # The weights
    vector_probabilities = F.softmax(vector_scores, dim=-1)
    # The context vectors
    context_vectors = torch.matmul(encoder_state_vectors.transpose(-2, -1),
                                   vector_probabilities.unsqueeze(dim=2)).squeeze()
    return context_vectors, vector_probabilities

์Šค์ผ€์ค„๋ง๋œ ์ƒ˜ํ”Œ๋ง Scheduled Sampling

ํ•™์Šต ๊ณผ์ •์—์„œ ์ฃผ์–ด์ง€๋Š” ๋ฐ์ดํ„ฐ์—์„œ๋Š” ํƒ€๊ฒŸ ์‹œํ€€์Šค๊ฐ€ ์ œ๊ณต๋˜๊ณ , ์ด๋ฅผ ์ด์šฉํ•ด ๊ฐ ํƒ€์ž„์Šคํ…๋งˆ๋‹ค ์—ฐ์‚ฐ๊ณผ ํ•™์Šต์„ ์ง„ํ–‰ํ•˜์ง€๋งŒ, ์‹ค์ œ ๋ฐ์ดํ„ฐ ํ˜น์€ ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์—์„œ๋Š” ๋ชจ๋ธ์ด ๋งŒ๋“œ๋Š” ์‹œํ€€์Šค๊ฐ€ ์–ด๋–ค์ง€ ์•Œ ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์—, ์ด๋Ÿฌํ•œ ๊ณผ์ •์ด ์ž‘๋™ํ•˜์ง€ ์•Š์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ฆ‰, ํ•™์Šต ์‹œ์—๋Š” ํƒ€๊ฒŸ ์‹œํ€€์Šค๊ฐ€ ์žˆ์ง€๋งŒ, ํ…Œ์ŠคํŠธ์—์„œ๋Š” ํƒ€๊ฒŸ ์‹œํ€€์Šค๊ฐ€ ์—†์–ด ๊ฐ€์ค‘์น˜๊ฐ€ ํฌ๊ฒŒ ๋ฒ—์–ด๋‚˜๋Š” ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
ย 
์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, ์ƒ˜ํ”Œ๋ง ๊ธฐ๋ฒ•์„ ํ†ตํ•ด ํ•™์Šต ๊ณผ์ •์—์„œ๋„ ์ผ๋ถ€ ํƒ€๊ฒŸ ์‹œํ€€์Šค๋ฅผ ๋ชจ๋ธ์—๊ฒŒ ๋งก๊ธฐ๋Š” ๋ฐฉ๋ฒ•์„ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ๊ฐ„๋‹จํ•˜๊ฒŒ ๋งํ•˜๋ฉด, ์ƒ˜ํ”Œ๋ง ๊ธฐ๋ฒ•์€ ๋ฐ์ดํ„ฐ ์ค‘ ์ผ๋ถ€๋งŒ ๋ฝ‘์•„์„œ ํ™œ์šฉํ•œ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค. ํ•™์Šต ๊ณผ์ •์—์„œ ์ฃผ์–ด์ง„ ํƒ€๊ฒŸ ์‹œํ€€์Šค์™€ ๋ชจ๋ธ์ด ์ž์ฒด ์ƒ์„ฑํ•œ ์‹œํ€€์Šค๋ฅผ ๋ฌด์ž‘์œ„๋กœ ์‚ฌ์šฉํ•˜๋ฉฐ, ๋ชจ๋ธ์ด ๊ฒฐ์ •ํ•˜๋Š” ์‹œํ€€์Šค์™€ ํ™•๋ฅ  ๋ถ„ํฌ๊ฐ€ ๊ฐœ์„ ๋˜๋„๋ก ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ต๋‹ˆ๋‹ค.
ย 
์ด๋ฅผ ์œ„ํ•ด ์ƒ˜ํ”Œ๋ง ํ™•๋ฅ ์„ ๋จผ์ € ์„ค์ •ํ•ด๋‘ก๋‹ˆ๋‹ค. ์ดˆ๊ธฐ ์ธ๋ฑ์Šค๋ฅผ ์‹œ์ž‘ ํ† ํฐ์ธ BEGIN ์ธ๋ฑ์Šค๋กœ ๋จผ์ € ์ง€์ •ํ•˜๊ณ  ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ ๋‹ค์Œ, ์‹œํ€€์Šค ์ƒ์„ฑ๋ฌธ์„ ๋ฐ˜๋ณตํ•  ๋•Œ ๋žœ๋คํ•œ ๊ฐ’(๋‚œ์ˆ˜)๋ฅผ ๋ฐœ์ƒ์‹œํ‚ค๊ณ , ํ™•๋ฅ ๋ณด๋‹ค ์ž‘์œผ๋ฉด ๋ชจ๋ธ์˜ ์˜ˆ์ธก ์‹œํ€€์Šค๋ฅผ ์‚ฌ์šฉํ•˜๊ณ , ํ™•๋ฅ ๋ณด๋‹ค ํฌ๋ฉด ์ฃผ์–ด์ง„ ํƒ€๊ฒŸ ์‹œํ€€์Šค๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด์„œ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•˜๊ณ  ํ•™์Šต์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
ย 
์•„๋ž˜๋Š” ์Šค์ผ€์ค„๋ง๋œ ์ƒ˜ํ”Œ๋ง์„ ์•ž์„œ ์ •์˜ํ–ˆ๋˜ NMTDecoder์— forward ํ•จ์ˆ˜๋ฅผ ์ˆ˜์ •ํ•ด ์ž‘์„ฑํ•œ ๋‚ด์šฉ์ž…๋‹ˆ๋‹ค. sample_probability ๊ฐ’์„ ์ง€์ •ํ•ด์ค„ ๋•Œ, 0์— ๊ฐ€๊นŒ์šธ์ˆ˜๋ก ์˜ˆ์ธก ์‹œํ€€์Šค๋ฅผ ์‚ฌ์šฉํ•˜๊ณ , 1์— ๊ฐ€๊นŒ์šธ์ˆ˜๋ก ํƒ€๊ฒŸ ์‹œํ€€์Šค๋ฅผ ์‚ฌ์šฉํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋˜, ๋ฐ˜๋ณต๋ฌธ๋งˆ๋‹ค use_sample = np.random.random() < sample_probability ๋ผ๋Š” ์ฝ”๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ, ๋‚œ์ˆ˜ ๊ฐ’์„ ๊ธฐ์ค€์œผ๋กœ ํ™•๋ฅ ์— ์˜๊ฑฐํ•ด ๋ชจ๋ธ ์˜ˆ์ธก ์‹œํ€€์Šค๋ฅผ ์‚ฌ์šฉํ•  ์ง€, ์•„๋‹ˆ๋ฉด ํƒ€๊ฒŸ ์‹œํ€€์Šค๋ฅผ ์‚ฌ์šฉํ•  ์ง€ ๊ฒฐ์ •ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
ย 
class NMTDecoder(nn.Module):
    def __init__(self, num_embeddings, embedding_size, rnn_hidden_size, bos_index):
        super(NMTDecoder, self).__init__()
        # Same definitions as in the earlier version (omitted),
        # plus the temperature used when sampling from the softmax:
        self._sampling_temperature = 3

    def forward(self, encoder_state, initial_hidden_state, target_sequence,
                sample_probability=0.0):
        """The forward pass of the model

        Args:
            encoder_state (torch.Tensor): output of the NMTEncoder
            initial_hidden_state (torch.Tensor): final hidden state of the NMTEncoder
            target_sequence (torch.Tensor): the target text data tensor
            sample_probability (float): the scheduled-sampling parameter;
                probability of using the model's own prediction at each
                decoder time step
        Returns:
            output_vectors (torch.Tensor): prediction vectors at each time step
        """
        # The sampling probability is a value between 0 and 1:
        # 0 : use only the provided target sequence
        # 1 : use only the model's own predictions
        if target_sequence is None:
            # Without a target sequence we must sample at every step
            sample_probability = 1.0
        else:
            # Assumption: the first dimension is the batch dimension,
            # i.e. the input is (Batch, Seq).
            # We iterate over the sequence, so permute to (Seq, Batch)
            target_sequence = target_sequence.permute(1, 0)
            output_sequence_size = target_sequence.size(0)

        # Use the given encoder hidden state as the initial hidden state
        h_t = self.hidden_map(initial_hidden_state)

        batch_size = encoder_state.size(0)
        # Initialize the context vectors to zero
        context_vectors = self._init_context_vectors(batch_size)
        # Initialize the first word y_t to BOS
        y_t_index = self._init_indices(batch_size)

        h_t = h_t.to(encoder_state.device)
        y_t_index = y_t_index.to(encoder_state.device)
        context_vectors = context_vectors.to(encoder_state.device)

        output_vectors = []
        self._cached_p_attn = []
        self._cached_ht = []
        self._cached_decoder_state = encoder_state.cpu().detach().numpy()

        # Up to this point the code is identical to the earlier decoder
        for i in range(output_sequence_size):
            # Scheduled sampling: compare a random draw against the
            # sampling probability to decide whether to use the sample
            use_sample = np.random.random() < sample_probability
            if not use_sample:
                y_t_index = target_sequence[i]

            # Step 1: embed the word and concatenate with the previous context
            y_input_vector = self.target_embedding(y_t_index)
            rnn_input = torch.cat([y_input_vector, context_vectors], dim=1)

            # Step 2: apply the GRU and get a new hidden vector
            h_t = self.gru_cell(rnn_input, h_t)
            self._cached_ht.append(h_t.cpu().detach().numpy())

            # Step 3: attend to the encoder states using the current hidden state
            context_vectors, p_attn, _ = \
                verbose_attention(encoder_state_vectors=encoder_state,
                                  query_vector=h_t)

            # Extra: store the attention probabilities for visualization
            self._cached_p_attn.append(p_attn.cpu().detach().numpy())

            # Step 4: predict the next word from the current hidden state
            # and the context vector
            prediction_vector = torch.cat((context_vectors, h_t), dim=1)
            score_for_y_t_index = self.classifier(F.dropout(prediction_vector, 0.3))

            if use_sample:
                p_y_t_index = F.softmax(score_for_y_t_index *
                                        self._sampling_temperature, dim=1)
                # _, y_t_index = torch.max(p_y_t_index, 1)
                y_t_index = torch.multinomial(p_y_t_index, 1).squeeze()

            # Extra: record the prediction scores
            output_vectors.append(score_for_y_t_index)

        output_vectors = torch.stack(output_vectors).permute(1, 0, 2)
        return output_vectors
์ด ๊ณผ์ •์„ ํ†ตํ•ด, ๋ชจ๋ธ์€ ์‹œํ€€์Šค๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ํ•™์Šตํ•˜์—ฌ, ๋ชจ๋ธ์ด ์˜ˆ์ธกํ•œ ์‹œํ€€์Šค์™€ ๊ฐ€์ค‘์น˜๊ฐ€ ์ •๋‹ต์—์„œ ํฌ๊ฒŒ ๋ฒ—์–ด๋‚˜๋Š” ๊ฒฝ์šฐ๋ฅผ ์ค„์—ฌ์ค๋‹ˆ๋‹ค.
ย 

๋ชจ๋ธ ํ›ˆ๋ จ

์ด๋ฒˆ ์žฅ์—์„œ ๋‹ค๋ฃฌ ๋ชจ๋ธ์˜ ํ›ˆ๋ จ ๊ณผ์ •์€ ์•ž์„œ 6์žฅ๊ณผ 7์žฅ์—์„œ ๋‹ค๋ฃฌ ๋ชจ๋ธ์˜ ํ›ˆ๋ จ ๊ณผ์ •๊ณผ ๋น„์Šทํ•ฉ๋‹ˆ๋‹ค.
  1. ์†Œ์Šค ์‹œํ€ธ์Šค์™€ ํƒ€๊ฒŸ ์‹œํ€€์Šค๋ฅผ ์ž…๋ ฅ๋ฐ›์•„, ํƒ€๊ฒŸ ์‹œํ€€์Šค ์˜ˆ์ธก ์ƒ์„ฑ
  1. ํƒ€๊ฒŸ ์‹œํ€€์Šค ์˜ˆ์ธก ๋ ˆ์ด๋ธ”์„ ํ†ตํ•ด ํฌ๋กœ์Šค ์—”ํŠธ๋กœํ”ผ ์†์‹ค ๊ณ„์‚ฐ ํฌ๋กœ์Šค ์—”ํŠธ๋กœํ”ผ ์†์‹ค์€ ํ™•๋ฅ  ๋ถ„ํฌ์™€ ์˜ˆ์ธก ๋ถ„ํฌ ์‚ฌ์ด์˜ ์ฐจ์ด๋ฅผ ๊ณ„์‚ฐํ•˜์—ฌ, ๋ถ„๋ฅ˜ ๋ชจ๋ธ์ด ์˜ˆ์ธก์„ ์ž˜ ์ˆ˜ํ–‰ํ•˜๋Š”์ง€๋ฅผ ํ‰๊ฐ€ํ•˜๋Š” ์ง€ํ‘œ์ž…๋‹ˆ๋‹ค.
  1. ์—ญ์ „ํŒŒ๋ฅผ ํ†ตํ•ด ๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ ๊ณ„์‚ฐ
  1. ์˜ตํ‹ฐ๋งˆ์ด์ €๋ฅผ ํ†ตํ•ด ๋ชจ๋ธ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์—…๋ฐ์ดํŠธ
์œ„ ๊ณผ์ •์œผ๋กœ ํ›ˆ๋ จํ•œ ๋ชจ๋ธ์— ๋Œ€ํ•ด, ์†Œ์Šค ๋ฌธ์žฅ๊ณผ ๋ชจ๋ธ์— ์˜ํ•ด ์ƒ์„ฑ๋œ ๋ฌธ์žฅ ์Œ์— ๋Œ€ํ•ด BLEU ์ง€ํ‘œ๋กœ ํ‰๊ฐ€ํ•˜์—ฌ ๋ชจ๋ธ์ด ์–ผ๋งˆ๋‚˜ ์ž˜ ์ž‘๋™ํ•˜๋Š”์ง€ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ์•ž์—์„œ 2๊ฐ€์ง€ ๋ชจ๋ธ์„ ์‚ดํŽด๋ดค์Šต๋‹ˆ๋‹ค. 1) ์ œ๊ณต๋œ ํƒ€๊ฒŸ ์‹œํ€€์Šค๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๋””์ฝ”๋”์— ์ž…๋ ฅํ•˜๋Š” ๋ชจ๋ธ. 2) ์Šค์ผ€์ค„๋ง๋œ ์ƒ˜ํ”Œ๋ง์„ ํ†ตํ•ด ์ž์ฒด ์˜ˆ์ธก์„ ๋งŒ๋“ค์–ด ๋””์ฝ”๋”์— ์ž…๋ ฅํ•˜๋Š” ๋ชจ๋ธ. ์ด ์ค‘์—์„œ 2๋ฒˆ์งธ ๋ชจ๋ธ, ์ž์ฒด ์˜ˆ์ธก์„ ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋ธ์€ ์‹œํ€€์Šค ์˜ˆ์ธก์— ๋Œ€ํ•œ ์˜ค๋ฅ˜๋ฅผ ์ตœ์ ํ™”ํ•˜๋„๋ก ํ•˜๋Š” ์žฅ์ ์„ ๊ฐ–๊ณ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‘ ๋ชจ๋ธ์˜ BLEU ์ ์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. 1๋ฒˆ ๋ชจ๋ธ๋ณด๋‹ค 2๋ฒˆ ๋ชจ๋ธ์—์„œ ์กฐ๊ธˆ ๋” ๋†’์€ ์ ์ˆ˜๋ฅผ ๋ณด์ด๊ณ  ์žˆ์Œ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋ชจ๋ธ
BLEU
์ œ๊ณต๋œ ํƒ€๊ฒŸ ์‹œํ€€์Šค๋ฅผ ์ž…๋ ฅ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋ธ
46.8
์Šค์ผ€์ค„๋ง๋œ ์ƒ˜ํ”Œ๋ง์„ ์ž…๋ ฅ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋ธ
48.1
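Full BLEU combines clipped n-gram precisions (n = 1 to 4) with a brevity penalty; the sketch below shows only its core ingredient, modified unigram precision, and is not the implementation used to produce the table's numbers:

```python
from collections import Counter

def modified_unigram_precision(candidate, reference):
    """Core ingredient of BLEU: each candidate token is credited at most
    as many times as it appears in the reference (clipped counts),
    divided by the candidate length."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    clipped = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return clipped / len(candidate)

ref = "il fait froid .".split()
good = "il fait froid .".split()
bad = "il il il il".split()  # repeating a word does not inflate the score
p_good = modified_unigram_precision(good, ref)
p_bad = modified_unigram_precision(bad, ref)
```

The clipping is what stops a degenerate translation that repeats one common reference word from scoring well.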