2-6 Chunking and Named Entity Recognition

When we read or write, each word in a simple sentence plays a single grammatical role, but as soon as a sentence becomes even slightly more complex, several words combine into a phrase that takes on one role. For example, in the sentence '나는 오늘 맛있는 밥을 먹었어' ("I ate delicious rice today"), the object is a two-word phrase, '맛있는 밥' ("delicious rice"). NLP tasks likewise need to understand and make use of multi-word phrases when reading and writing sentences. Treating several words as a single phrase in this way is called shallow parsing, or chunking.

Shallow Parsing

Shallow parsing groups multiple morphemes or words into a single unit when they belong to the same syntactic structure. In other words, its goal is to work with higher-level units built from grammatical elements such as nouns, verbs, and adjectives, so that higher-level modules can use those phrases easily. It can identify various phrase types, such as noun phrases and verb phrases.
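To make the grouping step concrete, here is a minimal sketch of a rule-based noun-phrase chunker in plain Python. It assumes the words already carry part-of-speech tags (hand-written here, using Universal POS tag names such as DET, ADJ, and NOUN) and merges an optional determiner, any adjectives, and one or more nouns into a single chunk. The function name chunk_noun_phrases and the pattern itself are illustrative assumptions, not how any particular library implements chunking.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, POS tag); tags are hand-assigned for this sketch


def chunk_noun_phrases(tagged: List[Tagged]) -> List[str]:
    """Group tokens matching DET? ADJ* (NOUN|PROPN)+ into noun-phrase chunks.

    A lone pronoun (PRON) is also treated as a one-word chunk.
    """
    chunks: List[str] = []
    i, n = 0, len(tagged)
    while i < n:
        # A pronoun forms a chunk on its own.
        if tagged[i][1] == "PRON":
            chunks.append(tagged[i][0])
            i += 1
            continue
        j = i
        # Optional determiner.
        if j < n and tagged[j][1] == "DET":
            j += 1
        # Any number of adjectives.
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        # One or more nouns are required to close the phrase.
        k = j
        while k < n and tagged[k][1] in ("NOUN", "PROPN"):
            k += 1
        if k > j:
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            # No noun followed, so this token does not start a chunk.
            i += 1
    return chunks


sentence = [
    ("Do", "AUX"), ("not", "PART"), ("count", "VERB"),
    ("the", "DET"), ("eggs", "NOUN"),
    ("before", "SCONJ"), ("they", "PRON"), ("hatch", "VERB"),
]
print(chunk_noun_phrases(sentence))  # ['the eggs', 'they']
```

Real chunkers usually rely on a trained model or a dependency parse rather than a fixed pattern, so this sketch only illustrates the idea of merging tagged words into phrase-level units.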

For widely used languages such as English, pretrained models that handle shallow parsing well are available. The spaCy package can be used to perform shallow parsing. The example code below extracts the noun phrases from the sentence 'Do not count the eggs before they hatch'.

    import spacy

    # The 'en' shortcut only works in older spaCy releases; spaCy 3 needs the full
    # model name (python -m spacy download en_core_web_sm).
    nlp = spacy.load('en_core_web_sm')
    doc = nlp("Do not count the eggs before they hatch")
    # Each noun chunk is a Span whose label_ is 'NP'.
    for chunk in doc.noun_chunks:
        print('{} - {}'.format(chunk, chunk.label_))

As the output below shows, the parser does not simply pick out the single words eggs and they; it correctly extracts the eggs, the words that together form the object, as one phrase.

    [output]
    the eggs - NP
    they - NP

Named Entities

Another useful unit for shallow parsing is the named entity: a word or phrase that refers to a specific real-world object such as a person, place, or organization. As mentioned earlier in 2-5 Classifying Words: POS Tagging, word sequences such as 'Coca Cola' or 'Deep Daiv' may look odd together grammatically, but they are widely used as proper nouns to refer to one specific entity, so they can serve as units for analyzing the noun phrases in a sentence.
John was born in Chicken, Alaska and studied at Cranberry Lemon University
-> John : person name
-> Chicken : place name
-> Alaska : place name
-> Cranberry Lemon University : school name
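The labeling above can be imitated with a very small sketch. The code below is not a trained named entity recognizer: it is a plain dictionary (gazetteer) lookup that prefers the longest matching word sequence, so that 'Cranberry Lemon University' is kept as one entity. The gazetteer contents, the label names (PERSON, PLACE, SCHOOL), and the function name find_entities are all assumptions made up for this one sentence.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer covering only the example sentence.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("John",): "PERSON",
    ("Chicken",): "PLACE",
    ("Alaska",): "PLACE",
    ("Cranberry", "Lemon", "University"): "SCHOOL",
}


def find_entities(
    text: str, gazetteer: Dict[Tuple[str, ...], str] = GAZETTEER
) -> List[Tuple[str, str]]:
    """Return (entity text, label) pairs using longest-match dictionary lookup."""
    # Naive tokenization: drop commas and split on whitespace.
    tokens = text.replace(",", " ").split()
    max_len = max(len(key) for key in gazetteer)
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word names win.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + size])
            if candidate in gazetteer:
                entities.append((" ".join(candidate), gazetteer[candidate]))
                i += size
                break
        else:
            # No entry starts at this token; move on.
            i += 1
    return entities


text = "John was born in Chicken, Alaska and studied at Cranberry Lemon University"
for entity, label in find_entities(text):
    print("{} : {}".format(entity, label))
```

A lookup like this cannot tell 'Chicken' the town from 'chicken' the food in other contexts, which is one reason practical systems use models that take the surrounding words into account.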