What is Machine Translation?

 

Machine translation (MT) is a form of translation in which a computer program, without human intervention, analyses a text in one language (the source text) and produces a text in another language (the target text). In current practice, however, machine translation usually involves some human intervention, in the form of pre-editing and post-editing. Note that in machine translation the human translator supports the machine, not the other way round.

Most machine translation systems today produce what is called a "gisting translation": a rough translation that conveys the gist of the source text but is not usable for anything else. In fields where vocabulary and sentence structure are highly restricted, such as weather reports, machine translation can nevertheless deliver useful results.

Machine translation and computer-assisted translation

Although the two concepts are similar, machine translation (MT) should not be confused with computer-assisted translation (CAT), also known as machine-assisted translation (MAT). In machine translation, the human translator supports the machine: the computer program translates the text, and the human translator then edits it. In computer-assisted translation, the computer program supports the human translator, who translates the text and makes all the essential decisions.

Introduction

The translation process, whether in written translation or in interpreting, can be stated simply as follows:

  1. Decode the meaning of the source text.
  2. Re-encode this meaning in the target language.

Behind this apparently simple procedure lies a complex cognitive operation. To decode the meaning of the source text, for example, the human translator must interpret and analyse every feature of that text. This requires in-depth knowledge of the grammar, semantics, syntax and idioms of the source language, as well as of the culture of its speakers. The translator needs the same depth of knowledge to re-encode the meaning in the target language.

Therein lies the challenge of machine translation: how can a computer be programmed to "understand" a text as a human does, and to "create" a new text in the target language that "sounds" as if it had been written by a human? This problem can be approached in many ways.

Linguistic approaches

It is sometimes argued that machine translation cannot succeed until the problem of natural language understanding has been solved. In practice, however, machine translation uses a number of heuristic methods, such as the following.

In general, rule-based methods (the first three) parse a text, usually creating an intermediary, symbolic representation, from which they then generate the text in the target language. These methods require extensive lexicons with morphologic, syntactic and semantic information, and large sets of rules.
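The parse, intermediate representation and generation steps described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the five-word English-to-Spanish lexicon, the function names `analyse`, `transfer` and `generate`, and the single reordering rule. It is not how any particular MT system works, and a real system would also need morphology (for example gender agreement) and a far larger rule set.

```python
# Hypothetical toy lexicon: English word -> (part of speech, Spanish word).
# All nouns here are masculine, so no agreement rule is needed.
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "new": ("ADJ", "nuevo"),
    "book": ("NOUN", "libro"),
    "car": ("NOUN", "coche"),
}


def analyse(sentence):
    """Parse step: turn the source sentence into (POS, source word) tokens."""
    tokens = []
    for word in sentence.lower().split():
        if word not in LEXICON:
            raise KeyError(f"word not in lexicon: {word}")
        tokens.append((LEXICON[word][0], word))
    return tokens


def transfer(tokens):
    """Structural rule: English ADJ NOUN order becomes NOUN ADJ."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][0] == "ADJ" and out[i + 1][0] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def generate(tokens):
    """Generation step: look up the target word for each token."""
    return " ".join(LEXICON[word][1] for _, word in tokens)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("the red car"))
```

Even this toy shows why such methods depend on extensive lexicons and rules: any word missing from `LEXICON` raises an error, and every structural difference between the two languages needs its own rule.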

Statistical and example-based methods avoid manual lexicon building and rule-writing. Instead they try to generate translations from bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament. Where such corpora are available, impressive results can be achieved when translating texts of a similar kind, but such corpora are very rare.
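As a rough illustration of how translations can be learned from a bilingual corpus rather than written by hand, the sketch below estimates word translation probabilities with expectation-maximisation, in the style of IBM Model 1. The technique is named here as one well-known example of a statistical method; the text above does not say which method any given system uses. The three-sentence German-English corpus and the names `train_ibm1` and `best_translation` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (source tokens, target tokens).
CORPUS = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]


def train_ibm1(pairs, iterations=20):
    """Estimate t[(e, f)], an approximation of P(target word e | source word f)."""
    tgt_vocab = {e for _, tgt in pairs for e in tgt}
    # Start from a uniform distribution over co-occurring word pairs.
    t = {(e, f): 1.0 / len(tgt_vocab)
         for src, tgt in pairs for e in tgt for f in src}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target word over the source words
        # of its sentence, in proportion to the current estimates.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: re-normalise the expected counts per source word.
        for (e, f) in t:
            t[(e, f)] = count[(e, f)] / total[f]
    return t


def best_translation(t, f):
    """Return the target word with the highest probability for source word f."""
    candidates = {e: p for (e, g), p in t.items() if g == f}
    return max(candidates, key=candidates.get)


table = train_ibm1(CORPUS)
for word in ("das", "haus", "buch", "ein"):
    print(word, "->", best_translation(table, word))
```

No dictionary is supplied: the pairing of "haus" with "house" emerges only because "das" is better explained by "the" in the other sentence. With three sentences this works neatly; with real text, far more data is needed, which is why the scarcity of suitable corpora matters.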

Given enough data, most machine translation programs work well enough for a native speaker of one language to get the approximate meaning of a text written by a native speaker of another (that is, they produce a "gisting translation"). The difficulty lies in obtaining enough data to support a particular method. For example, the large multilingual corpus needed for statistical methods is not necessary for grammar-based methods. Grammar-based methods, on the other hand, need a skilled linguist to design the grammar carefully.

Users

Despite its limitations, machine translation is used by many organizations around the world. Probably the largest institutional user is the European Commission (EU), which uses a highly customized version of SYSTRAN, a commercial program, to translate large volumes of documents automatically.

In 2003 Microsoft announced a hybrid machine translation system, which was used internally to translate a database of technical documents from English to Spanish. The system was developed by Microsoft's Natural Language Research group. The group is testing systems that provide English–French, English–German and English–Japanese translation online. The English–French and English–German systems use a learned language generation component, whereas the English to Spanish and English–Japanese systems use manually developed generation components. The systems were developed and trained using translation memory databases of over a million sentences each.

History

The first attempts at machine translation were made after the Second World War. At the time it was expected that the newly invented computers would have little difficulty translating texts, probably because computers could do complex mathematics quickly and perform calculations that people found difficult. It was also thought that, since young children understand and learn languages easily, computers would do so too. This belief soon proved to be mistaken.

In 1954 the first public demonstration of a machine translation system was held in New York under the auspices of IBM. It was widely reported in the newspapers and attracted much public interest. The system itself, however, was no more than what would today be called a toy: with a vocabulary of just 250 words, it translated 49 carefully selected Russian sentences from the field of chemistry into English. Nevertheless, it suggested that machine translation was not a distant prospect, and in particular it prompted substantial funding of machine translation worldwide.

The first serious machine translation systems were used during the Cold War to parse texts from Russian scientific journals. The translations were rough, but they were sufficient to understand the "gist" of the articles. If an article dealt with a security matter, for example, it was sent to a human translator for a complete translation; if not, it was discarded.

With the advent of low-cost, powerful computers in the late 20th century and the availability of machine translation on Internet sites, machine translation came within reach of the general public.

However, much of the earlier machine translation research effort has shifted towards the development of computer-assisted translation systems, such as translation memories, which have in practice been more successful.
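For readers unfamiliar with the term, a translation memory stores previously translated segments and offers them again when a similar segment appears, leaving the decision to the human translator. The sketch below is a minimal illustration under stated assumptions: the two stored English-French segments, the function name `tm_lookup` and the 0.75 similarity threshold are all hypothetical, and similarity is measured with Python's standard `difflib.SequenceMatcher` rather than with whatever matching a commercial tool uses.

```python
import difflib

# Hypothetical memory of previously translated segments (English -> French).
MEMORY = {
    "Press the start button.": "Appuyez sur le bouton de mise en marche.",
    "Close the cover.": "Fermez le couvercle.",
}


def tm_lookup(segment, memory, threshold=0.75):
    """Return (stored source, stored target, score) for the closest match,
    or None if no stored segment is similar enough."""
    best_score, best_pair = 0.0, None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        score = difflib.SequenceMatcher(
            None, segment.lower(), source.lower()
        ).ratio()
        if score > best_score:
            best_score, best_pair = score, (source, target)
    if best_pair is not None and best_score >= threshold:
        return best_pair[0], best_pair[1], best_score
    return None


# A near match is offered as a suggestion for the translator to edit.
print(tm_lookup("Press the stop button.", MEMORY))
```

The suggestion for "Press the stop button." is the stored translation of "Press the start button.", which is wrong as it stands; the translator corrects it. That division of labour is what separates computer-assisted translation from machine translation.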

Example

The following is a machine translation of the text above, produced with SYSTRAN.

Korean

Related terms

Free (open source) software

External links

 (Wikipedia: Machine translation)