Nonparametric Decision Making
Pattern Recognition and Image Analysis, by Earl Gose, Richard Johnsonbaugh, and Steve Jost, Prentice Hall, 1996, pp. 149-193
4.3 Kernel and Window Estimators
4.4 Nearest Neighbor Classification Techniques
The Single Nearest Neighbor Technique
A Bound on the Nearest Neighbor Error Rate
A Lower Bound on the Bayesian Error Rate from Nearest Neighbor Results
The k-nearest Neighbor Technique
Other Nearest Neighbor Techniques
4.5 Adaptive Decision Boundaries
4.6 Adaptive Discriminant Functions
In most real-world problems, we do not know what type of density function the quantity of interest follows. In such cases an arbitrary density is fitted to a set of samples; this is called nonparametric decision making. It is used when there is not enough evidence to guess the general form of the density.
Instead of estimating exact parameters from the sample data and characterizing each class's distribution through a parametric decision function, classification is carried out directly, without estimating statistical parameters and without knowing the type of each class's distribution.
Histograms and kernels approximate the unknown density function when its type is unknown; nearest neighbor classification is an entirely different approach that skips density estimation altogether.
A histogram is a graph of a frequency distribution, also called a bar graph or column chart. It displays the shape of the observed data's distribution at a glance: points marking the class intervals are laid out along the horizontal axis, and over each interval a bar is drawn whose height is proportional to the frequency of that class.
When the probability density function $p(x|C_i)$ for each class is unknown, an approximation $\hat p(x|C_i)$ can be obtained by dividing the range of the variable $x$ into a finite number of intervals that cover all the data and displaying the result as a bar graph. Each interval of $x$ is called a cell or bin. For the histogram to serve as an estimate of the density function, the total area under it must be 1. If each interval has width $\Delta x$ and the total number of samples is $n$, then the area of a bin containing $n_k$ samples is $n_k/n$, and dividing by the interval width gives the height of the density over that bin:

$$\hat p(x) = \frac{n_k}{n\,\Delta x}.$$

Once the approximate density functions are available, decisions are made using Bayes' theorem.
Whether the variable $x$ is continuous or discrete, its range is divided into intervals and the same method is used. For a discrete variable, the fraction of the samples taking each value of $x$ serves as the estimate $\hat P(x)$ of the distribution $P(x)$, and these fractions sum to 1.
Choosing the bin size appropriately is important: if the bins are too large the graph becomes too coarse, and if they are too small it fluctuates too wildly. A well-constructed histogram can come close to recovering the underlying probability density function.
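As a sketch of this procedure, the following Python fragment (the function name, seed, and bin count are illustrative, not from the text) builds a density histogram whose bar areas sum to 1:

```python
import numpy as np

def density_histogram(samples, bin_edges):
    """Density-histogram estimate of p(x): each bin's height is
    (samples in bin) / (n * bin width), so the bar areas sum to 1."""
    samples = np.asarray(samples, dtype=float)
    counts, _ = np.histogram(samples, bins=bin_edges)
    heights = counts / (len(samples) * np.diff(bin_edges))
    return heights

# 50 normally distributed random numbers in 6 equal-width bins,
# as in the figure below.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 50)
edges = np.linspace(x.min(), x.max(), 7)          # 6 intervals
heights = density_histogram(x, edges)
print(heights, (heights * np.diff(edges)).sum())  # total area is 1.0
```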
(a) The true normal density from which 50 random numbers were drawn. (b) Histogram of the 50 normally distributed random numbers using 6 intervals. (c) The same data with 3 intervals. (d) The same data with 24 intervals.
Example 4.1 Constructing a density histogram with intervals of different sizes.
Let us construct a histogram approximation to the density function for grapefruit volumes. The volumes of 100 grapefruit were measured, giving data such as the following:

| interval | interval length | number of samples | fraction of samples (= area) | rectangle height |
|---|---|---|---|---|
| [0, 4) | 4 | 10 | 0.1 | 0.025 |
| ... | ... | ... | ... | ... |
The height of each rectangle equals the fraction of the samples in the interval divided by the interval length. For example, for the interval from 0 to 4, the height of the rectangle is 0.1/(4 - 0) = 0.025. The resulting density is shown in the figure. Because this is a density histogram, the areas of the rectangles must sum to 1.
Since the rectangle heights are only estimates, they are written $\hat p(x)$. Note that the bins are small where samples are plentiful and large where they are scarce.
Example 4.2 Classification using histograms and Bayes' theorem.
Using the following data, classify a sample with $x = 7.5$ as belonging to class A or class B.

The following data are the values of feature $x$ for 60 random samples drawn from class A:
0.80, 0.91, 0.93, 0.95, 1.32, 1.53, 1.57, 1.63, 1.67, 1.74, ...
The following data are the values of feature $x$ for 60 random samples drawn from class B:
3.54, 3.88, 4.24, 4.30, 4.30, 4.70, 4.78, 4.97, 5.21, 5.42, ...
(a) Histogram of the feature for class A. (b) Histogram for class B.
The figure above shows the histograms of the samples in each interval of $x$ for classes A and B. To convert these counts into density functions, each must be divided by the total number of samples (60) and by the interval width (1).
To classify a sample with $x = 7.5$, the heights of the two histograms at 7.5 are compared, since the class interval containing 7.5 is [7, 8] for both classes A and B. This gives the estimates $\hat p(7.5|A)$ and $\hat p(7.5|B)$. Applying Bayes' theorem,

$$P(A|7.5) = \frac{\hat p(7.5|A)\,P(A)}{\hat p(7.5|A)\,P(A) + \hat p(7.5|B)\,P(B)},$$

and similarly for $P(B|7.5)$. The sample should therefore be classified as belonging to whichever class has the larger posterior probability.
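The example's actual histogram heights at 7.5 are read from its figure and are not reproduced above, so the short fragment below uses hypothetical heights and assumed equal priors purely to illustrate the comparison:

```python
# Hypothetical histogram heights at x = 7.5 (the example's actual
# heights are read from its histograms and are not reproduced here).
p_A, p_B = 0.05, 0.15        # assumed p(7.5|A) and p(7.5|B)
P_A = P_B = 0.5              # assumed equal priors
# p(x) is common to both posteriors, so compare the numerators.
print("class A" if p_A * P_A > p_B * P_B else "class B")
```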
4.3 Kernel and Window Estimators

A very rough approximation to the true density function can be obtained by placing at each sample value a spike of very narrow width and great height, as with a set of delta functions, scaled so that the areas of the spikes sum to 1. The area of each spike is the number of samples at that point divided by the total number of samples.
Consider the example of delta functions estimating the density of samples located at the values of $x$ shown in Figure 1.
Such an approximation of a continuous density function is not useful for decision making. However, if the delta functions are replaced by other functions, called kernels, such as rectangular, triangular, or normal densities, whose areas likewise sum to 1, a smoother and more satisfactory estimate is obtained.
Example 4.3 Using a triangular kernel.
Figure 2 shows the estimated density function obtained by applying a triangular kernel to the samples, and Figure 3 shows rectangular, triangular, and normal kernels scaled to have standard deviation 1.
Figure 1: Approximate density constructed from three delta functions. The height and width of each bar are chosen so that the bar areas sum to 1.
Figure 2: (a) Computing the estimated probability density function with the kernel method: the individual kernels (dotted) and their sum (solid). The area under the solid curve is 1.
(b) Computing $\hat p(4)$ with the window method gives the same value, 4/27.
Figure 3: Three kernel functions: rectangular (solid), triangular (dotted), and normal (dashed). Each has area 1 and standard deviation 1.
To classify a sample at $x$, the entire density $\hat p(x)$ is not needed, only its value at $x$. Suppose, as in Figure 2(b), we want $\hat p(4)$ at $x = 4$. The result is, of course, the sum of the heights of all the kernel functions at $x = 4$. Another way to obtain this result is to form a window function that mirrors the kernel function, center it near $x$, and sum its heights at each sample point it covers; symmetric kernel functions are usually used, so the window has the same shape as the kernel. In Figure 2(b) the window function covers two samples, each contributing a height of 2/27, so the estimate $\hat p(4)$ is 4/27, the same value given by the kernel method in Figure 2(a).
As with histograms, choosing an appropriate width or standard deviation is an important problem for kernel and window methods. If the width is too large, fine structure is lost; if it is too small, the final approximation is not smooth enough.
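A minimal sketch of kernel estimation follows; the sample positions and width are illustrative assumptions, not the text's values:

```python
import numpy as np

def kernel_density(x, samples, width, kind="triangular"):
    """Kernel estimate of p(x): one unit-area kernel per sample,
    averaged at x. Equivalently, center a window of the same shape
    at x and sum the heights of the samples it covers."""
    u = (x - np.asarray(samples, dtype=float)) / width
    if kind == "rectangular":
        k = np.where(np.abs(u) <= 0.5, 1.0, 0.0)
    elif kind == "triangular":
        k = np.maximum(1.0 - np.abs(u), 0.0)
    else:   # normal kernel with standard deviation `width`
        k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return k.sum() / (len(samples) * width)

# Illustrative samples and width (not the text's values).
print(kernel_density(4.0, [2.0, 3.0, 7.0], width=3.0))
```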
Figure 4: Estimated density functions using (a) a triangular kernel and (b) a normal kernel. The normal kernel gives a smoother estimate than the triangular kernel.
4.4 Nearest Neighbor Classification Techniques

These techniques are used when the classes to be distinguished are known but the probability density function of each class is not. Rather than estimating statistical parameters for each class, the samples are plotted directly in feature space, and an unknown sample is assigned to the class of the most similar, that is, the nearest, sample in a reference set.
The Single Nearest Neighbor Technique

What does "nearest" mean? The distance can be measured by the smallest Euclidean distance, the absolute difference, the maximum distance, or the Minkowski distance.
(1) Euclidean distance. In a $d$-dimensional feature space, the geometric distance between two points $\mathbf{x}$ and $\mathbf{y}$ is

$$D(\mathbf{x}, \mathbf{y}) = \left[\sum_{i=1}^{d} (x_i - y_i)^2\right]^{1/2},$$

the Pythagorean theorem extended to $d$ dimensions.
This is the most commonly used distance measure, but it is not always the best. Because each coordinate difference is squared before being summed, it is relatively expensive to compute, and large dissimilarities are emphasized.
(2) Absolute difference. This sums the coordinate differences directly and is easy to compute:

$$D(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} |x_i - y_i|.$$

It is also called the city block distance, Manhattan metric, or taxicab distance.
(3) Maximum distance. This emphasizes the single feature on which the two points are least similar (farthest apart):

$$D(\mathbf{x}, \mathbf{y}) = \max_{i=1,\dots,d} |x_i - y_i|.$$
(4) Minkowski distance. This generalizes distance measures (1), (2), and (3):

$$D(\mathbf{x}, \mathbf{y}) = \left[\sum_{i=1}^{d} |x_i - y_i|^k\right]^{1/k},$$

where $k$ is an adjustable parameter. With $k = 1$ it reduces to the absolute difference, with $k = 2$ to the Euclidean distance, and as $k \to \infty$ it approaches the maximum distance.
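A short sketch of these distance measures (the test points are arbitrary):

```python
import numpy as np

def minkowski(x, y, k):
    """Minkowski distance: k = 1 gives the absolute (city block)
    difference, k = 2 the Euclidean distance, and k -> infinity
    approaches the maximum distance."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return np.sum(diff ** k) ** (1.0 / k)

def maximum_distance(x, y):
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return diff.max()

a, b = (1, 1), (4, 5)
print(minkowski(a, b, 1))      # 7.0, the city block distance
print(minkowski(a, b, 2))      # 5.0, the Euclidean distance
print(maximum_distance(a, b))  # 4.0
```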
The following figure shows a feature space containing three samples from class A and two from class B. If a sample of unknown class lies at (1, 1), the nearest sample by Euclidean distance is the class A sample at (1, 3), so the unknown sample is classified as class A.
A Bound on the Nearest Neighbor Error Rate

The performance of the nearest neighbor classifier is never better than that of the Bayesian classifier (which requires the density functions to be known), because the Bayesian classifier always chooses the most probable class. There are two main reasons for the difference:

1. The nearest neighbor rule assigns a sample to the class of the nearest reference sample, which may differ from the sample's actual class.

2. The class labels of the reference samples are usually assigned by an expert, and even these labels may not be accurate.
Consider classes $C_1, \dots, C_c$ and a feature vector $\mathbf{x}$. Let $P(C_i|\mathbf{x})$ be the probability that a sample at $\mathbf{x}$ is correctly classified as $C_i$, and let $p(\mathbf{x}|C_i)$ be the class-conditional density. With an arbitrarily large reference set, the nearest neighbor of a sample at $\mathbf{x}$ is effectively also at $\mathbf{x}$ and belongs to class $C_i$ with probability $P(C_i|\mathbf{x})$, so the expected probability that a member of class $C_i$ is correctly classified is

$$P(\text{correct}|C_i) = \int P(C_i|\mathbf{x})\,p(\mathbf{x}|C_i)\,d\mathbf{x},$$

where the integral is taken over the $d$-dimensional feature space. Replacing $P(C_i|\mathbf{x})$ by Bayes' theorem,

$$P(C_i|\mathbf{x}) = \frac{p(\mathbf{x}|C_i)\,P(C_i)}{p(\mathbf{x})}.$$
Here $p(\mathbf{x})$ is the mixture density over all classes, that is, the density of the whole population:

$$p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x}|C_j)\,P(C_j).$$
Therefore the probability that a member of $C_i$ is incorrectly classified, that is, the error probability for each class, is

$$P(E|C_i) = 1 - \int P(C_i|\mathbf{x})\,p(\mathbf{x}|C_i)\,d\mathbf{x}.$$
Another useful form follows from $\int p(\mathbf{x}|C_i)\,d\mathbf{x} = 1$. Therefore

$$P(E|C_i) = \int \left[1 - P(C_i|\mathbf{x})\right] p(\mathbf{x}|C_i)\,d\mathbf{x}.$$

Substituting Bayes' theorem,

$$P(E|C_i) = \int \left[1 - \frac{p(\mathbf{x}|C_i)\,P(C_i)}{p(\mathbf{x})}\right] p(\mathbf{x}|C_i)\,d\mathbf{x}.$$
To obtain the overall error probability, the per-class error probabilities derived above are weighted by the prior probabilities of their classes and summed:

$$P(E) = \sum_{i=1}^{c} P(E|C_i)\,P(C_i).$$
Thus if the density function and prior probability of each class are known, the expected error rate of the nearest neighbor method can be computed. Of course, if the densities were actually known, Bayes' theorem would be used for classification. Nevertheless, it is interesting to compare the error rate of the Bayesian method with that of the nearest neighbor method, which would be used when the densities are unknown.
Example 4.5 Estimation of error rates for nearest neighbor and Bayesian classification for two classes with equal prior probabilities.
Figure 4.9: Two uniform density functions and the mixture density. The mixture density is dashed.
Assume, as in Figure 4.9, that classes A and B have prior probabilities of 0.5 each, that class A is uniformly distributed over the range [0, 2], and that class B is uniformly distributed over [1, 5]. What are the per-class error rates of the nearest neighbor rule, and how do they compare with the Bayesian error rates?

The densities are as shown in the figure: $p(x|A) = 1/2$ on [0, 2] and $p(x|B) = 1/4$ on [1, 5]. The mixture density $p(x)$ is therefore piecewise constant: 1/4 from 0 to 1, 3/8 from 1 to 2, 1/8 from 2 to 5, and 0 elsewhere. Substituting these densities into the per-class error probabilities derived above gives

$$P(E|A) = \int_1^2 \left[1 - \frac{(1/2)(1/2)}{3/8}\right]\frac{1}{2}\,dx = \int_1^2 \frac{1}{3}\cdot\frac{1}{2}\,dx = \frac{1}{6},$$

$$P(E|B) = \int_1^2 \left[1 - \frac{(1/4)(1/2)}{3/8}\right]\frac{1}{4}\,dx = \int_1^2 \frac{2}{3}\cdot\frac{1}{4}\,dx = \frac{1}{6}.$$

The overall error rate is

$$P(E) = \frac{1}{6}(0.5) + \frac{1}{6}(0.5) = \frac{1}{6}.$$
Bayesian classification always chooses the most probable class. Thus a sample is classified as class A if $0 \le x < 2$ and as class B if $2 < x \le 5$. Therefore

$$P(E|A) = 0, \qquad P(E|B) = \int_1^2 \frac{1}{4}\,dx = \frac{1}{4}.$$

The overall error rate is

$$P(E) = (0)(0.5) + \frac{1}{4}(0.5) = \frac{1}{8}.$$
In this example, the nearest neighbor error rate is 4/3 times the Bayesian error rate.
Example 4.5 could also have been solved by inspection of Figure 4.9: the half of the A's lying between 0 and 1 will be classified correctly, and the half of the A's that overlap the density of the B's have a 1/3 probability of being called B (because $P(B|x) = 1/3$ there), so $P(E|A) = (1/2)(1/3) = 1/6$. Also, the left 1/4 of the B's have a 2/3 probability of being called A, so $P(E|B) = (1/4)(2/3) = 1/6$. Thus $P(E) = 1/6$.
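These rates can also be checked by simulation. The sketch below draws a large labeled reference set and test set from the stated mixture and estimates both error rates empirically (the sample sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def draw(n):
    """Draw n labeled samples: class A is uniform on [0, 2], class B
    is uniform on [1, 5], and the priors are 0.5 each."""
    is_a = rng.random(n) < 0.5
    x = np.where(is_a, rng.uniform(0, 2, n), rng.uniform(1, 5, n))
    return x, is_a

ref_x, ref_a = draw(100_000)
test_x, test_a = draw(100_000)

# Single nearest neighbor search over a sorted reference set.
order = np.argsort(ref_x)
ref_x, ref_a = ref_x[order], ref_a[order]
right = np.clip(np.searchsorted(ref_x, test_x), 1, len(ref_x) - 1)
use_left = (test_x - ref_x[right - 1]) < (ref_x[right] - test_x)
nn_a = np.where(use_left, ref_a[right - 1], ref_a[right])

print("NN error rate:   ", np.mean(nn_a != test_a))         # close to 1/6
print("Bayes error rate:", np.mean((test_x < 2) != test_a))  # close to 1/8
```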
Figure 4.10: (a) Density functions for Example 4.6. (b) The density functions multiplied by the prior probabilities of their classes.

A Lower Bound on the Bayesian Error Rate from Nearest Neighbor Results
Given only nearest neighbor results, a lower bound on the Bayesian error rate can be derived. At a point $\mathbf{x}$, the conditional error rate of the nearest neighbor rule (with a large reference set) is

$$p_N(\mathbf{x}) = \sum_{i=1}^{c} P(C_i|\mathbf{x})\left[1 - P(C_i|\mathbf{x})\right] = 1 - \sum_{i=1}^{c} P(C_i|\mathbf{x})^2, \tag{10}$$

while the conditional Bayesian error rate is

$$p_B(\mathbf{x}) = 1 - P(C_m|\mathbf{x}), \tag{11}$$

where $C_m$ is the most probable class at $\mathbf{x}$. Substituting $P(C_m|\mathbf{x}) = 1 - p_B(\mathbf{x})$ from (11) into (10),

$$p_N(\mathbf{x}) = 2p_B(\mathbf{x}) - p_B(\mathbf{x})^2 - \sum_{i \ne m} P(C_i|\mathbf{x})^2. \tag{12}$$

Since $\sum_{i \ne m} P(C_i|\mathbf{x}) = p_B(\mathbf{x})$, and the sum of squares of $c - 1$ quantities with a fixed sum is smallest when they are equal,

$$\sum_{i \ne m} P(C_i|\mathbf{x})^2 \ge \frac{p_B(\mathbf{x})^2}{c - 1}. \tag{13}$$

Substituting (13) for the sum of squares in (12) produces the inequality

$$p_N(\mathbf{x}) \le 2p_B(\mathbf{x}) - \frac{c}{c-1}\,p_B(\mathbf{x})^2. \tag{14}$$

The term $c/(c-1)$ equals 2 when there are two classes, and it approaches 1 as the number of classes becomes large.

Since (14) is true at all values of $\mathbf{x}$, it can be used to compare the overall error rates

$$P_N = \int p_N(\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x} \tag{15}$$

and

$$P_B = \int p_B(\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}. \tag{16}$$

Multiplying (14) by $p(\mathbf{x})$ and integrating gives

$$P_N \le 2P_B - \frac{c}{c-1}\int p_B(\mathbf{x})^2\,p(\mathbf{x})\,d\mathbf{x}. \tag{17}$$

The integral of a nonnegative quantity must be greater than or equal to 0, so

$$\int \left[p_B(\mathbf{x}) - P_B\right]^2 p(\mathbf{x})\,d\mathbf{x} \ge 0,$$

thus

$$\int p_B(\mathbf{x})^2\,p(\mathbf{x})\,d\mathbf{x} \ge P_B^2. \tag{18}$$

Substituting the integral from (18) into (17) gives

$$P_N \le 2P_B - \frac{c}{c-1}\,P_B^2 \tag{19}$$

or

$$P_N \le P_B\left(2 - \frac{c}{c-1}\,P_B\right). \tag{20}$$

When $c = 2$, this becomes

$$P_N \le 2P_B(1 - P_B),$$

so when $P_N$ is known,

$$P_B \ge \frac{1 - \sqrt{1 - 2P_N}}{2}.$$
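A small helper, sketched below, evaluates this lower bound; applied to Example 4.5's nearest neighbor rate of 1/6 it gives about 0.092, which the true Bayesian rate of 1/8 indeed satisfies:

```python
import math

def bayes_lower_bound(p_nn, c=2):
    """Smallest Bayesian error rate consistent with a measured nearest
    neighbor error rate p_nn, from P_N <= 2*P_B - c/(c-1)*P_B**2."""
    a = c / (c - 1)
    # Solve a*P_B**2 - 2*P_B + P_N = 0 for the smaller root.
    return (2.0 - math.sqrt(4.0 - 4.0 * a * p_nn)) / (2.0 * a)

# Example 4.5: P_N = 1/6 implies P_B >= 0.092 or so; the true
# Bayesian rate there, 1/8 = 0.125, satisfies the bound.
print(bayes_lower_bound(1 / 6))
```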
The k-nearest Neighbor Technique

The nearest neighbor method most commonly used classifies an unknown sample not by its single nearest neighbor but by the "votes" of its $k$ nearest neighbors. This k-nearest neighbor classification procedure is commonly written $k$-NN. If the costs of error are the same for each class, the unknown sample is assigned to the class most commonly represented among its $k$ nearest neighbors. For example, with three neighbors as in Figure 4.8, the unknown sample at (1, 1) is classified as belonging to class B, because its three nearest neighbors consist of the class A sample at (1, 3) and the two class B samples.
Figure 4.11: Upper bounds on the k-NN error rate as a function of the Bayesian error rate for two classes.
Other Nearest Neighbor Techniques

Figure 4.12: Performance of variants of the nearest neighbor decision rule [Wu].
4.5 Adaptive Decision Boundaries

This approach assumes that the functional form of the decision boundary between the classes (for example, its order) is known, and searches for the boundary of that form which best separates the classes. For instance, to separate two classes with a linear decision boundary when each sample has $M$ features, the discriminant function $D$ has the form

$$D(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_M x_M.$$
In this equation, $D = 0$ defines the decision boundary separating the two classes. The weights $w_0, w_1, \dots, w_M$ are chosen to give good performance on the training set. Suppose each sample with feature vector $\mathbf{x}$ is assigned to one class, called class 1 if $D(\mathbf{x}) > 0$ and class $-1$ if $D(\mathbf{x}) < 0$. Whenever $D(\mathbf{x}) \ne 0$, the sample $\mathbf{x}$ can be classified.
Geometrically, $D(\mathbf{x}) = 0$ is the equation of a decision boundary dividing the $M$-dimensional feature space into two regions. If there exists a decision boundary such that $D(\mathbf{x}) > 0$ for class 1 and $D(\mathbf{x}) < 0$ for class $-1$, the two classes are said to be linearly separable.
The adaptive decision boundary algorithm starts from arbitrary initial weights (for example, zero), cycles through the training samples, adjusts the weights whenever a sample is misclassified, and stops when every training sample is classified correctly.
Consider how the weights are adjusted in the appropriate direction. If a sample $\mathbf{x}$ is misclassified, the new values of the weights are

$$w_0' = w_0 + c\,d\,k, \qquad w_i' = w_i + c\,d\,x_i, \quad i = 1, \dots, M,$$

where $d = 1$ if $\mathbf{x}$ belongs to class 1 and $d = -1$ if $\mathbf{x}$ belongs to class $-1$, and where $c$ and $k$ are positive constants: $c$ controls the step size and $k$ plays the role of the constant "feature" multiplying $w_0$.
Adjusting the weights in this way moves the decision boundary so as to correct the misclassified sample. After adapting to one sample there is no need to recompute $D$ for that sample; the algorithm simply moves on to the next one.
Example 4.8 Finding a decision boundary that classifies samples using a single numerical feature $x$. The training samples are as follows:
| sample $j$ | feature $x$ | class $d$ |
|---|---|---|
| 1 | -4 | -1 |
| 2 | -1 | 1 |
Here $j$ is the sample number, $x$ is the one-dimensional feature, and $d$ is the class to be assigned. The constants $c$ and $k$ are both set to 1, an arbitrary but convenient choice.
We start with the weights $w_0 = w_1 = 0$. Using the first sample, $x = -4$ with $d = -1$, we obtain $D(-4) = 0 + 0 \cdot (-4) = 0$. We assumed that a sample is called class 1 if $D > 0$ and class $-1$ if $D < 0$; since $D(-4) = 0$, the sample is misclassified (an error), and new weights must be computed:
$$w_0' = w_0 + c\,d\,k = 0 + (1)(-1)(1) = -1, \qquad w_1' = w_1 + c\,d\,x = 0 + (1)(-1)(-4) = 4.$$
These new weights appear in the first row of the following table, where $n$ is the iteration number and $j$ the sample within the iteration; the remaining rows follow by repeatedly applying the update rule. The algorithm keeps iterating until both samples are classified correctly, which happens with $w_0 = 2$ and $w_1 = 1$, since the decision boundary is located where $D(x) = w_0 + w_1 x = 0$, that is, at $x = -w_0/w_1$. The final decision boundary is $x = -2$. From this data, any decision boundary between $-4$ and $-1$ would be satisfactory.
| $n$ | $j$ | $x$ | $d$ | Old $w_0$ | Old $w_1$ | $D$ | Error? | New $w_0$ | New $w_1$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | -4 | -1 | 0 | 0 | 0 | Yes | -1 | 4 |
| 1 | 2 | -1 | 1 | -1 | 4 | -5 | Yes | 0 | 3 |
| 2 | 1 | -4 | -1 | 0 | 3 | -12 | No | 0 | 3 |
| 2 | 2 | -1 | 1 | 0 | 3 | -3 | Yes | 1 | 2 |
| 3 | 1 | -4 | -1 | 1 | 2 | -7 | No | 1 | 2 |
| 3 | 2 | -1 | 1 | 1 | 2 | -1 | Yes | 2 | 1 |
| 4 | 1 | -4 | -1 | 2 | 1 | -2 | No | 2 | 1 |
| 4 | 2 | -1 | 1 | 2 | 1 | 1 | No | 2 | 1 |
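A compact sketch of the training loop (the function name and epoch cap are illustrative) reproduces the final weights of Example 4.8:

```python
def train_adaptive_boundary(samples, c=1.0, k=1.0, max_epochs=100):
    """Adaptive decision boundary for one feature: D(x) = w0 + w1*x,
    class 1 if D > 0 and class -1 if D < 0. Each misclassified
    sample (true class d) pulls the weights toward its own side:
    w0 += c*d*k, w1 += c*d*x."""
    w0 = w1 = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, d in samples:
            if (w0 + w1 * x) * d <= 0:   # wrong side of (or on) the boundary
                w0 += c * d * k
                w1 += c * d * x
                errors += 1
        if errors == 0:
            break                        # every sample classified correctly
    return w0, w1

# Example 4.8's data: x = -4 in class -1 and x = -1 in class 1.
w0, w1 = train_adaptive_boundary([(-4, -1), (-1, 1)])
print(w0, w1, -w0 / w1)   # 2.0 1.0 -2.0, i.e. the boundary x = -2
```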
Example 4.9 Finding a decision boundary that classifies samples with two numerical features. The training samples include points such as the following:

| $x_1$ | $x_2$ | class $d$ |
|---|---|---|
| 2 | 10 | 1 |
| ... | ... | ... |
The figures show the position of the decision boundary obtained for this example. (a) With $c = k = 1$, the decision boundary is shown after 20, 40, 60, and 63 passes of the algorithm through the training set; training continues until the sample at (5, 2) is finally classified correctly. Without normalizing the features, 381 weight-update steps, or about 63 passes, were needed to obtain correct weights. (b) With $k$ set to the mean absolute value of the features, the boundary is shown after 1, 2, 3, and 4 passes; the algorithm converges after only 4 passes.
4.6 Adaptive Discriminant Functions

When there are more than two classes, a separate linear discriminant function can be derived for each class, and a sample is assigned to the class whose discriminant function is largest. With $K$ classes and $M$ variables, the set of linear discriminant functions is

$$D_j(\mathbf{x}) = w_{j0} + w_{j1}x_1 + \cdots + w_{jM}x_M, \qquad j = 1, \dots, K.$$
The variables $x_i$ may themselves be nonlinear functions of the measured features, so nonlinear discriminant functions can also be used, as long as each $D_j$ remains a linear function of its set of weights.
To classify a sample $\mathbf{x}$, the discriminant functions above are evaluated and the sample is assigned to the class $C_j$ with the largest $D_j(\mathbf{x})$. The weights of these discriminant functions are adapted in the same way as when finding a linear decision boundary, and the technique is guaranteed to converge to a set of weights that classifies all the training data perfectly, if such a solution exists.
If a sample $\mathbf{x}$ that should be classified as class $C_i$ is misclassified as class $C_j$, new weights are computed for the two discriminant functions involved, $D_i$ and $D_j$:

$$w_{i0}' = w_{i0} + c\,k, \qquad w_{im}' = w_{im} + c\,x_m, \quad m = 1, \dots, M,$$
$$w_{j0}' = w_{j0} - c\,k, \qquad w_{jm}' = w_{jm} - c\,x_m, \quad m = 1, \dots, M.$$
That is, the weights of $C_i$ are increased so that $D_i(\mathbf{x})$ moves closer to being the maximum, while the weights of $C_j$ are decreased so that $D_j(\mathbf{x})$ shrinks. The values of the other discriminant functions need not be changed, since they were not involved in the misclassification.
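A sketch of this multi-class update follows, with the bias handled by prepending the constant $k$ to each feature vector; the function name and the demonstration data are made up:

```python
import numpy as np

def train_discriminants(X, y, num_classes, c=1.0, k=1.0, max_epochs=100):
    """One linear discriminant D_j(x) = w_j0*k + w_j1*x_1 + ... per class;
    a sample is assigned to the class with the largest D_j. When a
    sample of class i is misclassified as class j, the weights of D_i
    are increased and those of D_j decreased."""
    X = np.asarray(X, dtype=float)
    n, M = X.shape
    W = np.zeros((num_classes, M + 1))        # column 0 holds w_j0
    Xa = np.hstack([np.full((n, 1), k), X])   # prepend the constant k
    for _ in range(max_epochs):
        errors = 0
        for xa, i in zip(Xa, y):
            j = int(np.argmax(W @ xa))
            if j != i:
                W[i] += c * xa                # raise D_i(x)
                W[j] -= c * xa                # lower D_j(x)
                errors += 1
        if errors == 0:
            break
    return W

# Made-up, linearly separable three-class data in the plane.
X = [(0, 0), (0, 1), (4, 4), (5, 4), (0, 8), (1, 8)]
y = [0, 0, 1, 1, 2, 2]
W = train_discriminants(X, y, num_classes=3)
print([int(np.argmax(W @ np.array([1.0, *p]))) for p in X])  # [0, 0, 1, 1, 2, 2]
```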
This simple weight-adjustment scheme achieves perfect classification whenever perfect classification is possible. When no combination of weights can classify the data without error, it is not guaranteed to find the best possible weights, but in practice it usually reaches a good compromise.
A set of linear discriminant functions partitions the feature space into regions, each consisting of the points where one discriminant function is largest. To classify data it is not necessary to compute the locations of the decision boundaries between these regions, but it is interesting to see what shapes the regions take. The next example shows the decision regions for three classes.
Example 4.10 Finding the decision regions resulting from three discriminant functions.
Figures 4.15 and 4.16 illustrate the resulting decision regions.
Although adaptive decision boundaries and adaptive discriminant functions are quite attractive methods, the classes must be linearly separable in the feature space for these algorithms to classify perfectly. Even when the classes are separable by hyperplanes, many iterations may be required to find a decision boundary. Moreover, the adaptive algorithm stops at the first set of weights that classifies the training data correctly, which may not agree with the intuitive notion of a good decision boundary; that is, it can terminate on a boundary that is not a good one. For example, in part (b) of the following figure, a vertical decision boundary separating the +'s from the -'s is intuitively better than the adaptive decision boundary obtained after 4 passes.
The minimum squared error classification method requires neither iteration nor linear separability. It tends to find a more intuitively plausible decision boundary than the adaptive decision boundary method, although it does not guarantee a perfect solution even when one exists. The minimum squared error method uses only a single discriminant function, regardless of the number of classes.
If there are $n$ samples with $M$ features each, there are $n$ feature vectors

$$\mathbf{x}_j = (x_{j1}, \dots, x_{jM}), \qquad j = 1, \dots, n.$$
Let $d_j$ denote the true class of $\mathbf{x}_j$; it may be assigned any numerical value. We want to find the set of weights $w_0, w_1, \dots, w_M$ for a single linear discriminant function

$$D(\mathbf{x}) = w_0 + w_1 x_1 + \cdots + w_M x_M \tag{24}$$
such that $D(\mathbf{x}_j) = d_j$ for all $j$. In practice no such weights exist, so the weights $w$ are instead chosen so that the sum of squared differences between the desired values $d_j$ and the actual values $D(\mathbf{x}_j)$ is minimized. That quantity $E$ is

$$E = \sum_{j=1}^{n}\left[D(\mathbf{x}_j) - d_j\right]^2. \tag{25}$$
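Because $E$ is quadratic in the weights, the minimum can be found in closed form; the sketch below does this with a least squares solve. Only the first training row of Example 4.11 survives below, so the remaining rows here are hypothetical:

```python
import numpy as np

def mse_weights(X, d):
    """Minimum squared error weights for the single discriminant
    D(x) = w0 + w1*x1 + ... + wM*xM: choose w to minimize
    E = sum_j (D(x_j) - d_j)**2, with no iteration."""
    X = np.asarray(X, dtype=float)
    A = np.hstack([np.ones((len(X), 1)), X])   # leading 1 multiplies w0
    w, *_ = np.linalg.lstsq(A, np.asarray(d, dtype=float), rcond=None)
    return w                                   # [w0, w1, ..., wM]

# Only the row (0, 0) with d = -1 is from Example 4.11;
# the other two rows are hypothetical.
X = [(0, 0), (2, 0), (1, 2)]
d = [-1, 1, 1]
print(mse_weights(X, d))   # weights [w0, w1, w2] of the fitted discriminant
```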
Example 4.11 The minimum squared error procedure.
| $x_1$ | $x_2$ | $d$ |
|---|---|---|
| 0 | 0 | -1 |
| ... | ... | ... |
Inserting this data and (24) into (25) produces an expression for $E$ as a function of the weights $w_0$, $w_1$, and $w_2$.
Computing the partial derivatives of $E$ with respect to $w_0$, $w_1$, and $w_2$ and setting each one equal to zero produces three linear equations in the three weights.
Solving for $w_0$, $w_1$, and $w_2$ and substituting these weights into equation (24) gives the discriminant function for this data.
Example 4.12 Comparison of the minimum squared error decision boundary with the adaptive decision boundary.
Figure 4.17: The minimum squared error decision boundary for Example 4.11.
In part (a) of the figure above, the minimum squared error (MSE) method gives a better result than the adaptive decision boundary (ADB) method. In part (b), the ADB method gives a better result than MSE.
In part (a) of the figure, the classes are separated perfectly using one adaptive linear decision boundary for each class; only the necessary boundaries are shown. In (b) the classes are separated perfectly using one adaptive discriminant function for each class. In (c) the classes are separated perfectly using a single minimum squared error discriminant function with the desired values 1, 3, 5, and 9.