K-means Clustering Algorithm

 

K-means (MacQueen, 1967) Àº À¯¸íÇÑ ±ºÁýÈ­ (Clustering) ¹®Á¦¸¦ ÇØ°áÇÏ´Â °¡Àå °£´ÜÇÑ ÀÚÀ²ÇнÀ (Unsupervised Learning) ¾Ë°í¸®ÁòÁß ÇϳªÀÌ´Ù. »çÀü¿¡ Á¤ÇØÁø ¾î¶²¼öÀÇ Å¬·¯½ºÅ͸¦ ÅëÇؼ­ ÁÖ¾îÁø µ¥ÀÌÅÍ ÁýÇÕÀ» ºÐ·ùÇÏ´Â °£´ÜÇÏ°í ½¬¿î ¹æ¹ýÀÌ´Ù. k-means ´Â partitional clustering ¿¡ ¼ÓÇÑ´Ù.

data ÀÌ¿Ü¿¡ cluster ÀÇ ¼ö ¸¦ input À¸·Î Çϸç À̶§ ¸¦ seed point ¶ó°í ÇÑ´Ù. seed point ´Â ÀÓÀÇ·Î ¼±ÅÃµÇ¸ç ¹Ù¶÷Á÷ÇÑ cluster ±¸Á¶¿¡ °üÇÑ ¾î¶² Áö½ÄµéÀÌ seed point¸¦ ¼±ÅÃÇϴµ¥¿¡ »ç¿ëµÉ ¼ö ÀÖ´Ù. Forgy' algorithm °ú ´Ù¸¥Á¡Àº ÇϳªÀÇ sample ÀÌ ÇϳªÀÇ cluster ¿¡ ÇÕ·ùÇÏÀÚ¸¶ÀÚ °ð cluster ÀÇ centroid °¡ ´Ù½Ã °è»êµÈ´Ù´Â °ÍÀÌ´Ù. ¶ÇÇÑ Forgy' algorithm ÀÌ ¹Ýº¹Àû(iterative) ÇÑ ¹Ý¸é¿¡ -means algorithm Àº data set¿¡¼­ ´ÜÁö µÎ ¹ø¸¸ÀÇ pass °¡ ÀÌ·ç¾îÁø´Ù. ±× °úÁ¤Àº ´ÙÀ½°ú °°´Ù.

1. óÀ½¿¡ cluster ·Î¼­ ½ÃÀÛÇÑ´Ù. ³²¾ÆÀÖ´Â  sampleµé¿¡ ´ëÇؼ­´Â °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ ã´Â´Ù. ÀÌ°Í¿¡ °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ °¡Áö´Â °ÍÀÌ È®ÀÎµÈ cluster ¿¡ sampleÀ» Æ÷ÇÔ½ÃŲ´Ù. °¢°¢ÀÇ sample µéÀÌ ÇÒ´çµÈ ÈÄ¿¡ ÇÒ´çµÈ cluster ÀÇ centroid °¡ ´Ù½Ã °è»êµÈ´Ù.

2. ±× data¸¦ µÎ ¹ø ó¸®ÇÑ´Ù. °¢ sample¿¡ ´ëÇÏ¿© °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ ã´Â´Ù. °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ °¡Áø °ÍÀ¸·Î È®ÀÎµÈ cluster ¿¡ sampleÀ» À§Ä¡½ÃŲ´Ù. (ÀÌ step ¿¡¼­´Â ¾î¶² centroid µµ ´Ù½Ã °è»êÇÏÁö ¾Ê´Â´Ù.)

site :

A Tutorial on Clustering Algorithms : K-means : Applet

paper :

The k-means Algorithm : Earl Gose ¿Ü

K-Æò±Õ±ºÁýÈ­(K-Means Clustering) : Àå³²½Ä

K-Æò±Õ ±ºÁý ¹æ¹ýÀ» ÀÌ¿ëÇÑ °¡Áß Ä¿³ÎºÐ·ù±â (Kernel Pattern Recognition using K-means Clustering Method) : ½ÉÁ¤¿í, ¹éÀå¼±, Çѱ¹Åë°èÇÐȸ ÀÀ¿ëÅë°è¿¬±¸ 13±Ç 2È£, 2000

K-Æò±Õ ±ºÁýÈ­ÀÇ ÀçÇö¼º Æò°¡ ¹× ÀÀ¿ë (Reproducibility Assessment of K-Means Clustering and Applications) : Çã¸íÈñ, À̿뱸, Çѱ¹Åë°èÇÐȸ ÀÀ¿ëÅë°è¿¬±¸ 17±Ç 1È£, 2004