Clustering

 

Pattern Recognition and Image Analysis : Earl Gose. Richard Johnsonbaugh. Steve Jost Àú¼­, Prentice Hall, 1996, Page 199~219

 

5.1  Introduction

±¸ºÐÇÏ·Á°í ÇÏ´Â °¢ class¿¡ ´ëÇÑ ¾Æ¹«·± Áö½ÄÀÌ ¾ø´Â »óÅ¿¡¼­ classifyÇÏ´Â °ÍÀ̹ǷΠunsupervised learning¿¡ ÇØ´çÇÑ´Ù. Áï sampleµé¿¡ ´ëÇÑ Áö½Ä¾øÀÌ similarity (À¯»çµµ) ¿¡ ±Ù°ÅÇÏ¿© cluster µéÀ» ±¸ºÐÇÑ´Ù. ÆÐÅÏ °ø°£¿¡ ÁÖ¾îÁø À¯ÇÑ °³ÀÇ ÆÐÅϵéÀÌ ¼­·Î °¡±õ°Ô ¸ð¿©¼­ ¹«¸®¸¦ ÀÌ·ç°í ÀÖ´Â ÆÐÅÏ ÁýÇÕÀ» cluster (±ºÁý) À̶óÇÏ°í ¹«¸®Áö¿ö ³ª°¡´Â ó¸® °úÁ¤À» clustering À̶ó ÇÑ´Ù.

cluster °£ÀÇ À¯»çµµ¸¦ Æò°¡Çϱâ À§ÇØ ¿©·¯ °¡ÁöÀÇ °Å¸® ÃøÁ¤ ÇÔ¼ö¸¦ »ç¿ëÇϴµ¥ ¿¹¸¦µé¸é Euclidean distance, Mahalanobis distance, Lance-Williams distance, Hamming distance µîÀÌ »ç¿ëµÈ´Ù.

5.2  Hierarchical Clustering

hierarchy ´Â ´ÙÀ½ ±×¸²°ú °°ÀÌ tree ±¸Á¶·Î Ç¥ÇöµÉ ¼ö ÀÖ´Ù. µ¿¹° º´¿ø¿¡¼­ ȯÀÚµéÀº Å©°Ô °³¿Í °í¾çÀÌ·Î ±¸¼ºµÇ¸ç °¢ÀÚ subgroup À¸·Î ³ª´²Áö¸ç ±×°ÍÀº ´Ù½Ã subgroup À¸·Î ³ª´²Áø´Ù. °¢°³ÀÇ µ¿¹°Àº 1 ~ 5 ·Î ±¸ºÐµÇ¸ç °¡Àå ¾Æ·¡¿¡ À§Ä¡ÇÑ´Ù. hierarchical clustering Àº data¸¦ ¸¹Àº ÀÛÀº ±×·ìÀ¸·Î ±¸¼ºµÇ´Â Å« ±×·ìÀ» ±¸¼ºÇÏ´Â °úÁ¤À» ÀǹÌÇÑ´Ù. ÈçÈ÷ tree ³ª dendrogram À¸·Î ±×·Á¼­ Ç¥ÇöÇÏ¸ç °¡Àå ¹Ì¼¼ÇÑ ±×·ìÀº °¡Àå ¾Æ·¡¿¡ À§Ä¡ÇÏ¸ç °¢ sample Àº ÇϳªÀÇ cluster¸¦ Çü¼ºÇÑ´Ù. °¡Àå Å« ±×·ìÀÌ Á¦ÀÏ À§¿¡ À§Ä¡ÇÏ¸ç °¢ sample µéÀº ÇϳªÀÇ cluster ±×·ì¿¡ ¼ÓÇÑ´Ù.

À§ÀÇ ±×¸²¿¡¼­ level ¿¡ µû¶ó ´ÙÀ½ÀÇ sample µé·Î ±¸¼ºµÇ´Â cluster ·Î ±¸¼ºµÈ´Ù.

level 0 : {1}, {2}, {3}, {4}, {5},

level 1 : {1, 2}, {3}, {4}, {5}.

level 2 : {1, 2}, {3}, {4, 5}.

level 3 : {1, 2, 3}, {4, 5}.

level 4 : {1, 2, 3, 4, 5}  : ÇϳªÀÇ cluster ·Î ±¸¼ºµÈ´Ù.

hierarchical clustering ¿¡¼­´Â ¾î¶² level ¿¡ ÇϳªÀÇ cluster ¿¡ ¼ÓÇÏ´Â µÎ sample ÀÌ ÀÖ´Ù¸é ±×°ÍÀº »óÀ§ level ¿¡¼­´Â °°Àº cluster ¿¡ ¼ÓÇÑ´Ù. Áï À§ÀÇ ±×¸²¿¡¼­ level 2 ¿¡¼­ÀÇ 4 ¿Í 5 ´Â °°Àº cluster ¿¡ ¼ÓÇÏ¸ç ±×°ÍÀº level 3 °ú 4 ¿¡¼­µµ °°Àº cluster ¿¡ ¼ÓÇÑ´Ù.

hierarchical clustering algorithm Àº °èÃþ±¸Á¶¸¦  bottom up ¹æ½ÄÀ¸·Î ¸¸µé °æ¿ì agglomerative (º´ÇÕ½Ä) À̶ó°í ÇÏ°í, top down ¹æ½ÄÀ̸é divisive (ºÐÇÒ½Ä) À̶ó°í ÇÑ´Ù.

cluster °£ÀÇ À¯»çµµ¸¦ ÃøÁ¤ÇÏ´Â ´Ù¸¥ ¹æ¹ýµéÀ» »ç¿ëÇÏ¿© ´Ù¸¥ hierarchical clustering algorithmÀ» ±¸ÇÒ ¼ö ÀÖ´Ù. ±× À¯»çµµ¸¦ ÃøÁ¤ÇÏ´Â ÇϳªÀÇ ¹æ¹ýÀÌ cluster °£ °Å¸®¸¦ ÃøÁ¤ÇÏ´Â ÇÔ¼ö¸¦ Á¤ÀÇÇÏ´Â °ÍÀÌ´Ù. °Å¸® ÇÔ¼ö´Â sample ÀÇ ½Öµé°£ °Å¸®¸¦ ÃøÁ¤ÇÏ´Â ÀáÀç ÇÔ¼ö¿¡ ÀÇÇØ À¯µµµÈ´Ù. nearest neighbor ÀÇ °æ¿ìó·³ cluster ¹æ¹ý¿¡¼­µµ °¡Àå ÀϹÝÀûÀÎ °Å¸® ÃøÁ¤Àº Eucledian distance ¿Í city block distance ¹æ¹ýÀÌ´Ù. 

Agglomerative Clustering Algorithm

ÀϹÝÀûÀÎ agglomerative clustering algorithm Àº ¹¦»çÇϱⰡ ¼ö¿ùÇÏ´Ù. sample ÀÇ Àüü ¼ö¸¦ n À̶ó°í ÇÒ °æ¿ì ´ÙÀ½°ú °°Àº °úÁ¤À» °ÅÄ£´Ù.

1. n °³ÀÇ cluster ·Î ½ÃÀÛÇÑ´Ù. °¢°¢Àº ÇϳªÀÇ sampleÀ» Æ÷ÇÔÇÑ´Ù.

2. step 3À» n - 1 ¹ø ¹Ýº¹ÇÑ´Ù.

3. °¡Àå À¯»çÇÑ cluster ¿Í À» ã¾Æ¼­ ÇϳªÀÇ cluster ·Î º´ÇÕÇÑ´Ù. ¸¸ÀÏ °°Àº °ªÀÌ ÀÖÀ¸¸é ¸ÕÀú ¹ß°ßµÈ ½ÖÀ» º´ÇÕÇÑ´Ù.

The Single-Linkage Algorithm

´Ù¸¥ À̸§À¸·Î´Â minimum method ¿Í nearest neighbor method ·Îµµ ºÒ¸®¿î´Ù. ÈÄÀÚÀÇ À̸§Àº nearest neighbor classification °ú ¹ÐÁ¢ÇÑ °ü°è¸¦ º¸¿©ÁØ´Ù. °¢ cluster¿¡ Çϳª¾¿ÀÇ Á¡ÀÌ ÀÖÀ» ¶§ µÎ Á¡ °£ÀÇ °¡Àå °¡±î¿î °Å¸®¸¦ µÎ cluster °£ÀÇ °Å¸®·Î¼­ Á¤ÀÇÇÏ´Â ¾Ë°í¸®ÁòÀÌ´Ù.

µÎ cluster ¿Í °¡ ÀÖÀ» ¶§ ±×µé »çÀÌÀÇ °Å¸®´Â ´ÙÀ½°ú °°ÀÌ Á¤ÀǵȴÙ. ¿©±â¼­ ´Â sample ¿Í »çÀÌÀÇ °Å¸®¸¦ ÀǹÌÇÑ´Ù.


Example 5.1  single-linkage algorithmÀ» »ç¿ëÇÑ Hierarchical clustering

5 °³ÀÇ sample °ú 2 °³ÀÇ feature ¿Í ¸¦ °¡Áö´Â °æ¿ì¸¦ º¸ÀÚ. 5 °³ÀÇ sampleÀº ´ÙÀ½ ±×¸²°ú °°ÀÌ ºÐÆ÷ÇØ ÀÖ´Ù°í ÇÏÀÚ. sample °£ÀÇ °Å¸® ÃøÁ¤À» À§ÇØ Euclidean distance¸¦ »ç¿ëÇ϶ó

´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature °ª°ú sample ½Öµé °£ÀÇ °Å¸® ¸¦ º¸¿©ÁØ´Ù.

 

 

 

1

2

3

4

5

1

2

3

4

5

4

8

15

24

24

4

4

8

4

12

 

1

2

3

4

5

-

4.0

11.7

20.0

21.5

4.0

-

8.1

16.0

17.9

11.7

8.1

-

9.8

9.8

20.0

16.0

9.8

-

8.0

21.5

17.9

9.8

8.0

-

ÇϳªÀÇ sampleÀ» °¡Áö´Â cluster ¿Í °¡ ÀÖ°í  ±×µé°£ÀÇ °Å¸®´Â ÀÌ´Ù.

¾Ë°í¸®ÁòÀº °¢ÀÚ ÇϳªÀÇ sampleÀ» °¡Áö´Â 5 °³ÀÇ cluster ·Î ½ÃÀÛÇÑ´Ù. ±×¸®°í ³ª¼­ 2 °³ÀÇ °¡Àå °¡±î¿î cluster °¡ º´ÇյȴÙ. À§ÀÇ Ç¥¿¡¼­ °¡Àå ÀÛÀº ¼ö´Â 4 ÀÌ°í sample 1 °ú 2 »çÀÌÀÇ °Å¸®ÀÌ´Ù. µû¶ó¼­ cluster {1} °ú {2} °¡ º´ÇյȴÙ. À̶§´Â ´ÙÀ½ÀÇ 4 °³ÀÇ cluster °¡ µÈ´Ù.

{1, 2}, {3}, {4}, {5}

´ÙÀ½¿¡´Â 4 °³ÀÇ cluster °£ÀÇ °Å¸®°¡ ´ÙÀ½ Çà·Ä°ú °°ÀÌ ÁÖ¾îÁø´Ù.

 

{1, 2}

3

4

5

{1, 2}

3

4

5

-

8.1

16.0

17.9

8.1

-

9.8

9.8

16.0

9.8

-

8.0

17.9

9.8

8.0

-

Çà {1, 2} °ú ¿­ 3 À§Ä¡¿¡ ÀÖ´Â °ª 8.1 Àº cluster {1, 2} °ú {3} »çÀÌÀÇ °Å¸®ÀÌ¸ç ´ÙÀ½°ú °°ÀÌ °è»êµÈ´Ù. À§ÀÇ Ã¹ ¹ø° Ç¥¿¡¼­ À̸ç À» º¸¿©ÁØ´Ù. single cluster algorithm ¿¡¼­´Â µÎ °ªÁß¿¡¼­ ÃÖ¼Ò°ª 8.1À» cluster °£ÀÇ °Å¸®·Î ¼±ÅÃÇÑ´Ù. ù° ÇàÀÇ ´Ù¸¥ °ªµéµµ °°Àº ¹æ¹ýÀ¸·Î °è»êµÈ´Ù. ù° Çà°ú ¿­ ÀÌ¿ÜÀÇ ´Ù¸¥ °ªµéÀº ±× ´ë·Î ½Â°èÇÑ´Ù. ´ÙÀ½À¸·Î Çà·ÄÇ¥¿¡¼­ÀÇ ÃÖ¼Ò°ªÀÌ 8 À̱⠶§¹®¿¡ cluster {4} ¿Í {5} °¡ º´ÇյȴÙ. µû¶ó¼­ 3 °³ÀÇ cluster °¡ µÈ´Ù.

{1, 2}, {3}, {4, 5}

3 °³ÀÇ cluster °£ÀÇ °Å¸®´Â ´ÙÀ½ Çà·ÄÇ¥¿Í °°´Ù.

 

{1, 2}

3

{4, 5}

{1, 2}

3

{4, 5}

-

8.1

16.0

8.1

-

9.8

16.0

9.8

-

À§ Ç¥¿¡¼­ÀÇ ÃÖ¼Ò°ªÀº 8.1 À̱⠶§¹®¿¡ cluster {1, 2} ¿Í {3} ÀÌ º´ÇյǾî 2 °³ÀÇ cluster ·Î µÈ´Ù.

 {1, 2, 3}, {4, 5}

´ÙÀ½À¸·Î °Å¸® 9.8¿¡¼­ ³²¾ÆÀÖ´Â cluster µÎ °³¸¦ º´ÇÕÇÑ´Ù. ÀÌ·¡¼­ hierarchical clustering ÀÌ ¿Ï¼ºµÇ¸ç ±× tree Àº ´ÙÀ½°ú °°´Ù.

º´ÇÕÇÏ´Â cluster °£ÀÇ °Å¸® Àº vertical Ãà¿¡¼­ º¼ ¼ö ÀÖ´Ù.


The Complete-Linkage Algorithm

´Ù¸¥ À̸§À¸·Î´Â maximum method ¶Ç´Â farthest neighbor method ¶ó°íµµ ºÒ¸°´Ù. ¼­·Î ´Ù¸¥ cluster ¿¡ À§Ä¡ÇÏ´Â sample µé °£¿¡ °¡Àå Å« °Å¸®¸¦ µÎ cluster °£ÀÇ °Å¸®·Î Á¤ÀÇÇÏ¿© ±¸ÇÏ¿©Áø´Ù.  µÎ cluster ¿Í »çÀÌÀÇ °Å¸®´Â ´ÙÀ½°ú °°ÀÌ Á¤ÀǵȴÙ.


Example 5.2  complete linkage algorithmÀ» »ç¿ëÇÑ hierarchical clustering

´ÙÀ½ ±×¸²°ú °°ÀÌ data °¡ ºÐÆ÷ÇÑ´Ù°í ÇÏÀÚ. sample °£ÀÇ °Å¸® ÃøÁ¤À» À§ÇØ Euclidean distance¸¦ »ç¿ëÇ϶ó

´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature °ª°ú sample ½Öµé °£ÀÇ °Å¸® ¸¦ º¸¿©ÁØ´Ù.

 

 

 

1

2

3

4

5

1

2

3

4

5

4

8

15

24

24

4

4

8

4

12

 

1

2

3

4

5

-

4.0

11.7

20.0

21.5

4.0

-

8.1

16.0

17.9

11.7

8.1

-

9.8

9.8

20.0

16.0

9.8

-

8.0

21.5

17.9

9.8

8.0

-

ÇϳªÀÇ sampleÀ» Æ÷ÇÔÇÏ´Â ´Ù¼¸ °³ÀÇ cluster ·Î ½ÃÀÛÇÑ´Ù. °¡Àå °¡±îÀÌ ÀÖ´Â cluster {1} ¿Í {2} ´Â º´ÇÕµÇ¾î »õ·Î¿î cluster¸¦ ¸¸µç´Ù.

{1, 2}, {3}, {4}, {5}

ÀÌ cluster °£ÀÇ °Å¸®¸¦ ±¸ÇÑ Çà·ÄÀÌ ´ÙÀ½°ú °°´Ù.

 

{1, 2}

3

4

5

{1, 2}

3

4

5

-

11.7

20.0

21.5

11.7

-

9.8

9.8

20.0

9.8

-

8.0

21.5

9.8

8.0

-

Çà {1, 2} °ú ¿­ 3 À§Ä¡ÀÇ °ªÀº 11.7 À̸ç ÀÌ°ÍÀº cluster {1, 2} ¿Í {3} »çÀÌÀÇ °Å¸®¸¦ ÀǹÌÇÏ¸ç ±×°ÍÀº ´ÙÀ½°ú °°ÀÌ ±¸ÇØÁø´Ù.

¿ø·¡ ÁÖ¾îÁø data¿¡¼­ ¿Í ÀÇ °ªÀÌ ÁÖ¾îÁö¸ç complete linkage ¾Ë°í¸®Áò¿¡¼­´Â ÃÖ´ë°ªÀÎ 11.7À» cluster °£ÀÇ °Å¸®¸¦ ¼±ÅÃÇÑ´Ù. ù ¹ø° ÇàÀÇ ´Ù¸¥ °ªµéµµ °°Àº ¹æ¹ýÀ¸·Î ±¸ÇØÁø´Ù. ù° Çà°ú «R° ¿­ ÀÌ¿ÜÀÇ °ªÀº ±×·¡µµ ½Â°èÇÑ´Ù. ±× ¶§ Çà·Ä¿¡¼­ÀÇ ÃÖ¼Ò°ªÀº 8 ÀÌ¸ç µû¶ó¼­ cluster {4} ¿Í {5} °¡ º´ÇÕÇÑ´Ù. À̶§ÀÇ cluster ´Â ´ÙÀ½°ú °°´Ù.

{1, 2}, {3}, {4, 5}

ÀÌ cluster °£ÀÇ °Å¸®´Â ´ÙÀ½ Çà·Ä°ú °°ÀÌ ±¸ÇØÁø´Ù.

 

{1, 2}

3

{4, 5}

{1, 2}

3

{4, 5}

-

11.7

21.5

11.7

-

9.8

21.5

9.8

-

À§ Çà·ÄÀÇ ÃÖ¼Ò°ªÀº 9.8 À̱⠶§¹®¿¡ cluster {3} °ú {4, 5} °¡ º´ÇյǾî cluster´Â ´ÙÀ½°ú °°´Ù.

{1, 2}, {3, 4, 5}


À§ÀÇ cluster ´Â single linkage algorithm ÀÇ µ¿µîÇÑ À§Ä¡¿¡¼­ÀÇ ±¸ÇØÁø cluster ¿Í´Â ´Ù¸¥ °ÍÀ» ¾Ë ¼ö ÀÖ´Ù. ´ÙÀ½ °úÁ¤À¸·Î ³²¾ÆÀÖ´Â cluster µéÀÌ º´ÇյǾî hierarchical clustering ÀÌ ¿Ï¼ºµÇ¸ç ±× tree Àº ´ÙÀ½°ú °°´Ù.

The Average-Linkage Algorithm

single linkage algorithm Àº tree »ó¿¡¼­ cluster µéÀÌ ±æ°í °¡´Â(long and thin) ¸ð¾çÀ» º¸ÀÏ ¼ö ÀÖ´Ù. complete linkage algorithm ¿¡¼­´Â º¸´Ù compact ÇÑ ¸ð¾çÀ» º¸ÀδÙ. µÎ clustering ¹æ¹ý ¸ðµÎ ºñÁ¤»óÀûÀÎ °üÂû¿¡ ¹Î°¨ÇÏ¿© º¯ÇüµÉ ¼ö ÀÖ´Ù. µÎ ¾Ë°í¸®ÁòÀÇ ±Ø´ÜÀ» ÀýÃæÇϱâ À§ÇÑ ¹æ¹ýÀÌ  average linkage algorithm ÀÌ´Ù.

average linkage algorithm Àº ´Ù¸¥ À̸§À¸·Î´Â unweighted pair group method using arithmetic average (UPGMA) ¶ó°íµµ ºÒ¸®¿ì¸ç °¡Àå ³Î¸® »ç¿ëµÇ´Â hierarchical clustering algorithm Áß ÇϳªÀÌ´Ù. ¼­·Î ´Ù¸¥ cluster ¿¡ ¼ÓÇÑ µÎ Á¡ »çÀÌÀÇ Æò±Õ °Å¸®¸¦ µÎ cluster °£ÀÇ °Å¸®·Î Á¤ÀÇ ÇÔÀ¸·Î½á average linkage algorithm ÀÌ ¼öÇàµÈ´Ù. ¸¸ÀÏ cluster °¡ °³ÀÇ ¸â¹ö°¡ ÀÖ°í, cluster °¡ °³ÀÇ ¸â¹ö¸¦ °¡Áú °æ¿ì µÎ cluster °£ÀÇ °Å¸®´Â ´ÙÀ½°ú °°´Ù.


Example 5.3  average linkage algorithmÀ» »ç¿ëÇÑ hierarchical clustering

´ÙÀ½ ±×¸²°ú °°ÀÌ data °¡ ºÐÆ÷ÇÑ´Ù°í ÇÏÀÚ. sample °£ÀÇ °Å¸® ÃøÁ¤À» À§ÇØ Euclidean distance¸¦ »ç¿ëÇ϶ó

´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature °ª°ú sample ½Öµé °£ÀÇ °Å¸® ¸¦ º¸¿©ÁØ´Ù.

 

 

 

1

2

3

4

5

1

2

3

4

5

4

8

15

24

24

4

4

8

4

12

 

1

2

3

4

5

-

4.0

11.7

20.0

21.5

4.0

-

8.1

16.0

17.9

11.7

8.1

-

9.8

9.8

20.0

16.0

9.8

-

8.0

21.5

17.9

9.8

8.0

-

ÇϳªÀÇ sampleÀ» Æ÷ÇÔÇÏ´Â ´Ù¼¸ °³ÀÇ cluster ·Î ½ÃÀÛÇÑ´Ù. °¡Àå °¡±îÀÌ ÀÖ´Â cluster {1} ¿Í {2} ´Â º´ÇÕµÇ¾î »õ·Î¿î cluster¸¦ ¸¸µç´Ù.

{1, 2}, {3}, {4}, {5}

ÀÌ cluster °£ÀÇ °Å¸®¸¦ ±¸ÇÑ Çà·ÄÀÌ ´ÙÀ½°ú °°´Ù. 

 

{1, 2}

3

4

5

{1, 2}

3

4

5

-

9.9

18

19.7

9.9

-

9.8

9.8

18.0

9.8

-

8.0

19.7

9.8

8.0

-

Çà {1, 2} °ú ¿­ 3 À§Ä¡ÀÇ °ªÀº 9.9 À̸ç ÀÌ°ÍÀº cluster {1, 2} ¿Í {3} »çÀÌÀÇ °Å¸®¸¦ ÀǹÌÇÏ¸ç ±×°ÍÀº ´ÙÀ½°ú °°ÀÌ ±¸ÇØÁø´Ù.

¿ø·¡ ÁÖ¾îÁø data¿¡¼­ ¿Í ÀÇ °ªÀÌ ÁÖ¾îÁö¸ç average linkage ¾Ë°í¸®Áò¿¡¼­´Â Æò±Õ°ªÀÎ 9.9¸¦ cluster °£ÀÇ °Å¸®·Î ¼±ÅÃÇÑ´Ù. ù ¹ø° ÇàÀÇ ´Ù¸¥ °ªµéµµ °°Àº ¹æ¹ýÀ¸·Î ±¸ÇØÁø´Ù. ù° Çà°ú «R° ¿­ ÀÌ¿ÜÀÇ °ªÀº ±×·¡µµ ½Â°èÇÑ´Ù. ±× ¶§ Çà·Ä¿¡¼­ÀÇ ÃÖ¼Ò°ªÀº 8 ÀÌ¸ç µû¶ó¼­ cluster {4} ¿Í {5} °¡ º´ÇÕÇÑ´Ù. À̶§ÀÇ cluster ´Â ´ÙÀ½°ú °°´Ù.

{1, 2}, {3}, {4, 5}

ÀÌ cluster °£ÀÇ °Å¸®´Â ´ÙÀ½ Çà·Ä°ú °°ÀÌ ±¸ÇØÁø´Ù.

 

{1, 2}

3

{4, 5}

{1, 2}

3

{4, 5}

-

9.9

18.9

9.9

-

9.8

18.9

9.8

-

À§ Çà·ÄÀÇ ÃÖ¼Ò°ªÀº 9.8 À̱⠶§¹®¿¡ cluster {3} °ú {4, 5} °¡ º´ÇյǾî cluster´Â ´ÙÀ½°ú °°´Ù.

{1, 2}, {3, 4, 5}

´ÙÀ½ °úÁ¤À¸·Î ³²¾ÆÀÖ´Â cluster µéÀÌ º´ÇյǾî hierarchical clustering ÀÌ ¿Ï¼ºµÈ´Ù. 


Ward's Method

´Ù¸¥ À̸§À¸·Î´Â minimum variance method ¶ó°íµµ ÇÑ´Ù. ´Ù¸¥ ¾Ë°í¸®Áò ó·³ °¢ sampleÀ» À§ÇÑ ÇϳªÀÇ cluster ·Î½á ½ÃÀÛÇÑ´Ù. ¸ðµç cluster ½Ö »çÀÌ¿¡¼­ ¹Ýº¹À» ÅëÇØ °¡Àå ÀÛÀº squared error¸¦ °¡Áö´Â ½ÖÀ» º´ÇÕÇÏ¿© »õ·Î¿î cluster¸¦ ¸¸µç´Ù. °¢ cluster¸¦ À§ÇÑ squared error´Â ´ÙÀ½°ú °°ÀÌ Á¤ÀǵȴÙ.

ÇϳªÀÇ cluster °¡ °³ÀÇ sample ( , ¿©±â¼­ ´Â feature vector ) À» Æ÷ÇÔÇÑ´Ù¸é, sample ÀÇ squared error (Æò±Õ¿¡¼­ÀÇ squared Euclidean distance)´Â ´ÙÀ½°ú °°´Ù.

¿©±â¼­ ´Â cluster¿¡¼­ sampleµéÀ» À§ÇÑ feature ÀÇ Æò±Õ°ªÀÌ´Ù.

Àüü cluster ¿¡ ´ëÇÑ squared error ´Â sample µéÀÇ squared error µéÀÇ ÇÕÀÌ´Ù.

°¢ feature ÀÇ Æò±Õ°ªÀ¸·Î ±¸¼ºµÈ vector Àº ±× cluster ÀÇ mean vector ¶Ç´Â centroid ¶ó°í ºÒ¸®¿î´Ù. ÇϳªÀÇ cluster¸¦ À§ÇÑ squared error ´Â °¢ feature¿¡¼­ cluster ¸â¹ö·ÎºÎÅÍ ±×µéÀÇ Æò±Õ°ª±îÁöÀÇ squared distance ÀÇ ÇÕÀÌ´Ù. squared error´Â cluster ÀÇ total variance ¿¡´Ù°¡ cluster ÀÇ sampleÀÇ ¼ö À» °öÇÑ °Í°ú °°´Ù. ¿©±â¼­ total variance´Â °¢ feature ÀÇ variance ÀÇ ÇÕÀ¸·Î½á ·Î Á¤ÀǵȴÙ. ÀÏ·ÃÀÇ clusterµéÀ» À§ÇÑ squared error ´Â °¢ clusterµéÀ» À§ÇÑ squared error µéÀÇ ÇÕÀ¸·Î Á¤ÀǵȴÙ.


Example 5.4  Ward's method¸¦ »ç¿ëÇÑ hierarchical clustering

´ÙÀ½ ±×¸²°ú °°ÀÌ data °¡ ºÐÆ÷ÇÑ´Ù°í ÇÏÀÚ.

´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature °ª°ú sample ½Öµé °£ÀÇ °Å¸® ¸¦ º¸¿©ÁØ´Ù.

 

 

 

1

2

3

4

5

1

2

3

4

5

4

8

15

24

24

4

4

8

4

12

 

1

2

3

4

5

-

4.0

11.7

20.0

21.5

4.0

-

8.1

16.0

17.9

11.7

8.1

-

9.8

9.8

20.0

16.0

9.8

-

8.0

21.5

17.9

9.8

8.0

-

À̶§ squared error ´Â zero ÀÌ´Ù. cluster ½ÖÀ» º´ÇÕÇÏ´Â ¹æ¹ýÀº 10 °¡Áö°¡ ÀÖ´Ù. cluster {1} °ú {2} º´ÇÕ, {1} °ú {3} º´ÇÕµîµî.

´ÙÀ½ Ç¥´Â °¡´ÉÇÑ ¹æ¹ýÀÇ squared error¸¦ º¸¿©ÁØ´Ù. ¿¹¸¦µé¸é cluster {1} ¿Í {2}¸¦ º´ÇÕÇÑ´Ù°í ÇÏÀÚ. sample 1 ÀÇ feature vector ´Â (4,4) ÀÌ°í sample 2 ´Â (8,4) ÀÌ¸ç µû¶ó¼­ feature Æò±ÕÀº 6 °ú 4 ÀÌ´Ù. cluster {1, 2}¸¦ À§ÇÑ squared error ´Â ´ÙÀ½°ú °°ÀÌ ±¸ÇØÁø´Ù.

(4 - 6)2 + (8 - 6)2 + (4 - 4)2 + (4 - 4)2 = 8

´Ù¸¥ °¢°¢ÀÇ cluster {3}, {4}, {5} ÀÇ squared error ´Â 0 ÀÌ´Ù. µû¶ó¼­ cluster {1, 2}, {3}, {4}, {5} ÀÇ total squared error ´Â ´ÙÀ½°ú °°´Ù.

8 + 0 + 0 + 0 = 8

´ÙÀ½ Ç¥¿¡¼­ °¡Àå ÀÛÀº squared error ´Â 8 À̱⠶§¹®¿¡ cluster {1} ¿Í {2} ´Â º´ÇյǾî 4 °³ÀÇ cluster {1, 2}, {3}, {4}, {5}¸¦ ¸¸µç´Ù.

Clusters

Squared
Error,

{1, 2}, {3}, {4}, {5}

{1, 3}, {2}, {4}, {5}

{1, 4}, {2}, {3}, {5}

{1, 5}, {2}, {3}, {4}

{2, 3}, {1}, {4}, {5}

{2, 4}, {1}, {3}, {5}

{2, 5}, {1}, {3}, {5}

{3, 4}, {1}, {2}, {5}

{3, 5}, {1}, {2}, {4}

{4, 5}, {1}, {2}, {3}

8.0

68.5

200.0

232.0

32.5

128.0

160.0

48.5

48.5

32.0

4 °³ÀÇ cluster¸¦ ¸¸µé °æ¿ìÀÇ °¢°¢ÀÇ squared error

´ÙÀ½ Ç¥¿¡¼­´Â {1, 2}, {3}, {4}, {5} Áß¿¡¼­ 2 °³¸¦ º´ÇÕÇÏ´Â °¡´ÉÇÑ °æ¿ìÀÇ squared error¸¦ º¸¿©ÁØ´Ù.  ´ÙÀ½Ç¥¿¡¼­ °¡Àå ÀÛÀº squared error ´Â 40 À̱⠶§¹®¿¡ cluster {4} ¿Í {5} °¡ º´ÇյǾî 3 °³ÀÇ cluster¸¦ ¸¸µç´Ù.

{1, 2}, {3}, {4, 5}

Clusters

Squared
Error,

{1, 2, 3}, {4}, {5}

{1, 2, 4}, {3}, {5}

{1, 2, 5}, {3}, {4}

{1, 2}, {3, 4}, {5}

{1, 2}, {3, 5}, {4}

{1, 2}, {4, 5}, {3}

72.7

224.0

266.7

56.5

56.5

40.0

3 °³ÀÇ cluster¸¦ ¸¸µé °æ¿ìÀÇ °¢°¢ÀÇ squared error

´ÙÀ½ Ç¥¿¡¼­´Â {1, 2}, {3}, {4, 5} Áß¿¡¼­ 2 °³¸¦ º´ÇÕÇÏ´Â °¡´ÉÇÑ °æ¿ìÀÇ squared error¸¦ º¸¿©ÁØ´Ù.  ´ÙÀ½Ç¥¿¡¼­ °¡Àå ÀÛÀº squared error ´Â 94 À̱⠶§¹®¿¡ cluster {3} °ú {4, 5} °¡ º´ÇյǾî 2 °³ÀÇ cluster¸¦ ¸¸µç´Ù.

{1, 2}, {3, 4, 5} 

Clusters

Squared
Error,

{1, 2, 3}, {4, 5}

{1, 2, 4, 5}, {3}

{1, 2}, {3, 4, 5}

104.7

380.0

94.0

2 °³ÀÇ cluster¸¦ ¸¸µé °æ¿ìÀÇ °¢°¢ÀÇ squared error

´ÙÀ½À¸·Î ³²Àº 2 °³ÀÇ cluster °¡ º´ÇյǾî hierarchical clustering ÀÌ ¿Ï¼ºµÈ´Ù. ´ÙÀ½ ±×¸²Àº ¿Ï¼ºµÈ tree¸¦ º¸¿©ÁØ´Ù.

Ward's ¹æ¹ý¿¡ ÀÇÇÑ Æ®¸®


5.3  Partitional Clustering

cluster  ÀÇ °èÃþÀ» °í·ÁÇÏÁö ¾Ê°í Æò¸éÀûÀ¸·Î clustering ÇÏ´Â ¹æ¹ýÀ¸·Î ÀϹÝÀûÀ¸·Î ¹Ì¸® ¸î °³ÀÇ cluster ·Î ³ª´©¾î Áú °ÍÀ̶ó°í ¿¹»óÇÏ°í cluster ÀÇ °³¼ö¸¦ Á¤ÇÏ´Â °ÍÀÌ´Ù.  

Forgy's Algorithm

°¡Àå °£´ÜÇÑ partitional clustering algorithm ÁßÀÇ Çϳª°¡ Forgy's algorithm ÀÌ´Ù. data ÀÌ¿Ü¿¡ cluster ÀÇ ¼ö ¸¦ input À¸·Î Çϸç À̶§ ¸¦ seed point ¶ó°í ÇÑ´Ù. seed point ´Â ÀÓÀÇ·Î ¼±ÅÃµÇ¸ç ¹Ù¶÷Á÷ÇÑ cluster ±¸Á¶¿¡ °üÇÑ ¾î¶² Áö½ÄµéÀÌ seed point¸¦ ¼±ÅÃÇϴµ¥¿¡ »ç¿ëµÉ ¼ö ÀÖ´Ù. ±× °úÁ¤Àº ´ÙÀ½°ú °°´Ù.

1. ÀÓÀÇÀÇ °¹¼öÀÇ seed point¸¦ cluster centroid ·Î¼­ ÃʱâÈ­ ÇÑ´Ù.

2. °¢ sample ¿¡ ´ëÇØ °¡Àå °¡±îÀÌ ÀÖ´Â cluster centroid¸¦ ã¾Æ¼­ ÇØ´ç cluster ¿¡ sampleÀ» ¹èÁ¤ÇÑ´Ù.

3. ¸¸ÀÏ step 2¿¡¼­ sample ÀÌ cluster¸¦ º¯È­½ÃÅ°Áö ¸øÇϸé Á¾·áÇÑ´Ù.

4. º¯È­µÈ cluster µé¿¡ ´ëÇÑ centroid¸¦ ´Ù½Ã °è»êÇؼ­ ´Ù½Ã step 2 ·Î °£´Ù.


 Example 5.5  Forgy's algorithmÀ» »ç¿ëÇÑ partitional clustering

´ÙÀ½ ±×¸²°ú °°ÀÌ data °¡ ºÐÆ÷ÇÑ´Ù°í ÇÏÀÚ.

´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature , °ª°ú sample ½Öµé °£ÀÇ °Å¸® ¸¦ º¸¿©ÁØ´Ù.

 

 

 

1

2

3

4

5

1

2

3

4

5

4

8

15

24

24

4

4

8

4

12

 

1

2

3

4

5

-

4.0

11.7

20.0

21.5

4.0

-

8.1

16.0

17.9

11.7

8.1

-

9.8

9.8

20.0

16.0

9.8

-

8.0

21.5

17.9

9.8

8.0

-

step 1, óÀ½¿¡ 2 °³ÀÇ cluster¸¦ ·Î µÎ°í seed point ·Î¼­ °¢ cluster ´Â ÃÖÃÊÀÇ µÎ sample (4,4), (8,4)¸¦ »ç¿ëÇÑ´Ù. Forgy's algorithm¿¡¼­´Â °è»ê¿¡¼­ÀÇ ÆíÀǸ¦ À§ÇØ sampleµéÀÇ ¼öº¸´Ù´Â feature vector ·Î¼­ sample µéÀ» Ç¥±âÇÑ´Ù.

step 2, °¢ sample¿¡¼­ °¡Àå °¡±îÀÌ ÀÖ´Â cluster centroid¸¦ ã´Â´Ù. ´ÙÀ½ Ç¥¿¡¼­´Â ±× °á°ú¸¦ º¸¿©ÁØ´Ù. cluster 2 °³ {(4, 4)} ¿Í {(8, 4), (15, 8), (24, 4), (24, 12)} °¡ ¸¸µé¾î Á³´Ù.

step 4, cluster µéÀ» À§ÇÑ centroid¸¦ °è»êÇÑ´Ù. ù ¹ø° cluster ÀÇ centroid ´Â (4,4) ÀÌ¸ç µÎ ¹ø°´Â (17.75, 7) ÀÌ¸ç ±×°ÍÀº ´ÙÀ½°ú °°ÀÌ °è»êµÇ¾ú´Ù.

(8 + 15 + 24 + 24) / 4 = 17.75

(4 + 8 + 4 + 12) / 4 = 7

sampleµéÀÌ cluster µéÀ» º¯È­½ÃÄ×±â (óÀ½¿¡´Â cluster °¡ ¾ø¾úÀ¸´Ï±î) ¶§¹®¿¡ step 2 ·Î µ¹¾Æ°£´Ù.

Sample

Nearest
Cluster Centroid

(4, 4)

(8, 4)

(15, 8)

(24, 4)

(24, 12)

(4, 4)

(8, 4)

(8, 4)

(8, 4)

(8, 4)

Forgy's algorithm ÀÇ Ã¹ ¹ø° ¹Ýº¹

´ÙÀ½ Ç¥¿¡¼­´Â °¢ sample µé¿¡ °¡Àå °¡±î¿î cluster centroid¸¦ ã´Â´Ù. cluster {(4, 4), (8, 4)} ¿Í  {(15, 8), (24, 4), (24, 12)} ÀÌ ¸¸µé¾î Á³´Ù.

step 4, cluster ÀÇ centroid (6, 4) ¿Í (21, 8) ÀÌ °è»êµÈ´Ù. sample (8, 4) °¡ cluster µéÀ» º¯È­½ÃÄ×°í µû¶ó¼­ step 2 ·Î °£´Ù.

Sample

Nearest
Cluster Centroid

(4, 4)

(8, 4)

(15, 8)

(24, 4)

(24, 12)

(4, 4)

(4, 4)

(17.75, 7)

(17.75, 7)

(17.75, 7)

Forgy's algorithm ÀÇ µÎ ¹ø° ¹Ýº¹

´ÙÀ½ Ç¥¿¡¼­´Â °¢ sample µé¿¡ °¡Àå °¡±î¿î cluster centroid¸¦ ã´Â´Ù. cluster {(4, 4), (8, 4)} ¿Í  {(15, 8), (24, 4), (24, 12)} ÀÌ ¸¸µé¾î Á³´Ù.

step 4, cluster ÀÇ centroid (6, 4) ¿Í (21, 8) ÀÌ °è»êµÈ´Ù. ¾î¶² sample µµ cluster µéÀ» º¯È­½ÃÅ°Áö ¾Ê¾Ò±â ¶§¹®¿¡ ¾Ë°í¸®ÁòÀº Á¾·áµÈ´Ù.

Sample

Nearest
Cluster Centroid

(4, 4)

(8, 4)

(15, 8)

(24, 4)

(24, 12)

(6, 4)

(6, 4)

(21, 8)

(21, 8)

(21, 8)

Forgy's algorithm ÀÇ ¼¼ ¹ø° ¹Ýº¹


À§ ¿¹Á¦¿¡¼­´Â seed point¸¦ óÀ½ÀÇ µÎ sampleµé·Î ÀÓÀÇ·Î ¼±ÅõǾú´Ù. ±×·¯³ª ´Ù¸¥ °¡´É¼ºÀÌ Á¦¾ÈµÉ ¼ö ÀÖ´Ù. ÇϳªÀÇ °¡´É¼ºÀº hierarchical clustering algorithm ÁßÀÇ Çϳª·Î ¸¸µé¾îÁø cluster ·Î ½ÃÀÛÇÏ¿© ÃÖÃÊÀÇ seed point ·Î¼­ ±× centroid¸¦ »ç¿ëÇÏ´Â °ÍÀÌ´Ù.

The k-means Algorithm

Forgy's algorithm °ú À¯»çÇÑ ¹æ¹ýÀÌ´Ù. data ÀÌ¿Ü¿¡ cluster ÀÇ ¼ö ¸¦ input À¸·Î Çϸç À̶§ ¸¦ seed point ¶ó°í ÇÑ´Ù. Forgy' algorithm °ú ´Ù¸¥Á¡Àº ÇϳªÀÇ sample ÀÌ ÇϳªÀÇ cluster ¿¡ ÇÕ·ùÇÏÀÚ¸¶ÀÚ °ð cluster ÀÇ centroid °¡ ´Ù½Ã °è»êµÈ´Ù´Â °ÍÀÌ´Ù. ¶ÇÇÑ Forgy' algorithm ÀÌ ¹Ýº¹Àû(iterative) ÇÑ ¹Ý¸é¿¡ -means algorithm Àº data set¿¡¼­ ´ÜÁö µÎ ¹ø¸¸ÀÇ pass °¡ ÀÌ·ç¾îÁø´Ù. ±× °úÁ¤Àº ´ÙÀ½°ú °°´Ù.

1. óÀ½¿¡ cluster ·Î¼­ ½ÃÀÛÇÑ´Ù. ³²¾ÆÀÖ´Â  sampleµé¿¡ ´ëÇؼ­´Â °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ ã´Â´Ù. ÀÌ°Í¿¡ °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ °¡Áö´Â °ÍÀÌ È®ÀÎµÈ cluster ¿¡ sampleÀ» Æ÷ÇÔ½ÃŲ´Ù. °¢°¢ÀÇ sample µéÀÌ ÇÒ´çµÈ ÈÄ¿¡ ÇÒ´çµÈ cluster ÀÇ centroid °¡ ´Ù½Ã °è»êµÈ´Ù.

2. ±× data¸¦ µÎ ¹ø ó¸®ÇÑ´Ù. °¢ sample¿¡ ´ëÇÏ¿© °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ ã´Â´Ù. °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ °¡Áø °ÍÀ¸·Î È®ÀÎµÈ cluster ¿¡ sampleÀ» À§Ä¡½ÃŲ´Ù. (ÀÌ step ¿¡¼­´Â ¾î¶² centroid µµ ´Ù½Ã °è»êÇÏÁö ¾Ê´Â´Ù.)


Example 5.6  -means algorithmÀ» »ç¿ëÇÑ partitional clustering

´ÙÀ½ ±×¸²°ú °°ÀÌ data °¡ ºÐÆ÷ÇÑ´Ù°í ÇÏÀÚ.

´ÙÀ½ Ç¥´Â °¢ sample ÀÇ feature , °ª°ú sample ½Öµé °£ÀÇ °Å¸® ¸¦ º¸¿©ÁØ´Ù.

 

 

 

1

2

3

4

5

1

2

3

4

5

4

8

15

24

24

4

4

8

4

12

 

1

2

3

4

5

-

4.0

11.7

20.0

21.5

4.0

-

8.1

16.0

17.9

11.7

8.1

-

9.8

9.8

20.0

16.0

9.8

-

8.0

21.5

17.9

9.8

8.0

-

 Ã³À½¿¡ 2 °³ÀÇ cluster¸¦ ·Î µÎ°í óÀ½ÀÇ µÎ sample µéÀº (8,4) ¿Í (24,4) ·Î ÇÑ´Ù.

step 1

µÎ °³ÀÇ cluster {(8, 4} ¿Í {(24, 4)} ·Î ½ÃÀÛÇÏ¸ç ±×°ÍÀº (8, 4) ¿Í (24, 4) ¿¡ centroid¸¦ °®´Â´Ù. ³ª¸ÓÁö 3 °³ÀÇ sample °¢°¢¿¡ ´ëÇؼ­´Â °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ ã°í ±× cluster ¿¡ sampleÀ» µÎ°í cluster ÀÇ centroid¸¦ ´Ù½Ã °è»êÇÑ´Ù.

´ÙÀ½ sample (15, 8) Àº centroid (8, 4) ¿¡ °¡Àå °¡±õ°í µû¶ó¼­ cluster {(8, 4)} ¿¡ ÇÕ·ùÇÑ´Ù. À̶§¿¡ cluster µéÀº {(8, 4), (15, 8)} ¿Í {(24, 4)} ÀÌ´Ù. ù ¹ø° cluster ÀÇ centroid ´Â (11.5, 6) ·Î ¹Ù²î´Âµ¥ ±×°ÍÀº ´ÙÀ½°ú °°ÀÌ °è»êµÈ °ÍÀÌ´Ù.

(8 + 15) / 2 = 11.5,       (4 + 8) / 2 = 6

´ÙÀ½ sample (4, 4) Àº centroid (11.5, 6) ¿¡ °¡Àå °¡±õ°í µû¶ó¼­ cluster {(8, 4), (15, 8)} ¿¡ ÇÕ·ùÇÑ´Ù. À̶§¿¡ cluster µéÀº {(8, 4), (15, 8), (4, 4)} ¿Í {(24, 4)} ÀÌ´Ù. ù ¹ø° cluster ÀÇ centroid ´Â (9, 5.3) ·Î ¹Ù²ï´Ù.

´ÙÀ½ sample (24, 12) Àº centroid (24, 4) ¿¡ °¡Àå °¡±õ°í µû¶ó¼­ cluster {(24, 4)} ¿¡ ÇÕ·ùÇÑ´Ù. À̶§¿¡ cluster µéÀº {(8, 4), (15, 8), (4, 4)} ¿Í {(24, 12), (24, 4)} ÀÌ´Ù. À̶§ µÎ ¹ø° cluster ÀÇ centroid ´Â (24, 8) ·Î ¹Ù²ï´Ù. À̶§¿¡ ¾Ë°í¸®ÁòÀÇ step 1 Àº ¿Ï¼ºµÈ´Ù.

step 2

sampleµéÀ» Çϳª¾¿ °Ë»çÇÏ°í °¡Àå °¡±îÀÌ ÀÖ´Â centroid¸¦ °¡Áö´Â °ÍÀ¸·Î È®ÀÎµÈ cluster ¿¡ °¢ sampleÀ» À§Ä¡½ÃŲ´Ù. ´ÙÀ½ Ç¥¿¡¼­ º¸¿©Áö´Â °Íó·³ sample µéÀÌ cluster µéÀ» º¯È­½ÃÅ°Áö ¸øÇÒ °æ¿ì¿¡ ÃÖÁ¾ÀûÀ¸·Î ´ÙÀ½ÀÇ cluster µé·Î ºÐ·ùµÈ´Ù.

{(8, 4), (15, 8), (4, 4)} ¿Í {(24, 12), (24, 4)}

Sample

Distance to
Centroid (9, 5.3)

Distance to
Centroid (24, 8)

(8, 4)

(24, 4)

(15, 8)

(4, 4)

(24, 12)

1.6

15.1

6.6

6.6

16.4

16.5

4.0

9.0

40.4

4.0

-means algorithm ÀÇ step 2¿¡¼­ »ç¿ëÇϱâ À§ÇÑ distance


-means algorithm (8,  (24, 12)

{(4, 4), (8, 4), (15, 8)}, {(24, 4), (24, 12)}

The Isodata Algorithm

Isodata Algorithm Àº Forgy's algorithm °ú -means algorithm À» º¸°­ÇÑ ¹æ¹ýÀ¸·Î »ý°¢ÇÒ ¼ö ÀÖ´Ù. °°Àº Á¡Àº °¡Àå °¡±îÀÌ ÀÖ´Â centroid ¿¡ sampleµéÀ» ÇÒ´çÇÏ¿© squared error¸¦ ÃÖ¼ÒÈ­ ½ÃŲ´Ù´Â °ÍÀÌ´Ù. ´Ù¸¥Á¡Àº °íÁ¤µÈ ¼öÀÇ cluster µéÀ» ó¸®ÇÏ´Â °ÍÀÌ ¾Æ´Ï¶ó, »ç¿ëÀÚ¿¡ ÀÇÇØ ¿ä±¸µÇ´Â cluster ÀÇ ¼ö¸¦ Æ÷ÇÔÇÏ´Â ¹üÀ§±îÁö Çã¿ëµÇ´Â °³ÀÇ cluster¸¦ ´Ù·é´Ù´Â °ÍÀÌ´Ù. ¸¸ÀÏ cluster µéÀÇ ¼ö°¡ ³Ê¹« ¸¹¾ÆÁö°Å³ª cluster µéÀÌ ³Ê¹« °¡±îÀÌ ÀÖ°Ô µÇ¸é cluster ´Â º´ÇյȴÙ. ¸¸ÀÏ cluster µéÀÇ ¼ö°¡ ³Ê¹« Àû°Å³ª cluster °¡ ¾ÆÁÖ ´Ù¸¥ Á¾·ùÀÇ sample µéÀ» Æ÷ÇÔÇÏ°í ÀÖ´Ù¸é cluster ´Â ºÐ¸®µÈ´Ù. ÀÚ¼¼ÇÑ °ÍÀº ´ÙÀ½¿¡ ¼­¼úÇÑ´Ù.

isodata algorithm ÀÇ °æ¿ì data ¿Í seed point ÀÌ¿Ü¿¡ ´ÙÀ½ÀÇ parameter µéÀÌ ÇÊ¿äÇÏ´Ù. 

no_clusters : cluster ÀÇ ¹Ù¶÷Á÷ÇÑ ¼ö·Î¼­ seed point ÀÇ ¼ö¿Í °°´Ù.

min_elements : °¢ cluster ¸¶´Ù Çã¿ëµÇ´Â sample µéÀÇ ÃÖ¼Ò °¹¼ö.

min_dist : º´ÇÕÀÌ ÀϾÁö ¾Ê´Â, cluster centroid »çÀÌ¿¡ Çã¿ëµÇ´Â ÃÖ¼Ò °Å¸®

split_size : cluster ÀÇ ºÐ¸®¸¦ Á¶ÀýÇÏ´Â parameter

iter_start : ¾Ë°í¸®ÁòÀÇ first part¿¡¼­ ¹Ýº¹(iteration) ÀÇ ÃÖ´ë¼ö

max_merge : °¢ ¹Ýº¹(iteration) ¿¡¼­ÀÇ cluster º´ÇÕÀÇ ÃÖ´ë¼ö

iter_body : ¾Ë°í¸®ÁòÀÇ main part ³»¿¡¼­ ¹Ýº¹ÀÇ ÃÖ´ë¼ö

ÀÌ·¯ÇÑ parameter µéÀº ¾Ë°í¸®Áò °úÁ¤¿¡¼­ ÀÚ¼¼È÷ ¼³¸íµÈ´Ù.

Isodata Algorithm ÀÇ °úÁ¤Àº ´ÙÀ½°ú °°´Ù.

1. ÀÓÀÇÀÇ °¹¼öÀÇ seed point¸¦ cluster centroid ·Î¼­ ÃʱâÈ­ ÇÑ´Ù.(step 1¿¡¼­ 4 ±îÁö´Â Forgy's algorithm °ú °°´Ù)

2. °¢ sample ¿¡ ´ëÇØ °¡Àå °¡±îÀÌ ÀÖ´Â cluster centroid¸¦ ã¾Æ¼­ ÇØ´ç cluster ¿¡ sampleÀ» ¹èÁ¤ÇÑ´Ù.

3. º¯È­µÈ cluster ÀÇ centroid¸¦ °è»êÇÑ´Ù..

4. ¸¸ÀÏ Àû¾îµµ ÇϳªÀÇ sample ÀÌ cluster¸¦ º¯È­½ÃÅ°°í ¹Ýº¹µÇ´Â ¼ö°¡ iter_start º¸´Ù ÀÛ´Ù¸é, step 2 ·Î °£´Ù.

5. min_elements º¸´Ù ÀûÀº¼öÀÇ sampleÀ» °¡Áø cluster ´Â Æó±âÇÑ´Ù. ¶ÇÇÑ Æ÷ÇÔµÈ sample µéµµ Æó±âÇÑ´Ù.

6. ¸¸ÀÏ cluster ÀÇ ¼ö°¡ 2 * no_clusters º¸´Ù Å©°Å³ª °°°í, ¶Ç´Â ¹Ýº¹µÇ´Â ¼ö°¡ ¦¼ö(even)À̶ó¸é step 7 (º´ÇÕ µ¿ÀÛ)À» ½ÇÇàÇÏ°í ±×·¸Áö ¾ÊÀ¸¸é step 8 ·Î °£´Ù.

7. ¸¸ÀÏ µÎ centroid °£ÀÇ °Å¸®°¡ min_dist º¸´Ù ÀÛ´Ù¸é µÎ cluster¸¦ º´ÇÕÇÏ°í centroid¸¦ º¯È­½ÃŲ´Ù. ±×·¸Áö ¾ÊÀ¸¸é step 7 ·Î °£´Ù. ÀÌ·¯ÇÑ stepÀ» max_merge ¹øÀ» ¹Ýº¹ÇÏ°í step 8 ·Î °£´Ù.

8. ¸¸ÀÏ cluster ÀÇ ¼ö°¡ no_clusters / 2 º¸´Ù À۰ųª °°°í, ¶Ç´Â ¹Ýº¹ÀÇ ¼ö°¡ Ȧ¼ö(odd) ¶ó¸é step 9 (ºÐ¸® µ¿ÀÛ)À» ¼öÇàÇÏ°í ±×·¸Áö ¾ÊÀ¸¸é step 10 À¸·Î °£´Ù.

9. split_size * ¸¦ ÃÊ°úÇϴ ǥÁØÆíÂ÷¸¦ °¡Áö´Â cluster¸¦ ã´Â´Ù (¿©±â¼­ ´Â ¾î¶² º¯¼öÀÌ°í ´Â ¿ø·¡ÀÇ sample ÁýÇÕ¿¡¼­ ÀÇ Ç¥ÁØÆíÂ÷ÀÌ´Ù). ¸¸ÀÏ ¾øÀ¸¸é step 10 À¸·Î °£´Ù. cluster ³»¿¡¼­ ÀÇ Æò±ÕÀ» °è»êÇÑ´Ù. ÀÌ cluster ¿¡ ÀÖ´Â sample µéÀ» µÎ ÁýÇÕÀ¸·Î ³ª´«´Ù( °¡ Æò±Õº¸´Ù Å©°Å³ª °°Àº ÁýÇÕ°ú °¡ Æò±Õº¸´Ù ÀÛÀº ÁýÇÕ). ÀÌ·¯ÇÑ µÎ cluster ÀÇ centroid¸¦ °è»êÇÑ´Ù. ¸¸ÀÏ ÀÌ·¯ÇÑ centroid »çÀÌÀÇ °Å¸®°¡ 1.1 * min_dist º¸´Ù Å©°Å³ª °°À¸¸é ¿ø·¡ÀÇ cluster¸¦ ÀÌ·¯ÇÑ µÎ °³ÀÇ cluster ·Î ¹Ù²Ù°í, ±×·¸Áö ¾ÊÀ¸¸é cluster¸¦ ºÐ¸®ÇÏÁö ¾Ê´Â´Ù.

10. ¸¸ÀÏ step 10 ÀÌ iter_body Ƚ¼ö¸¸Å­ ¼öÇàµÇ°Å³ª, ¸¶Áö¸· step 10 ÀÌ ¼öÇàµÈ ÀÌÈÄ cluster ¿¡ ¾Æ¹«·± º¯È­µµ ¹ß»ýÇÏÁö ¾ÊÀ¸¸é Áß´ÜÇÑ´Ù. ±×·¸Áö ¾ÊÀ¸¸é »õ·Î¿î seed point ·Î¼­ cluster ÀÇ centroid¸¦ ÃëÇÏ°í step 2 ·Î °£´Ù.


Example 5.7   isodata algorithmÀ» »ç¿ëÇÑ partitional clustering

´ÙÀ½°ú °°Àº data ¿Í parameter µéÀÌ ÀÖÀ» °æ¿ì data¸¦ cluster ÇϱâÀ§ÇØ isodata algorithmÀ» »ç¿ëÇØ º¸ÀÚ.

Number

x

y

Number

x

y

1

2

3

4

5

6

7

0.0

0.0

0.0

0.5

0.5

1.0

1.0

0.0

1.0

3.0

0.5

3.5

1.0

3.0

8

9

10

11

12

13

14

6.0

6.0

6.0

6.0

6.2

6.2

8.0

0.75

1.00

2.00

2.10

0.80

2.05

12.0

no_clusters =
min_elements =
min_dist =
split_size =
iter_start =
max_merge =
iter_body =

3
2
3
0.2
5
1
5

seed point ´Â sample 1, 3, 13 À̸ç, óÀ½ÀÇ ¹Ýº¹ Ƚ¼ö´Â 0 ÀÌ´Ù.

¾Ë°í¸®ÁòÀÇ Forgy part (step 1 ~4) ÀÇ ¼ö·ÅÀ» À§ÇØ data ÀÇ ÇѹøÀÇ ¼öÇàÀÌ ÇÊ¿äÇÏ´Ù. À̶§¿¡´Â ´ÙÀ½°ú °°ÀÌ 3 °³ÀÇ cluster °¡ Á¸ÀçÇÑ´Ù.

{1, 2, 4, 6}, {3, 5, 7}, {8, 9, 10, 12, 13, 14}

step 5, ¾î¶² cluster µµ min_elements º¸´Ù ¸â¹ö¼ö°¡ ÀÛÁö ¾Ê±â ¶§¹®¿¡ Æó±âµÇ´Â cluster ´Â ¾ø´Ù.

step 6, cluster ÀÇ ¼ö°¡ 2 * no_clusters º¸´Ù Å©°Å³ª °°Áö ¾Ê°í, ¶Ç´Â ¹Ýº¹µÇ´Â ¼ö (0) °¡ ¦¼ö(even)À̹ǷÎ, step 7 (º´ÇÕ µ¿ÀÛ)À» ½ÇÇàÇÑ´Ù. ÇÏ°í ±×·¸Áö ¾ÊÀ¸¸é step 8 ·Î °£´Ù.

cluster {1, 2, 4, 6} ¿Í {3, 5, 7} ÀÇ centroid °£ÀÇ °Å¸®°¡ min_dist º¸´Ù À۱⠶§¹®¿¡ µÎ cluster¸¦ º´ÇÕÇÏ°í centroid¸¦ º¯È­½ÃŲ´Ù. À̶§¿¡ 2 °³ÀÇ cluster °¡ ´ÙÀ½°ú °°ÀÌ µÈ´Ù.

{1, 2, 3, 4, 5, 6, 7},   {8, 9, 10, 11, 12, 13, 14}

º´ÇÕ step ÀÌ ¹Ýº¹µÇÁö ¾ÊÀ¸¹Ç·Î (max_merge = 1 À̱⠶§¹®) step 8 ·Î ³ª¾Æ°£´Ù. ( ÀÌ °æ¿ì ³²¾ÆÀÖ´Â cluster µéÀº º´ÇÕµÉ ¼ö ¾ø´Ù. ¿Ö³ÄÇϸé centroid °£ÀÇ °Å¸®°¡ min_dist º¸´Ù Å©±â ¶§¹®ÀÌ´Ù)

step 8, cluster ÀÇ ¼ö (2) °¡ no_clusters / 2 (1.5) º¸´Ù Å©±â ¶§¹®¿¡, ¶ÇÇÑ ¹Ýº¹ÀÇ ¼ö°¡ Ȧ¼ö(odd) °¡ ¾Æ´Ï±â ¶§¹®¿¡ step 10 À¸·Î °£´Ù.

step 10, ¹Ýº¹ÀÇ È½¼ö°¡ ¿ä±¸µÇ´Â ¼ö (5) º¸´Ù ÀÚ°í cluster µéÀº º¯È­µÇ¾ú´Ù. µû¶ó¼­ step 2 ·Î ³ª¾Æ°£´Ù.

À̶§¿¡ ¾Ë°í¸®ÁòÀÇ Forgy part (step 1~4) ´Â cluster¸¦ º¯È­½ÃÅ°Áö ¾Ê´Â´Ù.

step 5, ¾î¶² cluster µµ min_elements º¸´Ù ¸â¹ö¼ö°¡ ÀÛÁö ¾Ê±â ¶§¹®¿¡ Æó±âµÇ´Â cluster ´Â ¾ø´Ù.

step 6, cluster ÀÇ ¼ö°¡ 2 * no_clusters º¸´Ù Å©°Å³ª °°Áö ¾Ê°í, ¶Ç´Â ¹Ýº¹µÇ´Â ¼ö (1) °¡ Ȧ¼ö(odd)À̹ǷÎ, step 8 ·Î °£´Ù.

step 8, cluster ÀÇ ¼ö (2) °¡ no_clusters / 2 º¸´Ù Å©±â ¶§¹®¿¡, ¶ÇÇÑ ¹Ýº¹ÀÇ ¼ö°¡ Ȧ¼ö(odd) À̹ǷÎ, ºÐ¸® µ¿ÀÛ (step 9) °¡ ¼öÇàµÈ´Ù.

step 9,  split_size * ¸¦ ÃÊ°úÇÏ´Â (º¯¼ö ¸¦ À§ÇÑ) Ç¥ÁØÆíÂ÷¸¦ °¡Áö´Â cluster {8, 9, 10, 11, 12, 13, 14} °¡ ÀÖ´Ù. sample µéÀº cluster ¿¡¼­ÀÇ ÀÇ Æò±Õº¸´Ù À۰ųª Å« °ªÀ» °¡Áö´Â µÎ ÁýÇÕÀ¸·Î ³ª´¶´Ù.

{8, 9, 10, 11, 12, 13},   {14}

±×µéÀÇ centroid »çÀÌÀÇ °Å¸®°¡ 1.1 * mid_dist º¸´Ù Å©°Å³ª °°±â ¶§¹®¿¡, cluster ´Â ºÐ¸®»óŸ¦ À¯ÁöÇÑ´Ù. À̶§¿¡´Â 3 °³ÀÇ cluster °¡ Á¸ÀçÇÑ´Ù.

{1, 2, 3, 4, 5, 6, 7},    {8, 9, 10, 11, 12, 13},   {14}

step 10, ¹Ýº¹ Ƚ¼ö°¡ ¿ä±¸µÇ´Â ¼öº¸´Ù ÀÛ°í cluster °¡ º¯È­µÇ¾úÀ¸¹Ç·Î step 2 ·Î °£´Ù.

´Ù½Ã À̶§¿¡ ¾Ë°í¸®ÁòÀÇ Forgy part (step 1~4) ´Â cluster¸¦ º¯È­½ÃÅ°Áö ¾Ê´Â´Ù.

step 5, cluster {14} ´Â min_elements ¸â¹ö¼öº¸´Ù À۱⠶§¹®¿¡ Æó±âµÈ´Ù. À̶§¿¡´Â µÎ °³ÀÇ cluster °¡ µÈ´Ù.

{1, 2, 3, 4, 5, 6, 7},    {8, 9, 10, 11, 12, 13}

step 6, cluster ÀÇ ¼ö°¡ 2 * no_clusters º¸´Ù ÀÛ°í ¹Ýº¹È½¼ö (2) °¡ ¦¼öÀ̱⠶§¹®¿¡, º´ÇÕµ¿ÀÛ (step 7) ÀÌ ¼öÇàµÈ´Ù.

cluster {1, 2, 3, 4, 5, 6, 7} °ú {8, 9, 10, 11, 12, 13} ÀÇ centroid »çÀÌÀÇ °Å¸®°¡ min_dist º¸´Ù À۱⠶§¹®¿¡, ÀÌ cluster µéÀº º´ÇÕµÇÁö ¾Ê´Â´Ù. step 8 ·Î ³ª¾Æ°£´Ù.

step 8, cluster ÀÇ ¼ö (2) °¡ no_clusters / 2 º¸´Ù Å©°í ¹Ýº¹ Ƚ¼ö°¡ ¦¼öÀ̱⠶§¹®¿¡ step 10 À¸·Î °£´Ù.

step 10, ¹Ýº¹ Ƚ¼ö°¡ ¿ä±¸µÇ´Â ¼öº¸´Ù ÀÛ°í cluster µéÀÌ º¯È­µÇ¾ú±â ¶§¹®¿¡ step 2 ·Î °£´Ù.

´Ù½Ã À̶§¿¡ ¾Ë°í¸®ÁòÀÇ Forgy part (step 1~4) ´Â cluster¸¦ º¯È­½ÃÅ°Áö ¾Ê´Â´Ù.

step 5, ¾î¶² cluster µµ min_elements º¸´Ù ¸â¹ö¼ö°¡ ÀÛÁö ¾Ê±â ¶§¹®¿¡ Æó±âµÇ´Â cluster ´Â ¾ø´Ù.

step 6, cluster ÀÇ ¼ö°¡ 2 * no_clusters º¸´Ù ÀÛ°í ¹Ýº¹È½¼ö (3) °¡ Ȧ¼öÀ̱⠶§¹®¿¡ step 8 ·Î ³ª¾Æ°£´Ù.

step 8, cluster ÀÇ ¼ö (2) °¡ no_clusters / 2 º¸´Ù Å©°í ¹Ýº¹È½¼ö°¡ Ȧ¼öÀ̱⠶§¹®¿¡, ºÐ¸® µ¿ÀÛ (step 9) ÀÌ ¼öÇàµÈ´Ù.

step 9, ¾î¶² cluster µµ Ç¥ÁØÆíÂ÷°¡ split_size * ¸¦ ÃÊ°úÇÏ´Â º¯¼ö¸¦ °¡Áö°í ÀÖÁö ¾Ê±â ¶§¹®¿¡ step 10 À¸·Î ³ª¾Æ°£´Ù.

step 10, ¹Ýº¹ Ƚ¼ö°¡ ¿ä±¸µÇ´Â ¼öº¸´Ù ÀÛ´Ù, ±×·¯³ª ¾î¶² cluster µµ º¯È­µÇÁö ¾Ê¾Ò´Ù. µû¶ó¼­ ¾Ë°í¸®ÁòÀº Á¾·áµÈ´Ù.